@Tiberius2
I'm going to attempt to make a smart AI, and I have some questions since yours is really good:
- What architecture did you use? Transformer, RNN, CNN, hybrid?
- How many parameters does your model have?
- What is the embedding dimension size?
- How many attention heads per layer?
- What positional encoding method did you implement?
- Did you use pre-norm or post-norm layer normalization?
- What activation function are you using (ReLU, GELU, SwiGLU)?
- How did you initialize the weights?
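To show how these numbers fit together, here is a rough sketch of estimating a transformer's parameter count from its layer count, embedding dimension, and vocabulary size. It assumes a GPT-style decoder with a 4x feed-forward width and tied input/output embeddings, and it ignores biases, layer norms, and positional embeddings. The function names and the example configuration are illustrative, not taken from anyone's actual model.

```python
def estimate_params(n_layers, d_model, vocab_size, ffn_mult=4):
    """Approximate parameter count for a decoder-only transformer.

    Assumptions (hypothetical, for illustration only): tied embeddings,
    no biases, no layer-norm or positional-embedding parameters.
    """
    attn = 4 * d_model * d_model             # Q, K, V, and output projections
    ffn = 2 * d_model * (ffn_mult * d_model)  # up and down projections
    embed = vocab_size * d_model              # token embedding table
    return n_layers * (attn + ffn) + embed


def head_dim(d_model, n_heads):
    """Per-head dimension; the embedding size must divide evenly across heads."""
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    return d_model // n_heads


# Example: a small GPT-2-sized configuration.
print(estimate_params(n_layers=12, d_model=768, vocab_size=50257))  # 123532032
print(head_dim(d_model=768, n_heads=12))                            # 64
```

So if you know the layer count, embedding dimension, and vocabulary size, you can sanity-check a claimed parameter count yourself.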
Also, I usually have difficulty training an AI:
- What dataset did you train it on?
- How many tokens total were used for training?
- Did you pretrain from scratch or fine-tune an existing checkpoint?
- What optimizer did you use (AdamW, SGD, etc.)?
- What was your learning rate schedule?
- What batch size did you use?
- What hardware did you train on? (GPU model?)
- How long did training take?
- What was the final training loss and validation loss?
Well, since I used the Base44 app to make it and used Claude Code to help me out, I will answer all of these:
Architecture: Transformer.
Parameters: Billions.
Embedding dimension size: High-dimensional.
Attention heads: Multi-head attention.
Positional encoding: Sophisticated positional encoding.
Layer normalization: Pre-norm or post-norm, depending on the model variant.
Activation function: GELU or similar.
Weight initialization: Robust initialization (like Xavier/Kaiming variants).
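For reference, the pre-norm vs. post-norm difference comes down to where the normalization sits relative to the residual connection. Here is a minimal sketch in plain Python, assuming a layer norm with no learned scale or shift and a toy stand-in for the attention or feed-forward sublayer. All names here are hypothetical and this is not the code of any particular model.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_norm_block(x, sublayer):
    """Pre-norm: normalize the input, run the sublayer, then add the residual."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_norm_block(x, sublayer):
    """Post-norm: run the sublayer, add the residual, then normalize the sum."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def toy_sublayer(x):
    """Hypothetical stand-in for an attention or feed-forward sublayer."""
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
pre_out = pre_norm_block(x, toy_sublayer)    # residual path stays unnormalized
post_out = post_norm_block(x, toy_sublayer)  # output is renormalized every block
```

With pre-norm the residual stream passes through untouched, which is often reported to train more stably in deep stacks. Post-norm renormalizes the output of every block.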
Training:
Dataset: Massive, diverse text and code corpus (from Claude Code).
Tokens total: Trillions. (Do you mean the total or the limit per message? If per message, it is 1 million.)
Pretrain/fine-tune: Pre-trained from scratch (with help from Claude Code).
Optimizer: AdamW.
Learning rate schedule: Warm up then decay.
Batch size: Very large, distributed.
Hardware: Base44 AI platform's specialized compute infrastructure.
Training duration: A little over a month.
Losses: Highly converged, amazing generalization. (Still working on it because it still has issues)
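For reference, a "warm up then decay" learning rate schedule can be sketched like this. It assumes a linear warmup followed by cosine decay, which is one common choice. The peak rate, warmup length, total steps, and floor are made-up values for illustration, not the settings of any model discussed here.

```python
import math


def lr_at_step(step, max_lr=3e-4, warmup_steps=1000,
               total_steps=100_000, min_lr=3e-5):
    """Linear warmup to max_lr, then cosine decay down to min_lr.

    All default values are hypothetical placeholders.
    """
    if step < warmup_steps:
        # Ramp up linearly so early updates stay small.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


for step in (0, 500, 1000, 50_000, 100_000):
    print(step, lr_at_step(step))
```

The value from `lr_at_step` would be set on the optimizer (AdamW here) before each update step.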
For training, since you did it so fast, what database did you use to train your AI?