the aha moment
Train a 10M-parameter GPT-2 from scratch on TinyStories for ~20 minutes, watch its samples go from random noise to coherent English as the loss curve descends, then train a second copy on corrupted data and see the textbook hypothesis become a measured gap. Compute cost: well under $1.
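The corrupted-data half of the experiment only needs a way to break local structure in the training text. The lab's actual corruption scheme isn't spelled out here, so here is an illustrative sketch (the function name and scheme are hypothetical): replace a fraction of words with random words drawn from the same story, which wrecks syntax while roughly preserving word frequencies.

```python
import random

def corrupt_words(text: str, rate: float = 0.15, seed: int = 0) -> str:
    # Replace roughly `rate` of the words with a word sampled uniformly
    # from the same text. Local coherence is destroyed, but the unigram
    # distribution stays roughly intact, so the clean-vs-corrupted loss
    # gap isolates what the model learned beyond word frequencies.
    rng = random.Random(seed)
    words = text.split()
    out = [rng.choice(words) if rng.random() < rate else w for w in words]
    return " ".join(out)

story = "once upon a time a little dog found a big red ball in the park"
print(corrupt_words(story, rate=0.3))
```

Training one model on the clean corpus and one on the corrupted copy, with everything else identical, turns "structure in the data matters" into a number you can read off two loss curves.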
the facts
- Time: 90–120 min
- Hardware: GPU · Mac · Colab · CPU
- Act: IV · How They Learn
- Status: Live
- Artifact: Two trained 10M-param models + loss curves + side-by-side generation samples.
run it locally
Clone the labs repo and run this lab as a script or open it as a notebook:
```shell
git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale-labs
just setup-auto   # auto-detects CPU / CUDA / Mac
just run 05       # or: jupyter lab labs/05-dollar-pretraining/lab.py
```
Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.
read alongside