MicroscaleLabs
Lab 03 · 90–120 min · CPU · Mac · Colab

Build a Transformer Block from Raw Ops

Act II · Inside the Machine
the aha moment

Implement RMSNorm, RoPE, Grouped-Query Attention, and SwiGLU from scratch in PyTorch — no `nn.TransformerEncoderLayer`, no HuggingFace. Load the real weights from Qwen3-0.6B's layer 0 into your version, and wait for `torch.allclose(yours, theirs, atol=1e-5)` to return True. The hardest lab. The most satisfying lab.
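As a taste of the raw-ops style the lab asks for, here is a minimal RMSNorm sketch in plain PyTorch. The function name, signature, and `eps` default are illustrative assumptions, not the lab's actual API — Qwen3's variant scales by a learned per-channel weight, as shown here:

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Root-mean-square over the last (hidden) dimension, with eps for stability.
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    # Normalize, then apply the learned per-channel gain.
    return (x / rms) * weight
```

The lab's verification step works the same way for every sub-module: run identical inputs through your op and the reference layer, then compare with `torch.allclose(yours, theirs, atol=1e-5)`.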

Open in Colab · View on GitHub
the facts
Time: 90–120 min
Hardware: CPU · Mac · Colab
Act: II · Inside the Machine
Status: Live
Artifact: A standalone `transformer_block.py` reference implementation that loads any Qwen3 layer.
run it locally

Clone the labs repo and run this lab as a script or open it as a notebook:

git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale-labs
just setup-auto      # auto-detects CPU / CUDA / Mac
just run 03
# or:  jupyter lab labs/03-build-a-transformer/lab.py

Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.
