the aha moment
Load Qwen3-0.6B, register forward hooks on every attention layer, and extract all 448 attention-head patterns as heatmaps. Find the previous-token head, find the induction head, find the uniform global-attention head. Zero one out and measure the perplexity hit. Some heads matter, most don't, and now you can prove it.
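The core mechanic, zeroing one head's output and watching the result change, doesn't need the real model to understand. Here is a minimal numpy sketch of multi-head self-attention with an optional head ablation; shapes, weights, and the `ablate_head` parameter are illustrative, not Qwen3's actual config or the lab's code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, n_heads, ablate_head=None):
    """Toy multi-head self-attention; optionally zero out one head."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # project and split into heads: (n_heads, seq, d_head)
    q = (x @ wq).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    # attn[h] is the per-head pattern you'd render as a heatmap
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    out = attn @ v                      # (n_heads, seq, d_head)
    if ablate_head is not None:
        out[ablate_head] = 0.0          # the ablation: kill one head's output
    out = out.transpose(1, 0, 2).reshape(seq, d_model)
    return out @ wo, attn

rng = np.random.default_rng(0)
d, heads, seq = 32, 4, 8
wq, wk, wv, wo = (rng.normal(scale=0.1, size=(d, d)) for _ in range(4))
x = rng.normal(size=(seq, d))
y_full, attn = multi_head_attention(x, wq, wk, wv, wo, n_heads=heads)
y_ablated, _ = multi_head_attention(x, wq, wk, wv, wo, n_heads=heads, ablate_head=2)
print(attn.shape)  # (4, 8, 8): one seq×seq pattern per head
```

In the lab you do the same thing at scale: forward hooks capture `attn` for all 28 layers × 16 heads, and the "perplexity hit" is the difference between the model's loss with and without one head zeroed.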
the facts
- Time: 60–90 min
- Hardware: CPU · Mac · GPU · Colab
- Act: II · Inside the Machine
- Status: Live
- Artifact: A 28×16 gallery of attention-head heatmaps and an ablation-impact grid.
run it locally
Clone the labs repo and run this lab as a script or open it as a notebook:
```shell
git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale-labs
just setup-auto   # auto-detects CPU / CUDA / Mac
just run 02       # or: jupyter lab labs/02-attention-microscope/lab.py
```
Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.
read alongside