MicroscaleLabs
Lab 10 · 60–90 min · GPU · Mac · Colab

The Roofline Lab

Act VIII · Serving the Model
the aha moment

Measure your GPU's actual sustained bandwidth (not the spec-sheet number) with a memcpy microbenchmark, measure its sustained compute with a matmul microbenchmark, and plot your own hardware's roofline. Then overlay your model's arithmetic intensity at batch=1 (decode) and batch=32 (prefill-like), and watch decode sit deep in the bandwidth-bound region on your actual GPU.
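Both microbenchmarks follow one timing pattern: warm up, repeat, take the median, then divide the work done by the time taken. The sketch below rests on a few assumptions. It uses a CPU stand-in for the copy, because CPython's equal-length `bytearray` slice assignment is essentially a memcpy. It also expects the caller to supply the matmul closure. The helper names are hypothetical and are not the lab's API. On a GPU, the closure would call `torch.matmul` on device tensors and then `torch.cuda.synchronize()`, because kernel launches return before the work finishes.

```python
import statistics
import time
from typing import Callable


def median_seconds(fn: Callable[[], object], repeats: int) -> float:
    """Run fn `repeats` times and return the median wall-clock time in seconds."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return statistics.median(times)


def copy_bandwidth_gbs(n_bytes: int = 64 * 1024 * 1024, repeats: int = 10) -> float:
    """Sustained host copy bandwidth in GB/s (hypothetical CPU stand-in for the memcpy bench)."""
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)

    def copy() -> None:
        dst[:] = src  # equal-length slice assignment is essentially a memcpy in CPython

    copy()  # warm-up so first-touch page faults are not timed
    # A copy reads n_bytes and writes n_bytes, so count 2 * n_bytes of memory traffic.
    return 2 * n_bytes / median_seconds(copy, repeats) / 1e9


def matmul_gflops(matmul: Callable[[], object], n: int, repeats: int = 5) -> float:
    """Sustained GFLOP/s of a caller-supplied closure computing an (n x n) @ (n x n) product.

    Each of the n * n outputs needs n multiply-adds, so the product costs 2 * n**3 FLOPs.
    On a GPU the closure must synchronize before returning, or only launch time is measured.
    """
    matmul()  # warm-up (kernel selection, caches, lazy initialization)
    return 2 * n**3 / median_seconds(matmul, repeats) / 1e9
```

`copy_bandwidth_gbs()` gives the slope of the roof, and `matmul_gflops(fn, n)` gives its flat top. The sketch takes the median rather than the fastest run on purpose: a roofline built from sustained rates reflects what the hardware keeps up, not its best burst.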

the facts
Time: 60–90 min
Hardware: GPU · Mac · Colab
Act: VIII · Serving the Model
Status: Live
Artifact: A roofline chart with your GPU's measured ridge and your model's operating points.
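Once the two roofs are measured, the chart's ridge and operating points follow from simple arithmetic. This is a minimal sketch under stated assumptions. It treats a single dense layer y = x @ W as a stand-in for the model, and the 4096 × 4096 fp16 shape is hypothetical rather than taken from the lab. It also assumes that weights, inputs, and outputs each cross memory exactly once. The function names are illustrative.

```python
def ridge_point(peak_gflops: float, bw_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) where the bandwidth slope meets the compute roof."""
    return peak_gflops / bw_gbs


def attainable_gflops(ai: float, peak_gflops: float, bw_gbs: float) -> float:
    """Roofline bound: the lesser of peak compute and bandwidth * intensity."""
    return min(peak_gflops, ai * bw_gbs)


def linear_layer_ai(batch: int, d_in: int, d_out: int, bytes_per_elem: int = 2) -> float:
    """Arithmetic intensity of y = x @ W with x: (batch, d_in), W: (d_in, d_out).

    FLOPs: 2 * batch * d_in * d_out (one multiply and one add per weight per row of x).
    Bytes: W, x, and y each moved once (an idealized, cache-free assumption).
    """
    flops = 2 * batch * d_in * d_out
    bytes_moved = bytes_per_elem * (d_in * d_out + batch * d_in + batch * d_out)
    return flops / bytes_moved


def regime(ai: float, peak_gflops: float, bw_gbs: float) -> str:
    """Label an operating point relative to the measured ridge."""
    return "bandwidth-bound" if ai < ridge_point(peak_gflops, bw_gbs) else "compute-bound"


if __name__ == "__main__":
    # Hypothetical layer shape; substitute your model's dimensions.
    for batch in (1, 32):
        print(batch, round(linear_layer_ai(batch, 4096, 4096), 2))
```

For the hypothetical fp16 4096 × 4096 layer, batch=1 lands at about 1 FLOP/byte and batch=32 at about 31.5. Intensity grows roughly linearly with batch because all rows of x share the same weight bytes. Whether batch=32 crosses your ridge depends on the peak and bandwidth you measured.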
run it locally

Clone the labs repo and run this lab as a script or open it as a notebook:

git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale-labs
just setup-auto      # auto-detects CPU / CUDA / Mac
just run 10
# or:  jupyter lab labs/10-roofline-lab/lab.py

Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.
