Microscale Labs
Lab 11 · 60 min · CPU · GPU · Mac · Colab

KV Cache Budget Calculator

Act VIII · Serving the Model
the aha moment

Turn the KV-cache formula from the lesson into a working calculator that takes a model config, batch size, and sequence length and predicts the cache size in bytes. Serve 1 / 4 / 16 / 64 concurrent requests on a real model and plot predicted vs actual memory. The gap is everything the formula doesn't cover: activations, CUDA context, framework overhead. Once you have measured it, you know the correction factor for your stack.
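The lesson's formula is not reproduced on this page, so the sketch below assumes the commonly used form: two tensors (K and V) per layer, each holding `num_kv_heads * head_dim` elements per token, multiplied by batch size, sequence length, and bytes per element. The names `ModelConfig`, `kv_cache_bytes`, and `correction_factor` are hypothetical and chosen for illustration; they are not taken from the lab's code, and the example config values are placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical field names; map them from your own model's config.
    num_layers: int
    num_kv_heads: int           # same as the attention head count unless the model uses GQA/MQA
    head_dim: int
    bytes_per_element: int = 2  # 2 for fp16/bf16, 4 for fp32


def kv_cache_bytes(cfg: ModelConfig, batch_size: int, seq_len: int) -> int:
    """Predicted KV-cache size in bytes (assumed formula, see lead-in)."""
    # Factor of 2: one K tensor and one V tensor per layer.
    per_token = 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_element
    return per_token * batch_size * seq_len


def correction_factor(predicted_bytes: int, actual_bytes: int) -> float:
    """Ratio of measured memory to the formula's prediction."""
    return actual_bytes / predicted_bytes


if __name__ == "__main__":
    # Placeholder config: 32 layers, 32 KV heads, head_dim 128, fp16.
    cfg = ModelConfig(num_layers=32, num_kv_heads=32, head_dim=128)
    for batch in (1, 4, 16, 64):
        gib = kv_cache_bytes(cfg, batch, seq_len=2048) / 2**30
        print(f"batch={batch:>2}  predicted KV cache = {gib:.2f} GiB")
```

To get the correction factor, pass your measured peak memory as `actual_bytes`; how you measure it depends on your framework and device, which is the part this sketch leaves out.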

the facts
Time: 60 min
Hardware: CPU · GPU · Mac · Colab
Act: VIII · Serving the Model
Status: Live
Artifact: A KV-cache calculator script + a predicted-vs-actual memory chart.
run it locally

Clone the labs repo and run this lab as a script or open it as a notebook:

git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale-labs
just setup-auto      # auto-detects CPU / CUDA / Mac
just run 11
# or:  jupyter lab labs/11-kv-cache-calculator/lab.py

Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.
