MicroscaleLabs
Lab 12 · 45–60 min · CPU · GPU · Mac · hot pick · coming soon

The Inference Showdown

Act IX · Ship It
the aha moment

Take one model (Qwen3-0.6B-Q4_K_M) and serve it through Ollama, llama.cpp-server, and (depending on your hardware) vLLM or MLX-LM. Measure cold-start time, TTFT, tok/s at batch=1, tok/s at batch=8, and peak memory for each. Find the crossover point where vLLM's batching advantage overtakes Ollama's simplicity — on YOUR hardware, not someone's benchmark blog.
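All three runtimes expose an OpenAI-compatible `/v1/chat/completions` endpoint, so one streaming client can time them all. The sketch below measures TTFT and tok/s per runtime; the ports and model name are assumptions for a typical local setup — adjust them to match yours (and note it counts stream chunks as tokens, which is close enough at batch=1):

```python
# Sketch: measure TTFT and tok/s against an OpenAI-compatible streaming
# endpoint. Ollama, llama.cpp-server, and vLLM all serve /v1/chat/completions;
# the ports and model name below are assumptions -- adjust for your setup.
import json
import time
import urllib.request


def tokens_per_second(n_tokens: int, t_start: float, t_end: float) -> float:
    """Throughput over the full request, guarding against zero duration."""
    return n_tokens / (t_end - t_start) if t_end > t_start else 0.0


def stream_metrics(base_url: str, model: str, prompt: str, max_tokens: int = 128):
    """Return (ttft_seconds, tokens_per_second) for one streamed completion."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    t0 = time.perf_counter()
    ttft, n_tokens = None, 0
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # server-sent events: one "data: {...}" line per chunk
            line = raw.decode().strip()
            if not line.startswith("data:") or line == "data: [DONE]":
                continue
            chunk = json.loads(line[len("data:"):])
            delta = chunk["choices"][0]["delta"].get("content")
            if delta:
                if ttft is None:
                    ttft = time.perf_counter() - t0  # first visible token
                n_tokens += 1  # counts chunks, not true tokens -- fine at batch=1
    return ttft, tokens_per_second(n_tokens, t0, time.perf_counter())


if __name__ == "__main__":
    # Assumed default ports: Ollama 11434, llama.cpp-server 8080, vLLM 8000.
    for name, url in [("ollama", "http://localhost:11434"),
                      ("llama.cpp", "http://localhost:8080"),
                      ("vllm", "http://localhost:8000")]:
        try:
            ttft, tps = stream_metrics(url, "qwen3:0.6b", "Count to twenty.")
            print(f"{name:10s}  TTFT {ttft * 1000:6.0f} ms   {tps:6.1f} tok/s")
        except OSError as e:  # URLError subclasses OSError
            print(f"{name:10s}  not reachable ({e})")
```

Run it once per warmed-up server; for cold-start, time the server launch itself (e.g. `time ollama run ...` on a fresh process) rather than this request loop.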

Open in Colab · View on GitHub
the facts
Time
45–60 min
Hardware
CPU · GPU · Mac
Act
IX · Ship It
Status
Coming soon
Artifact
A runtime-comparison table + a recommendation for your specific hardware.
run it locally

Clone the labs repo and run this lab as a script or open it as a notebook:

git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale-labs
just setup-auto      # auto-detects CPU / CUDA / Mac
just run 12
# or:  jupyter lab labs/12-inference-showdown/lab.py

Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.
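The lab's artifact is a runtime-comparison table plus a recommendation, and a few lines of Python are enough to assemble it and flag the crossover point. Every number below is a made-up placeholder — replace them with what you actually measured:

```python
# Sketch: fold measured numbers into the comparison table and flag the
# batch-size crossover. All figures are hypothetical placeholders -- fill in
# your own cold-start, TTFT, tok/s, and peak-memory measurements.
results = {
    #  runtime             cold_s  ttft_ms  tps_b1  tps_b8  peak_gb
    "ollama":            (2.1,     180,     45.0,    52.0,   1.1),
    "llama.cpp-server":  (1.4,     150,     48.0,    55.0,   0.9),
    "vllm":              (28.0,    120,     44.0,   210.0,   6.5),
}

print(f"{'runtime':18s} {'cold(s)':>8s} {'TTFT(ms)':>9s} "
      f"{'tok/s b1':>9s} {'tok/s b8':>9s} {'peak GB':>8s}")
for name, (cold, ttft, b1, b8, gb) in results.items():
    print(f"{name:18s} {cold:8.1f} {ttft:9.0f} {b1:9.1f} {b8:9.1f} {gb:8.1f}")

# Crossover: vLLM "wins" once its batched throughput beats the best
# single-stream runtime. Measuring batch=2, 4, 8 shows where it flips
# on your machine, not just that it flips.
best_b1 = max(results, key=lambda k: results[k][2])
best_b8 = max(results, key=lambda k: results[k][3])
print(f"best at batch=1: {best_b1}, best at batch=8: {best_b8}")
```

With the placeholder data, batch=1 favors the lightweight runtimes and batch=8 favors vLLM; your recommendation is simply whichever side of that crossover your actual workload sits on.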
