MicroscaleLabs
0
Back to labs
Lab 0130 minCPU · Colab

The Token Tax

Act I · The Landscape
the aha moment

Load four real BPE tokenizers — o200k, cl100k, p50k, gpt2 — feed them the same sentence in five languages, and watch the token-per-word ratio climb from 1× in English to 4-5× in Hindi on GPT-2. The fairness gap stops being theory and becomes a number you measured.

Open in ColabView on GitHub
the facts
Time
30 min
Hardware
CPU · Colab
Act
I · The Landscape
Status
Live
Artifact
A tokenizer × language heatmap and an interactive HTML chart you can share.
run it locally

Clone the labs repo and run this lab as a script or open it as a notebook:

git clone https://github.com/iqbal-sk/Microscale-labs.git
cd Microscale
just setup-auto      # auto-detects CPU / CUDA / Mac
just run 01
# or:  jupyter lab labs/01-token-tax/lab.py

Full install options (uv, pip, or the platform-specific CUDA paths) are in the labs README.

read alongside
Open in ColabView on GitHub← all labs