Making It Yours

Parameter-efficient fine-tuning (PEFT), preference optimization, and specialization recipes

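LoRA as a low-rank decomposition you can watch form. QLoRA as a better 4-bit grid (NF4). DPO as a derivation that builds in front of you. GRPO as a dance of grouped rollouts. Then three end-to-end recipes: a tool-calling small language model (SLM), a domain expert, and a personal assistant. Finally, a tour of fine-tuning frameworks and how to pick one.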

badge · Distillation Engineer


1. LoRA, visualized
Watch the BA product form, and see what the rank slider changes
11 min · 55 xp

2. QLoRA and the NF4 grid
Quantile-based 4-bit quantization
10 min · 50 xp

3. DPO as a KL-constrained optimum
A Manim-style derivation of the loss
12 min · 60 xp

4. GRPO and RLVR
Group-relative advantages, no critic
11 min · 55 xp

5. Three recipes
Tool-calling · domain expert · personal assistant
12 min · 60 xp

6. Fine-tuning frameworks
Unsloth · Axolotl · LLaMA-Factory · TRL · torchtune · MLX-LM, and how to pick one
14 min · 60 xp