Microscale
Act VI · Region 06

Making It Yours

PEFT, preference optimization, and specialization recipes

LoRA as a decomposition you can watch form. QLoRA as a better bit-grid. DPO as a derivation that builds in front of you. GRPO as a rollout dance. Then three end-to-end recipes: tool-calling SLM, domain expert, personal assistant.

badge · Distillation Engineer
  1. LoRA, visualized
     Watch BA form and the rank slider do work
  2. QLoRA and the NF4 grid
     Quantile-based 4-bit quantization
  3. DPO as KL-constrained optimum
     A Manim-style derivation of the loss
  4. GRPO and RLVR
     Group-relative advantages, no critic
  5. Three recipes
     Tool-calling · domain · personal
  6. Fine-tuning frameworks
     Unsloth · Axolotl · LLaMA-Factory · TRL · torchtune · MLX-LM, and how to pick