Hallucination is not a bug — it's a property
Xu, Jain, and Kankanhalli (2024) offered a formal argument for why hallucination cannot be eliminated from language models. Sketch of the argument:
1. A language model is a function from contexts to token distributions.
2. For any finite training corpus, there exist true facts about the world that are not in the corpus.
3. An ideally trained LM assigns high probability to tokens that match the training distribution.
4. For facts outside the training distribution, the LM's output is determined by interpolation over similar patterns, which can produce confident but false statements.
5. No training algorithm can avoid this, because step (2) holds for any finite corpus.
The practical upshot: you can reduce hallucination, you can detect it, you can constrain output to be verifiable — but you cannot train it out of a language model. It's as fundamental as the fact that a finite classifier can't learn every possible function.
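The closing analogy can be made concrete with a counting argument. In the sketch below, the bits-per-parameter figure and input width are illustrative assumptions, not measurements; working in log2 keeps the numbers manageable:

```python
# Counting argument behind "a finite classifier can't learn every function".
# There are 2^(2^n) boolean functions on n-bit inputs, but a model with
# P parameters of b bits each can realize at most 2^(P*b) distinct
# functions. Compare the exponents directly.

def log2_function_count(n_input_bits: int) -> int:
    """log2 of the number of boolean functions on n-bit inputs."""
    return 2 ** n_input_bits

def log2_model_count(n_params: int, bits_per_param: int = 16) -> int:
    """log2 of the number of distinct models representable at all."""
    return n_params * bits_per_param

# A 3B-parameter model stored in 16-bit weights can distinguish at most
# 2^(48 billion) functions...
models = log2_model_count(3_000_000_000)
# ...but boolean functions on mere 36-bit inputs already outnumber that.
functions = log2_function_count(36)
assert functions > models
```

The same pigeonhole logic applies to facts: once the space of possible true statements exceeds what the weights can distinguish, some inputs must be answered by interpolation rather than recall.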
There is a calibration-theoretic sharpening of this worth internalizing. Kalai and Vempala (2024, “Calibrated Language Models Must Hallucinate”) show that if a language model is well-calibrated in the Brier-score sense (when it says “0.9” it is right 90% of the time), then its hallucination rate on facts seen exactly once in the training corpus is bounded below by the fraction of such singleton facts in the corpus, typically 5–15% for web-scale data. You can destroy calibration to drop the floor (the model becomes systematically under-confident and refuses more), or you can destroy coverage (the model refuses on anything unfamiliar). You cannot have calibration, confidence, and freedom from hallucination simultaneously; it is an impossibility triangle, not a training bug. This is why retrieval grounding and explicit abstain tokens aren't polish: they're the only escape hatches from a theorem.
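The singleton fraction that sets the floor is easy to compute from a corpus of extracted facts. This is a rough sketch of the quantity involved; the toy corpus is invented, and the actual bound in the paper carries correction terms this ignores:

```python
from collections import Counter

def singleton_fraction(facts: list[str]) -> float:
    """Fraction of fact occurrences whose fact appears exactly once.

    This is the Good-Turing-style quantity (n1 / N) that lower-bounds
    a calibrated model's hallucination rate in Kalai & Vempala's
    argument: facts seen once give the model no way to distinguish
    real patterns from plausible-looking fabrications.
    """
    counts = Counter(facts)
    n_singletons = sum(1 for c in counts.values() if c == 1)
    return n_singletons / len(facts)

# Toy corpus: "b" and "d" each appear once, so 2 of 6 occurrences
# are singletons and the floor is ~33%.
corpus = ["a", "a", "b", "c", "c", "d"]
print(singleton_fraction(corpus))
```

For web-scale pretraining data, this fraction lands in the 5–15% range quoted above, which is what makes the floor practically meaningful rather than a vacuous bound.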
Why SLMs hallucinate more
Three compounding reasons:
- Capacity. Fewer parameters mean less factual storage. Ballpark: Allen-Zhu & Li 2024 estimate ~2 bits of facts per parameter. A 3B model can store roughly 750 MB of facts, split across all topics, languages, and code.
- Overtraining. SLMs are trained far past the Chinchilla-optimal token budget, which improves generalization but reduces memorization of rare facts. Paradoxically, better models know less trivia.
- Distillation concentration. Distilled students are confident on their training distribution. That confidence transfers to out-of-distribution queries where it's misplaced.
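The capacity ballpark in the first bullet is plain unit conversion. A throwaway helper makes the arithmetic explicit; note that 2 bits/parameter is the cited empirical estimate, not a law:

```python
def fact_capacity_mb(n_params: float, bits_per_param: float = 2.0) -> float:
    """Rough factual-storage capacity of a model in megabytes,
    assuming the ~2 bits/parameter ballpark from Allen-Zhu & Li 2024."""
    bits = n_params * bits_per_param
    return bits / 8 / 1e6  # bits -> bytes -> MB

print(fact_capacity_mb(3e9))   # the 3B figure from the bullet: 750 MB
print(fact_capacity_mb(70e9))  # a 70B model, for scale
```

Dividing 750 MB across every topic, language, and programming ecosystem the model must cover is what makes recall-style prompts to small models so fragile.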
What actually mitigates it
You cannot eliminate hallucination. You can:
- RAG — put the ground truth in context; the model becomes a summarizer instead of a recall engine.
- Constrained decoding — for structured outputs, grammar-constrain generation so the model can't emit invalid fields at all.
- Verifier loops — generate, then check with a separate model or rule (the RLVR pattern from DeepSeek-R1).
- Abstention training — teach the model to output “I don't know” as a valid answer via preference data.
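The verifier-loop bullet fits in a few lines, and it composes naturally with abstention: refuse when nothing passes the check. This is a minimal sketch of the pattern; `generate` and `verify` are hypothetical stand-ins for a model call and a checker, not a real API:

```python
import itertools

def answer_with_verifier(question, generate, verify, max_tries=3):
    """Generate up to max_tries candidates and return the first one the
    verifier accepts; abstain if none passes. The verifier can be a
    second model or, as here, a deterministic rule."""
    for _ in range(max_tries):
        candidate = generate(question)
        if verify(question, candidate):
            return candidate
    return "I don't know"  # abstention as a first-class answer

# Toy usage: a flaky generator whose first sample is wrong, paired
# with a rule-based verifier for an arithmetic question.
attempts = itertools.chain(["5"], itertools.repeat("4"))
result = answer_with_verifier(
    "2 + 2 = ?",
    generate=lambda q: next(attempts),
    verify=lambda q, a: a == "4",
)
assert result == "4"
```

The key design choice is that correctness lives in the verifier, not the generator, so the generator's confidence never has to be trusted directly.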