
Why LLMs Make Things Up

Part 15 of the AI/LLM mastery series — the limit no setting can fix. What hallucination really is, why it happens (the model predicts plausible text, not true text), why it cannot flag its own uncertainty, the hard limits (training cutoff, context window, lost-in-the-middle, exact maths), the full failure-mode catalogue, and how to sharply reduce it with grounding, tools, prompting and evaluation.

#ai #Fundamentals #Hallucination #Limits #LLM #Reliability

AI/LLM Mastery · Part 15 of 20 — the limit no setting can fix. Why a model produces fluent, confident text that is simply wrong, why it cannot tell you when it is guessing, and the hard limits — training cutoff, context window, exact maths — you can never tune away.

The limit no setting can fix

Part 14 ended on a warning: no temperature, no clever sampling, no decoding trick can stop a model from producing beautifully fluent, supremely confident text that is simply, factually wrong. That is not a knob you forgot to turn — it is built into what these models are. This part explains why, and it is probably the single most important article in the series for anyone who actually relies on an LLM. Understand this and you will stop being surprised, and stop being fooled.

We will cover what hallucination really is, the deep reason it happens (it goes straight back to Part 1), why the model cannot flag its own uncertainty, and the hard limits (cutoff dates, context windows, exact maths) that no prompt can remove. Then, the good news: how to sharply reduce all of it. As always, every term is defined as we go.

What hallucination is

First, what the word actually means.

Hallucination is when a model produces text that is fluent, confident, and plausible-sounding — but factually false or fabricated. The classic example: ask for a citation and it hands you a clean, well-formatted reference — authors, title, journal, year — that simply does not exist. It is not “lying” (there is no intent), and in a sense nothing malfunctioned: the model generated the most plausible-looking continuation, and a believable fake citation is plausible-looking text. The genuinely dangerous part is that a hallucinated answer looks identical to a correct one — same fluency, same confident tone, same formatting. You cannot tell them apart from the text alone, which is exactly why people get burned.

Why: it predicts plausible, not true

Now the deep reason — and it is everything we learned in Part 1, coming home to roost.

A model is a next-token predictor. It is optimised to produce text that is plausible — statistically likely to come next — not text that is true. Usually those two things overlap, because true statements are common in the training data, so the most plausible continuation is also usually the correct one. That is why it is right so often that we forget the distinction. But plausible and true are not the same set. When the model reaches the edge of its knowledge — an obscure fact, your private data, a recent event — the truth is simply absent, yet a plausible-looking answer is still trivial to generate. And there is no separate fact-checker inside, no internal “is this real?” module — only statistical likelihood. So at the edges, it fills the gap with confident fiction. Hallucination is not a malfunction; it is exactly what a plausibility-machine does where it does not know.
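The point can be made concrete with a toy predictor. The sketch below is not how a real LLM works internally (it is a word-count table, and the corpus is invented for illustration), but it shows the same behaviour: the most frequent continuation wins, whether or not it is true, and an unknown subject still gets a fluent answer.

```python
from collections import Counter, defaultdict

# Tiny hypothetical "training corpus". The wrong claim appears more often
# than the right one, as popular misconceptions often do in real text.
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
    "the capital of france is paris",
]

follow2 = defaultdict(Counter)  # next word given the previous two words
follow1 = defaultdict(Counter)  # next word given the previous word (back-off)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follow1[prev][nxt] += 1
    for a, b, nxt in zip(words, words[1:], words[2:]):
        follow2[(a, b)][nxt] += 1

def predict_next(prompt: str) -> str:
    """Return the most frequent continuation; truth is never consulted."""
    words = prompt.split()
    # Use the two-word context if it was seen, otherwise back off to one word.
    counts = follow2.get(tuple(words[-2:])) or follow1.get(words[-1])
    return counts.most_common(1)[0][0] if counts else "<unk>"

print(predict_next("the capital of france is"))     # common and true
print(predict_next("the capital of australia is"))  # common but false
print(predict_next("the capital of atlantis is"))   # never seen, answers anyway
```

The third call is the interesting one: "atlantis" never appears in the corpus, yet the predictor backs off to whatever usually follows "is" and returns a city name with no hint that it is guessing.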

It cannot tell you when it is guessing

A second problem stacks on top of the first and makes it far more dangerous.

The model has no dependable way of signalling its own certainty. It states a wrong answer in exactly the same confident, fluent tone as a right one: the “brilliant intern who never admits they are unsure” from Part 1. By default, when it does not know, it does not pause to say “I am not sure”; it just generates the most plausible continuation anyway, and the surface text rarely signals any doubt. Worse, alignment (Part 13) trains it to sound helpful and confident, and can even tip it into sycophancy. The takeaway is blunt and important: a confident tone is not a reliable signal that the answer is correct. Treat fluency and confidence as style, never as evidence.
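A small numerical sketch shows why the text itself carries no doubt. The two next-token distributions below are made up for illustration: one is sharply peaked, the other is close to a coin toss between four years. Greedy decoding emits exactly one token in both cases, and nothing in that token records how close the race was.

```python
import math

def entropy_bits(probs: dict[str, float]) -> float:
    """Shannon entropy of a next-token distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def greedy(probs: dict[str, float]) -> str:
    """Pick the single most likely token, as greedy decoding would."""
    return max(probs, key=probs.get)

# Hypothetical distributions, not taken from any real model.
known = {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01}
guess = {"1987": 0.26, "1989": 0.25, "1991": 0.25, "1985": 0.24}

for name, dist in [("known", known), ("guess", guess)]:
    token = greedy(dist)
    print(f"{name}: emits {token!r}, entropy {entropy_bits(dist):.2f} bits")
```

The "guess" case has nearly the maximum possible entropy for four options, yet the output is a single clean year. Some APIs expose token log-probabilities that let you inspect this; whether yours does is something to check rather than assume.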

Hard limit one: the training cutoff

Beyond hallucination on things it half-knows, there are hard limits where it knows nothing — starting with time.

A model only knows what was in its training data, frozen at a cutoff date (Part 9). Ask about anything after that (a recent event, a new product, today’s price) and its weights hold no information at all. But, as we just saw, it will not say so by default; it produces a confident, plausible guess. That is a hallucination, specifically about recent events. This one cannot be prompted away, because the information genuinely is not in the weights. The only real fix is to give the model fresh information at question-time: retrieval and tools, Parts 17–18.
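As a rough sketch of what "fresh information at question-time" can look like, the snippet below routes a question through a lookup step when it concerns something after the cutoff. Everything here is hypothetical: the cutoff date, the `fetch` callable (a stand-in for a search or retrieval step), and the assumption that the caller already knows which date the question is about.

```python
from datetime import date
from typing import Callable

# Hypothetical cutoff; a real model's cutoff comes from its documentation.
ASSUMED_CUTOFF = date(2024, 6, 1)

def needs_fresh_data(event_date: date, cutoff: date = ASSUMED_CUTOFF) -> bool:
    """True when the question concerns something after the training cutoff."""
    return event_date > cutoff

def build_prompt(question: str, event_date: date,
                 fetch: Callable[[str], str]) -> str:
    """Prepend retrieved context when the weights cannot contain the answer."""
    if needs_fresh_data(event_date):
        context = fetch(question)  # assumed retrieval/search step
        return (
            "Answer using only the context below. "
            "If the context does not contain the answer, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
    return question

def fake_fetch(query: str) -> str:
    """Stand-in for a real search call, so the sketch runs offline."""
    return "(retrieved text would go here)"

print(build_prompt("What is the price today?", date(2025, 3, 1), fake_fetch))
```

In practice it is often simpler to retrieve for every factual question than to guess which ones postdate the cutoff; the date check is here only to make the reasoning visible.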

Hard limit two: the context window

The second hard limit is how much it can pay attention to at once.

A model can only “see” a fixed number of tokens at a time: its context window (anywhere from a few thousand to a million or more tokens, depending on the model). Anything outside that window simply is not there, including earlier turns of a long conversation once they no longer fit. The model itself also has no built-in memory between sessions, so unless the application stores and re-sends earlier text, closing the chat wipes the slate. And even inside the window there is a subtle trap nicknamed “lost in the middle” (Liu et al., 2023): models tend to use information at the start and end of a long context well, but are noticeably worse at using facts buried in the middle. So a long document you paste in is not perfectly “remembered”; placement matters, and beyond the window it is gone entirely. Bigger windows help, but they are expensive: the KV cache (Part 14) grows linearly with length.
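Both effects can be sketched in a few lines. The first function drops the oldest turns once a token budget is exceeded, which is roughly what happens to a long chat. The second is one heuristic people try against "lost in the middle": put the most relevant passages at the edges of the prompt and the weakest in the middle. The word-based token counter is a crude stand-in for a real tokenizer, and whether the reordering helps depends on the model, so treat it as something to measure.

```python
def count_tokens(text: str) -> int:
    """Very rough stand-in: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_window(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns whose total size fits the token budget."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):       # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break                      # everything older is simply gone
        kept.append(turn)
        used += cost
    return list(reversed(kept))        # restore chronological order

def order_for_recall(docs_by_relevance: list[str]) -> list[str]:
    """Place the most relevant docs at the edges, the weakest in the middle."""
    front: list[str] = []
    back: list[str] = []
    for i, doc in enumerate(docs_by_relevance):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

history = ["my name is Ada", "I live in Leeds", "what is my name again"]
print(fit_to_window(history, budget=9))          # the first turn no longer fits
print(order_for_recall(["A", "B", "C", "D", "E"]))  # A is most relevant
```

With a budget of 9 the turn containing the name is the one that falls out, which is exactly the "it forgot what I told it earlier" experience.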

The failure modes, catalogued

Pulling it together, here is the catalogue of where LLMs reliably go wrong — worth committing to memory, because each one is predictable.

Fabricated facts and citations (the headline). Outdated knowledge past the cutoff. Exact maths and counting: it predicts plausible tokens rather than computing, and it sees tokens rather than individual letters, which is why even counting the “r”s in “strawberry” (Part 2) is shaky. Flawed multi-step reasoning: it can produce a fluent chain of logic that is subtly or completely wrong. Brittleness: reword the same question and you may get a different answer; it is not a stable database. And sycophancy: a tendency to agree with whatever you assert (an RLHF artifact, Part 13). None of these is a mystery once you know them; they are the fingerprint of a plausibility engine.
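Brittleness in particular is easy to probe: ask the same question several ways and see how often the answers agree. The sketch below assumes a hypothetical `ask` callable that wraps whatever model you use; `fake_model` is a canned stand-in so the example runs without a network call, and its answers are invented.

```python
from collections import Counter
from typing import Callable

def agreement(ask: Callable[[str], str],
              phrasings: list[str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of phrasings that gave it."""
    answers = [ask(p).strip().lower() for p in phrasings]
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)

def fake_model(prompt: str) -> str:
    """Canned, hypothetical responses that mimic an unstable model."""
    canned = {
        "How many r's are in 'strawberry'?": "3",
        "Count the letter r in strawberry.": "2",
        "strawberry: number of r characters?": "3",
    }
    return canned[prompt]

phrasings = [
    "How many r's are in 'strawberry'?",
    "Count the letter r in strawberry.",
    "strawberry: number of r characters?",
]
answer, score = agreement(fake_model, phrasings)
print(answer, round(score, 2))
```

Low agreement is a useful warning sign. High agreement is weaker evidence than it looks, because a model can be consistently wrong; for a question like this one, the reliable route is to compute the answer, for example with `"strawberry".count("r")`.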

You cannot cure it, but you can reduce it

Now the hopeful part. You cannot cure hallucination — it is intrinsic to a plausibility-machine — but you can reduce it dramatically, and that is what the rest of the series is for.

The biggest lever is grounding: hand the model the real documents at question-time so it reads the facts instead of recalling them. That is retrieval, or RAG, and it is Part 17. Next, give it tools: a calculator for exact maths, a search engine for fresh facts, a code runner — outsource what it is bad at (Part 18). And prompting: ask it to show its reasoning, cite sources, and explicitly allow “I don’t know” (Part 16), backed by evaluation to measure error rates (Part 19).
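To illustrate the "tools" lever, here is a minimal sketch of a calculator a model could be allowed to call instead of predicting digits. It is an assumption about how such a tool might be written, not any particular framework's API: it parses the expression with Python's standard `ast` module and evaluates only plain arithmetic, so arbitrary code in the input is rejected rather than executed.

```python
import ast
import operator

# Only these operators are allowed; anything else is refused.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculate(expression: str):
    """Evaluate +, -, *, / on numeric literals exactly, with no eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

print(calculate("123456789 * 987654321"))  # exact integer arithmetic
print(calculate("2 + 3 * 4"))              # operator precedence is respected
```

The design choice is the same one the paragraph above describes: the model decides that a calculation is needed and what to calculate, and deterministic code produces the number.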

Before you trust an LLM answer

[ ] Is it a FACT or a recent event?     -> verify it / ground with RAG
[ ] Does it need exact MATH?            -> use a calculator/tool
[ ] Did it cite a source?               -> check the source actually exists
[ ] High-stakes (legal, medical, code)? -> a human must verify
[ ] Confident tone?                     -> NOT evidence of correctness

The honest picture — and onward to using it well

The honest picture, in one view:

Hallucination is fluent, confident, false output that looks identical to the truth. It happens because the model predicts plausible text, not true text, with no internal fact-checker and no real sense of its own uncertainty. On top of that sit hard limits no prompt can remove — a frozen training cutoff, a finite context window with “lost in the middle,” and weak exact maths. You cannot fully eliminate any of it; you reduce it with grounding, tools, prompting and evaluation, and you treat every output as a confident draft to be verified, never as gospel. That mindset — brilliant, fast, well-read, and never to be trusted blind — is the single most useful thing to carry out of this whole series.

That closes the story of how these models behave. The final stretch — “Applied Mastery,” Parts 16–20 — is about using them well, and squeezing far more reliability out of the imperfect tool we have just described. It begins with the cheapest, highest-leverage skill of all, the one that needs no code and no GPU: prompt engineering. How you ask an LLM dramatically changes what you get back — and Part 16 shows exactly how.


Published	Jun 17, 2026
Updated	Jul 16, 2026