AI/LLM Mastery · Part 12 of 20 — the base model knows a great deal but will not actually answer you. Fine-tuning changes its behaviour without touching its knowledge: instruction tuning to make it an assistant, and the LoRA trick that lets you do it on a single GPU.
A genius with no manners
After eleven parts we have a base model that is, frankly, a genius with no manners. Pretraining (Parts 9–11) poured an astonishing amount of knowledge into its dials — but its only learned behaviour is to continue text. Ask it “What is the capital of France?” and it might reply with a list of more capital-city questions, because that is the pattern it saw. The answer “Paris” is sitting right there inside it; the model just was never taught the behaviour of answering. This part is how we fix that.
The crucial mindset, and the thing most newcomers get backwards: we are not trying to teach the model new facts here. The knowledge is already in. We only want to change its behaviour — from “continue the text” to “follow the instruction.” That behaviour change is called fine-tuning, and it is what turns a raw base model into the helpful assistant you actually chat with. We will cover what it is, why it is surprisingly cheap, and the clever LoRA trick that put fine-tuning within reach of almost anyone.
Knowledge it has, behaviour it lacks
Hold this distinction front and centre, because the rest of the article hangs on it.
A model has two separate things: knowledge (the facts and patterns baked into its weights during pretraining) and behaviour (how it responds to you). The base model has tremendous knowledge but unhelpful behaviour. Fine-tuning leaves the knowledge essentially intact and retrains the behaviour. So “continue training” is exactly what it is — we resume the Part 4 training loop, but starting from the already-trained dials, on a small, targeted new dataset, just long enough to nudge the behaviour where we want it.
Instruction tuning (SFT)
The main way to teach behaviour is the simplest thing imaginable: show the model examples of behaving well.
This is Supervised Fine-Tuning (SFT), also called instruction tuning. The training data is a set of (instruction → ideal response) pairs, written or curated by humans — an instruction, and the kind of answer you wish the model would give. You then run the same next-token training loop, but now the model learns to produce the ideal response when it sees the instruction.
[user] What is the capital of France?
[assistant] The capital of France is Paris.
[user] Summarise this in one line:
[assistant]
# train the model to produce the [assistant] turn given the [user] turn.
# do this over thousands of high-quality demonstrations.
Run over thousands of these demonstrations and the change is night-and-day: the model that used to continue your question with more questions now simply answers it. This single step — the first stage of the famous InstructGPT recipe (Ouyang et al., 2022) — is what converts a base model into an instruction-following assistant.
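One common way to "produce the [assistant] turn given the [user] turn" is to run the usual next-token loss but count it only on the response tokens. The sketch below shows that masking for a single demonstration. It is an illustration under stated assumptions, not the InstructGPT implementation: the whitespace tokenizer, the `[end]` marker, the helper name `build_sft_example` and the ignore value `-100` are all chosen here for the example.

```python
IGNORE = -100  # assumed "skip this position" label; the value is a convention, not a requirement

def build_sft_example(user_text, assistant_text, vocab):
    """Turn one (instruction, response) pair into input ids and loss labels."""
    prompt = ["[user]"] + user_text.split() + ["[assistant]"]
    answer = assistant_text.split() + ["[end]"]  # hypothetical end-of-turn marker
    tokens = prompt + answer
    # Toy tokenizer (assumption): one id per distinct whitespace-separated word.
    ids = [vocab.setdefault(tok, len(vocab)) for tok in tokens]
    # Prompt positions are masked out, so only the response contributes to the loss.
    labels = [IGNORE] * len(prompt) + ids[len(prompt):]
    return ids, labels

vocab = {}
ids, labels = build_sft_example(
    "What is the capital of France?",
    "The capital of France is Paris.",
    vocab,
)
n_trained = sum(1 for label in labels if label != IGNORE)
print(f"{len(ids)} tokens in the example, {n_trained} of them trained on")
```

The model still reads the whole conversation as input; the mask only decides which positions are graded. Shifting labels by one position for next-token prediction is left to the training loop.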
Why fine-tuning is cheap
Here is what surprises people: this whole step is cheap, especially next to pretraining.
Pretraining meant trillions of tokens, months of compute and millions of dollars (Part 9). SFT typically uses thousands to a few million examples and finishes in hours to days. Why the enormous gap? Because SFT is not teaching the model facts — those were poured in during pretraining. It is only teaching the model to access and present what it already knows, in a helpful format. It is surfacing a behaviour the model is already capable of, not building a new capability. That is why a relatively small amount of high-quality instruction data goes such a remarkably long way — and why good demonstrations matter more than sheer volume.
The catch: full fine-tuning is heavy
Cheap in data and time, yes — but there is a catch hidden in how you actually do the updating.
The naive approach, full fine-tuning, updates every one of the model’s billions of parameters. Two problems follow. First, memory: to train a weight you must also store its gradient and its optimizer state, which together can come to several times the model’s own size in GPU memory, so a model that runs comfortably on your hardware may be impossible to fully fine-tune on it. Second, storage: every fine-tuned task produces a whole new multi-gigabyte copy of the entire model; ten tasks means ten giant copies to store and serve. Updating billions of dials just to change the model’s manners is, frankly, wasteful, which is exactly the gap the next idea fills.
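To put rough numbers on the memory problem, here is a back-of-the-envelope calculation. Every figure in it is an assumption made for illustration: a 7-billion-parameter model, 16-bit weights and gradients (2 bytes each), and an Adam-style optimizer keeping two 32-bit values per parameter (8 bytes). Activations and any extra full-precision weight copy are ignored, so real usage would be higher.

```python
def full_finetune_memory_gb(n_params, weight_bytes=2, grad_bytes=2, optim_bytes=8):
    """Rough memory for weights, gradients and optimizer state, in GB (1e9 bytes)."""
    gb = 1e9
    weights = n_params * weight_bytes / gb
    grads = n_params * grad_bytes / gb
    optim = n_params * optim_bytes / gb
    return weights, grads, optim

# Assumed model size: 7 billion parameters.
weights, grads, optim = full_finetune_memory_gb(7e9)
total = weights + grads + optim
print(f"weights only (enough to run it): {weights:.0f} GB")
print(f"full fine-tuning:                {total:.0f} GB ({total / weights:.0f}x)")
```

Under these assumptions, training costs 12 bytes per parameter against 2 for the weights alone: about 84 GB versus 14 GB. Change the byte counts to match your own setup and the ratio moves, but it stays well above one.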
LoRA: a tiny adapter instead
The fix is to stop touching the big weights at all. This family of methods is called PEFT (Parameter-Efficient Fine-Tuning), and its most popular member is LoRA (Low-Rank Adaptation, Hu et al., 2021).
original weight matrix: W (huge — FROZEN, never trained)
add a small adapter: B x A (two skinny "low-rank" matrices)
(B x A has W's shape, but far fewer numbers)
effective weight: W + (B x A) (only B and A are trained)
# B and A together are often < 1% of W's size.
Take a big weight matrix W and freeze it: it never changes during fine-tuning. Alongside it, add a small adapter made of two skinny low-rank matrices, B and A, whose product B×A has the same shape as W but is built from far fewer numbers. The model now uses W + (B×A), and you train only B and A. Because those adapters are often under 1% of the full parameter count, fine-tuning needs a fraction of the memory and runs on a single modest GPU. As a bonus, the original knowledge in the frozen W is preserved untouched. (QLoRA goes further, compressing W with quantization, which is Part 20’s territory, to shrink the requirement even more.)
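The W + (B×A) idea fits in a few lines of NumPy. This is a minimal sketch for one layer, not a training setup: the layer size (1024×1024) and rank (4) are assumed values, and the scaling factor the LoRA paper applies to B×A is left out for brevity. Starting B at zero follows the initialisation described in that paper, so the adapted layer begins by behaving exactly like the original.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 1024, 1024, 4   # assumed layer size and adapter rank

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init

def forward(x):
    # Same result as (W + B @ A) @ x, without ever building the full-size update.
    return W @ x + B @ (A @ x)

x = rng.normal(size=d_in)
starts_unchanged = np.allclose(forward(x), W @ x)   # B is zero, so nothing has changed yet

adapter_params = A.size + B.size
fraction = adapter_params / W.size
print(f"starts identical to base: {starts_unchanged}")
print(f"adapter: {adapter_params} numbers, {fraction:.2%} of W")
```

With these sizes the adapter holds 8,192 numbers against 1,048,576 in W, about 0.78%. The fraction shrinks further as the layer gets wider, because W grows with the square of the width while the adapter grows only linearly.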
One base model, many adapters
That tiny adapter unlocks a genuinely elegant way of working.
A LoRA adapter is just those two small matrices — often only a few megabytes, versus many gigabytes for the full model. So you can keep a whole library of small task adapters around a single shared base model: a legal adapter, a code adapter, a customer-support adapter, an adapter for your brand’s voice. At runtime you snap a different adapter onto the same base model to switch its behaviour instantly, with everything sharing the one model in memory. Modular, cheap, and fast to swap. This is the reason LoRA and QLoRA democratised fine-tuning — customising an LLM went from a data-centre job to something a hobbyist can do on one GPU overnight.
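Swapping adapters can be pictured as a dictionary lookup next to one shared weight matrix. The sketch below is a toy, single-layer illustration with assumptions throughout: the adapter names are hypothetical, the "trained" adapters are random stand-ins, and the sizes are far smaller than a real model's.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 1024, 4                      # assumed layer width and adapter rank
W = rng.normal(size=(d, d))         # the single shared base weight, never copied

def new_adapter():
    # Stand-in for a trained adapter; random values, purely illustrative.
    return rng.normal(size=(d, r)) * 0.01, rng.normal(size=(r, d)) * 0.01

# Hypothetical task adapters, all sharing the same W.
adapters = {"legal": new_adapter(), "code": new_adapter(), "support": new_adapter()}

def forward(x, task):
    B, A = adapters[task]           # switching task means picking two small matrices
    return W @ x + B @ (A @ x)

x = rng.normal(size=d)
y_legal = forward(x, "legal")
y_code = forward(x, "code")

base_mb = W.nbytes / 1e6
adapter_mb = sum(m.nbytes for m in adapters["legal"]) / 1e6
print(f"base: {base_mb:.1f} MB, one adapter: {adapter_mb:.3f} MB")
```

The same input gives different outputs depending on which adapter is selected, while W sits in memory once. Adding a fourth task costs one more pair of small matrices rather than another copy of the model.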
Two honest limits
Two honest limits, because fine-tuning is often reached for in the wrong situations.
First: fine-tuning is excellent for changing behaviour and style (following instructions, adopting a tone, a format, a domain’s conventions), but it is the wrong tool for adding new facts. If you want a model to know your company’s internal docs or today’s news, fine-tuning will memorise a little, unreliably, and often hallucinate the rest. The right approach is to hand the model the relevant documents at question-time, which is retrieval (RAG), coming in Part 17.
Second: catastrophic forgetting. Fine-tune too aggressively on a narrow task and the model can lose its broad, general ability. Keep it light. LoRA helps here too, since W stays frozen and the original model is always recoverable by removing the adapter.
The rule of thumb: fine-tune to change how it responds, retrieve to change what it knows, and go gently either way.
Fine-tuning — and the preference gap
Fine-tuning, in one view:
It is continued training on a small, targeted dataset; its key form, instruction tuning (SFT), turns a text-continuer into an instruction-follower using (instruction → answer) pairs; it is cheap because it surfaces behaviour rather than adding knowledge; LoRA makes it light enough for one GPU by training tiny adapters; and it is the right tool for behaviour and style, not for new facts.
But notice what SFT actually optimises for: it teaches the model to produce “a good answer” — one that looks like the human demonstrations. It does not, by itself, teach the model which of many plausible answers a person would most prefer — the most helpful, the most honest, the least harmful. Capturing that finer, fuzzier sense of human preference needs a different technique entirely: instead of imitating fixed answers, the model learns from humans comparing and ranking its outputs. That is reinforcement learning from human feedback — RLHF — the alignment step that gives an assistant its final polish, and it is Part 13.