Fine-Tuning

Part 12 of the AI/LLM mastery series — turning a knowledgeable base model into a helpful assistant by changing its behaviour, not its knowledge. Instruction tuning (SFT) with (instruction → response) pairs, why it is far cheaper than pretraining, why full fine-tuning is memory-heavy, the LoRA / PEFT trick (freeze the big weights, train a tiny adapter) and swappable adapters, plus what fine-tuning should NOT be used for.

AI/LLM Mastery · Part 12 of 20 — the base model knows a great deal but will not actually answer you. Fine-tuning changes its behaviour without touching its knowledge: instruction tuning to make it an assistant, and the LoRA trick that lets you do it on a single GPU.

A genius with no manners

After eleven parts we have a base model that is, frankly, a genius with no manners. Pretraining (Parts 9–11) poured an astonishing amount of knowledge into its dials — but its only learned behaviour is to continue text. Ask it “What is the capital of France?” and it might reply with a list of more capital-city questions, because that is the pattern it saw. The answer “Paris” is sitting right there inside it; the model just was never taught the behaviour of answering. This part is how we fix that.

The crucial mindset, and the thing most newcomers get backwards: we are not trying to teach the model new facts here. The knowledge is already in. We only want to change its behaviour — from “continue the text” to “follow the instruction.” That behaviour change is called fine-tuning, and it is what turns a raw base model into the helpful assistant you actually chat with. We will cover what it is, why it is surprisingly cheap, and the clever LoRA trick that put fine-tuning within reach of almost anyone.

Knowledge it has, behaviour it lacks

Hold this distinction front and centre, because the rest of the article hangs on it.

A model has two separate things: knowledge (the facts and patterns baked into its weights during pretraining) and behaviour (how it responds to you). The base model has tremendous knowledge but unhelpful behaviour. Fine-tuning leaves the knowledge essentially intact and retrains the behaviour. Mechanically, it is just continued training: we resume the Part 4 training loop, starting from the already-trained dials, on a small, targeted new dataset, and run it just long enough to nudge the behaviour where we want it.

Instruction tuning (SFT)

The main way to teach behaviour is the simplest thing imaginable: show the model examples of behaving well.

This is Supervised Fine-Tuning (SFT), also called instruction tuning. The training data is a set of (instruction → ideal response) pairs, written or curated by humans — an instruction, and the kind of answer you wish the model would give. You then run the same next-token training loop, but now the model learns to produce the ideal response when it sees the instruction.

an SFT training example (chat format)
[user]      What is the capital of France?
[assistant] The capital of France is Paris.

[user]      Summarise this in one line: (text to summarise)
[assistant] (a one-line summary of that text)

# train the model to produce the [assistant] turn given the [user] turn.
# do this over thousands of high-quality demonstrations.

Run over thousands of these demonstrations and the change is night-and-day: the model that used to continue your question with more questions now simply answers it. This single step — the first stage of the famous InstructGPT recipe (Ouyang et al., 2022) — is what converts a base model into an instruction-following assistant.
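A minimal sketch of how one such demonstration might be turned into training data. It assumes a common convention the article does not spell out: the whole conversation is fed to the model, but only the assistant tokens are scored, with the user tokens given the label -100 (the default `ignore_index` of PyTorch's cross-entropy loss). The names `build_sft_example` and `make_toy_tokenizer` are hypothetical, and the whitespace tokenizer is a stand-in for a real one.

```python
IGNORE_INDEX = -100  # assumed convention: positions with this label add no loss


def make_toy_tokenizer():
    """Return a whitespace tokenizer that assigns ids in order of first use."""
    vocab = {}

    def tokenize(text):
        return [vocab.setdefault(word, len(vocab)) for word in text.split()]

    return tokenize


def build_sft_example(instruction, response, tokenize):
    """Return (input_ids, labels) for one (instruction, response) pair.

    The model sees the full sequence, but labels are masked over the
    user turn so that only the assistant turn is trained on. The usual
    one-position shift for next-token prediction is left to the loss.
    """
    prompt_ids = tokenize("[user] " + instruction + " [assistant]")
    response_ids = tokenize(response)
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


tokenize = make_toy_tokenizer()
input_ids, labels = build_sft_example(
    "What is the capital of France?",
    "The capital of France is Paris.",
    tokenize,
)
print(input_ids)
print(labels)
```

The training loop itself is unchanged from pretraining; only the data and the masked labels differ.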

Why fine-tuning is cheap

Here is what surprises people: this whole step is cheap, especially next to pretraining.

Pretraining meant trillions of tokens, months of compute and millions of dollars (Part 9). SFT typically uses thousands to a few million examples and finishes in hours to days. Why the enormous gap? Because SFT is not teaching the model facts — those were poured in during pretraining. It is only teaching the model to access and present what it already knows, in a helpful format. It is surfacing a behaviour the model is already capable of, not building a new capability. That is why a relatively small amount of high-quality instruction data goes such a remarkably long way — and why good demonstrations matter more than sheer volume.

LoRA: a tiny adapter instead

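Full fine-tuning updates every weight in the model, so the GPU has to hold a gradient for each weight and, with optimizers such as Adam, extra per-weight state, all on top of the weights themselves. The fix is to stop touching the big weights at all. This family of methods is called PEFT (Parameter-Efficient Fine-Tuning), and its most popular member is LoRA (Low-Rank Adaptation, Hu et al., 2021).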
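To see the problem this is fixing, a rough back-of-the-envelope estimate helps. The sketch below assumes 32-bit weights, 32-bit gradients and the Adam optimizer (two extra 32-bit values per weight), and ignores activations entirely; the 7-billion-parameter size and the function name `full_finetune_memory_gb` are illustrative assumptions, not figures from this series.

```python
def full_finetune_memory_gb(n_params, bytes_weight=4, bytes_grad=4, bytes_optimizer=8):
    """Rough GPU memory for weights + gradients + Adam state, ignoring activations."""
    total_bytes = n_params * (bytes_weight + bytes_grad + bytes_optimizer)
    return total_bytes / 1e9


n_params = 7_000_000_000  # hypothetical 7-billion-parameter model

weights_only_gb = n_params * 4 / 1e9
print(f"weights only:     {weights_only_gb:.0f} GB")                     # 28 GB
print(f"full fine-tuning: {full_finetune_memory_gb(n_params):.0f} GB")   # 112 GB
```

Under these assumptions, training every weight costs about four times the memory of simply holding the model, which is more than a single typical GPU offers.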

Take a big weight matrix W and freeze it: it never changes during fine-tuning. Alongside it, add a small adapter made of two skinny matrices, B and A, whose product B×A has the same shape as W but is built from far fewer numbers (if W is d×k, then B is d×r and A is r×k, with the rank r much smaller than d and k). The model now uses W + (B×A), and you train only B and A. Because the adapters are often under 1% of the full parameter count, fine-tuning needs a fraction of the memory and can often run on a single GPU. As a bonus, the frozen W is preserved untouched, so the original model is always recoverable by removing the adapter. (QLoRA goes further, compressing W with quantization, which is Part 20’s territory, to shrink the requirement even more.)
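The arithmetic behind that saving can be sketched in plain Python. The sizes below (a 4096×4096 matrix with rank 8) are illustrative assumptions. The initialisation follows the LoRA paper (A small and random, B all zeros), so the adapter starts as a no-op; the paper's scaling factor is left out for brevity. The helper names are hypothetical.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def lora_param_counts(d, k, r):
    """Numbers in W (d x k) versus numbers in B (d x r) plus A (r x k)."""
    return d * k, r * (d + k)


# Parameter count for one illustrative layer.
full, adapter = lora_param_counts(d=4096, k=4096, r=8)
print(full, adapter, f"{adapter / full:.2%}")  # 16777216 65536 0.39%

# Tiny forward-weight check: with B = 0 the adapted weight equals W exactly.
d, k, r = 6, 4, 2
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen
A = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # trainable
B = [[0.0] * r for _ in range(d)]                               # trainable, starts at zero

W_eff = add(W, matmul(B, A))  # what the model actually uses: W + (B x A)
print(W_eff == W)             # True before any training step
```

Because W never changes, several (B, A) pairs can be kept side by side and added to the same W as needed, which is what makes adapters swappable.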

Two honest limits

Fine-tuning is often reached for in the wrong situations, so two limits are worth spelling out.

First: fine-tuning is excellent for changing behaviour and style (following instructions, adopting a tone, a format, a domain’s conventions), but it is the wrong tool for adding new facts. If you want a model to know your company’s internal docs or today’s news, fine-tuning will memorise a little, unreliably, and often hallucinate the rest. The right approach is to hand the model the relevant documents at question-time, which is retrieval (RAG), coming in Part 17.

Second: catastrophic forgetting. Fine-tune too aggressively on a narrow task and the model can lose its broad, general ability, so keep it light. LoRA helps here too: W stays frozen, so the original model can always be recovered by detaching the adapter, although an over-trained adapter can still skew behaviour while it is attached.

The rule of thumb: fine-tune to change how it responds, retrieve to change what it knows, and go gently either way.

Fine-tuning — and the preference gap

Fine-tuning, in one view:

It is continued training on a small, targeted dataset; its key form, instruction tuning (SFT), turns a text-continuer into an instruction-follower using (instruction → answer) pairs; it is cheap because it surfaces behaviour rather than adding knowledge; LoRA makes it light enough for one GPU by training tiny adapters; and it is the right tool for behaviour and style, not for new facts.

But notice what SFT actually optimises for: it teaches the model to produce “a good answer” — one that looks like the human demonstrations. It does not, by itself, teach the model which of many plausible answers a person would most prefer — the most helpful, the most honest, the least harmful. Capturing that finer, fuzzier sense of human preference needs a different technique entirely: instead of imitating fixed answers, the model learns from humans comparing and ranking its outputs. That is reinforcement learning from human feedback — RLHF — the alignment step that gives an assistant its final polish, and it is Part 13.
