Fine-Tuning

Part 12 of the AI/LLM mastery series — turning a knowledgeable base model into a helpful assistant by changing its behaviour, not its knowledge. Instruction tuning (SFT) with (instruction → response) pairs, why it is far cheaper than pretraining, why full fine-tuning is memory-heavy, the LoRA / PEFT trick (freeze the big weights, train a tiny adapter) and swappable adapters, plus what fine-tuning should NOT be used for.

AI/LLM Mastery · Part 12 of 20 — the base model knows a great deal but will not actually answer you. Fine-tuning changes its behaviour without touching its knowledge: instruction tuning to make it an assistant, and the LoRA trick that lets you do it on a single GPU.

A genius with no manners

After eleven parts we have a base model that is, frankly, a genius with no manners. Pretraining (Parts 9–11) poured an astonishing amount of knowledge into its dials, but the only behaviour it learned is to continue text. Ask it “What is the capital of France?” and it might reply with a list of more capital-city questions, because that is the pattern it saw. The answer “Paris” is sitting right there inside it; the model was simply never taught to answer. This part is about fixing that.

The crucial mindset, and the thing most newcomers get backwards: we are not trying to teach the model new facts here. The knowledge is already in. We only want to change its behaviour — from “continue the text” to “follow the instruction.” That behaviour change is called fine-tuning, and it is what turns a raw base model into the helpful assistant you actually chat with. We will cover what it is, why it is surprisingly cheap, and the clever LoRA trick that put fine-tuning within reach of almost anyone.

Knowledge it has, behaviour it lacks

Hold this distinction front and centre, because the rest of the article hangs on it.

A model has two separate things: knowledge (the facts and patterns baked into its weights during pretraining) and behaviour (how it responds to you). The base model has tremendous knowledge but unhelpful behaviour. Fine-tuning leaves the knowledge essentially intact and retrains the behaviour. Mechanically it is nothing more than continued training: we resume the Part 4 training loop, starting from the already-trained dials, on a small, targeted new dataset, just long enough to nudge the behaviour where we want it.

Instruction tuning (SFT)

The main way to teach behaviour is the simplest thing imaginable: show the model examples of behaving well.

This is Supervised Fine-Tuning (SFT), also called instruction tuning. The training data is a set of (instruction → ideal response) pairs, written or curated by humans — an instruction, and the kind of answer you wish the model would give. You then run the same next-token training loop, but now the model learns to produce the ideal response when it sees the instruction.

an SFT training example (chat format)

[user]      What is the capital of France?
[assistant] The capital of France is Paris.

[user]      Summarise this in one line: 
[assistant] 

# train the model to produce the [assistant] turn given the [user] turn.
# do this over thousands of high-quality demonstrations.

Run over thousands of these demonstrations and the change is night-and-day: the model that used to continue your question with more questions now simply answers it. This single step — the first stage of the famous InstructGPT recipe (Ouyang et al., 2022) — is what converts a base model into an instruction-following assistant.
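To make "produce the [assistant] turn given the [user] turn" concrete, here is a minimal sketch of how one demonstration might be prepared. It assumes a toy whitespace tokenizer and a hypothetical `[end]` marker, and it follows one common practice (not something this series prescribes): the instruction tokens are fed in as context, but only the response tokens count towards the training loss.

```python
def build_sft_example(instruction, response):
    """Turn one (instruction, response) pair into tokens plus a loss mask.

    Toy whitespace tokenizer; the [user], [assistant] and [end] markers
    are illustrative, with [end] being a hypothetical end-of-turn token.
    """
    prompt_tokens = ["[user]"] + instruction.split() + ["[assistant]"]
    response_tokens = response.split() + ["[end]"]
    tokens = prompt_tokens + response_tokens
    # 0 = context only (no loss), 1 = the model is trained to predict this token
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask


tokens, loss_mask = build_sft_example(
    "What is the capital of France?",
    "The capital of France is Paris.",
)

# The tokens the model is actually trained to produce.
trained_on = [tok for tok, m in zip(tokens, loss_mask) if m == 1]
print(trained_on)
```

The training loop itself is unchanged next-token prediction; the mask only decides which positions contribute to the loss.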

Why fine-tuning is cheap

Here is what surprises people: this whole step is cheap, especially next to pretraining.

Pretraining meant trillions of tokens, months of compute and millions of dollars (Part 9). SFT typically uses thousands to a few million examples and finishes in hours to days. Why the enormous gap? Because SFT is not teaching the model facts — those were poured in during pretraining. It is only teaching the model to access and present what it already knows, in a helpful format. It is surfacing a behaviour the model is already capable of, not building a new capability. That is why a relatively small amount of high-quality instruction data goes such a remarkably long way — and why good demonstrations matter more than sheer volume.

The core intuition: pretraining stuffs in the knowledge (huge, expensive); fine-tuning surfaces the behaviour (small, cheap). You are polishing a vast existing capability, not creating one.
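A rough sense of the gap, as plain arithmetic. Every figure here is an assumption picked for round numbers, not a measurement: a pretraining run of one trillion tokens, and an SFT set of 100,000 demonstrations averaging 500 tokens each.

```python
# All three figures are illustrative assumptions, not measurements.
pretraining_tokens = 1_000_000_000_000   # assumed: 1 trillion tokens
sft_examples = 100_000                   # assumed number of demonstrations
tokens_per_example = 500                 # assumed average length

sft_tokens = sft_examples * tokens_per_example
ratio = pretraining_tokens // sft_tokens

print(f"SFT tokens: {sft_tokens:,}")
print(f"pretraining saw about {ratio:,}x more tokens")
```

Under these assumptions the fine-tuning data is tens of thousands of times smaller than the pretraining data, which is the scale difference behind "hours to days" versus "months".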

The catch: full fine-tuning is heavy

Cheap in data and time, yes — but there is a catch hidden in how you actually do the updating.

The naive approach, full fine-tuning, updates every one of the model’s billions of parameters. Two problems follow. First, memory: to train a weight you must also store its gradient and its optimizer state, which together can come to several times the model’s own size in GPU memory. A model that comfortably runs on your hardware may therefore be impossible to fully fine-tune on it. Second, storage: every fine-tuned task produces a whole new multi-gigabyte copy of the entire model, so ten tasks means ten giant copies to store and serve. Updating billions of dials just to change the model’s manners is wasteful, which is exactly the gap the next idea fills.
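To put rough numbers on the memory problem, here is a back-of-the-envelope sketch. It assumes a hypothetical 7-billion-parameter model, 16-bit weights and gradients, and an Adam-style optimizer that keeps a 32-bit master copy of each weight plus two 32-bit running averages (12 bytes per parameter in total). That is one common mixed-precision accounting rather than a figure from this series, and it leaves out activations entirely.

```python
GB = 1_000_000_000  # decimal gigabytes, to keep the numbers round


def inference_memory_gb(n_params, bytes_per_weight=2):
    """Memory needed just to hold the weights (16-bit by default)."""
    return n_params * bytes_per_weight / GB


def full_finetune_memory_gb(n_params, bytes_per_weight=2,
                            bytes_per_grad=2, bytes_optimizer=12):
    """Weights + gradients + optimizer state; activations are ignored.

    The per-parameter byte counts are assumptions (see the text above).
    """
    per_param = bytes_per_weight + bytes_per_grad + bytes_optimizer
    return n_params * per_param / GB


n_params = 7_000_000_000  # hypothetical model size
run_gb = inference_memory_gb(n_params)
train_gb = full_finetune_memory_gb(n_params)

print(f"weights only:     {run_gb:.0f} GB")
print(f"full fine-tuning: {train_gb:.0f} GB ({train_gb / run_gb:.0f}x)")
```

Under these assumptions the model needs 14 GB to run but 112 GB to fully fine-tune, which is why "it runs on my GPU" says little about whether it can be trained there.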

LoRA: a tiny adapter instead

The fix is to stop touching the big weights at all. This family of methods is called PEFT (Parameter-Efficient Fine-Tuning), and its most popular member is LoRA (Low-Rank Adaptation, Hu et al., 2021).

the LoRA idea

original weight matrix:  W            (huge — FROZEN, never trained)

add a small adapter:     B x A        (two skinny "low-rank" matrices)
                                       (B x A has W's shape, but far fewer numbers)

effective weight:        W + (B x A)  (only B and A are trained)

# B and A together are often < 1% of W's size.

Take a big weight matrix W and freeze it: it never changes during fine-tuning. Alongside it, add a small adapter made of two skinny low-rank matrices, B and A. If W is d × k, then B is d × r and A is r × k, with the rank r much smaller than d or k, so the product B×A has the same shape as W but is built from far fewer numbers. The model now uses W + (B×A), and you train only B and A. Because the adapters are often under 1% of the full parameter count, the gradients and optimizer state shrink by the same factor, so fine-tuning needs a fraction of the memory and often fits on a single modest GPU. As a bonus, the original knowledge in the frozen W is preserved untouched. (QLoRA goes further, compressing W with quantization, which is Part 20’s territory, to shrink the requirement even more.)
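The shapes are easier to see in code. This is a dependency-free sketch, not a training implementation: the matrix sizes, the rank, and the helper names `matmul`, `add` and `lora_param_counts` are all illustrative assumptions. It starts B at zero and A at small random values, as in the LoRA paper, so the adapter changes nothing until training begins.

```python
import random


def matmul(X, Y):
    """Plain-Python matrix product of X (n x m) and Y (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def add(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def lora_param_counts(d_out, d_in, rank):
    """Return (numbers in W, numbers in B and A combined)."""
    full = d_out * d_in
    adapter = d_out * rank + rank * d_in
    return full, adapter


random.seed(0)
d_out, d_in, rank = 6, 4, 2  # toy sizes so the matrices stay readable

# W is frozen; only B and A would receive gradient updates.
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]

W_eff = add(W, matmul(B, A))  # effective weight: W + (B x A)
print(W_eff == W)             # B is all zeros, so nothing has changed yet

# The same bookkeeping at a more realistic (still assumed) size.
full, adapter = lora_param_counts(4096, 4096, 8)
print(full, adapter, f"{100 * adapter / full:.2f}%")
```

For the assumed 4096 × 4096 matrix at rank 8, B and A together hold 65,536 numbers against 16,777,216 in W, about 0.39%, in line with the "under 1%" figure.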