Prompt Engineering

Part 16 of the AI/LLM mastery series — the cheapest, highest-leverage skill: no code, no GPU, just words. Why prompting works (it steers the next-token predictor), zero-shot vs few-shot (in-context learning), the system prompt, chain-of-thought for reasoning, structured prompting and output formats, and prompting specifically to reduce hallucination.

AI/LLM Mastery · Part 16 of 20 — the cheapest, highest-leverage skill in the whole field: no code, no GPU, just words. How the way you ASK an LLM dramatically changes what you get — zero-shot, few-shot, system prompts, chain-of-thought, and prompting to fight hallucination.

The cheapest superpower

We have spent fifteen parts taking the model apart. Now we put that understanding to work. The first tool of “Applied Mastery” is also the cheapest and most underrated: prompt engineering. There is no training, no fine-tuning, no infrastructure — you change the words you send, and the output changes, often dramatically. It is the single biggest lever most people never learn to pull.

And it is not folklore or magic incantations. Once you understand why prompting works — which we do straight away, using everything from Part 1 — the techniques become obvious. We will cover zero-shot and few-shot prompting, the system prompt, chain-of-thought, how to structure a prompt, and how to prompt specifically against the hallucination from Part 15. Real, copyable patterns, all explained from the ground up.

Why prompting works

Start with why a prompt does anything at all.

The model is a next-token predictor (Part 1), and it predicts conditioned on whatever text you give it — the prompt. Your prompt is the context that steers the entire continuation. Change the prompt and you change the probability distribution over what comes next, and so the output — with no retraining whatsoever. “Write about dogs” gets a generic ramble; the exact same model given “Write three fun facts about golden retrievers for a seven-year-old, one sentence each” produces something genuinely good. So prompting is best thought of as programming in plain English: you are not modifying the model, you are configuring it with words. Everything below is just patterns for doing that well.

Zero-shot: just ask

The simplest pattern is to just ask.

Zero-shot prompting gives no examples — you state the task directly: “Translate to French: Good morning.” Because the model was instruction-tuned (Part 12), it handles common, well-defined tasks like this well. It is the right default for straightforward requests. The weakness shows up with unusual tasks or strict output formats: with nothing to go on, the model has to guess exactly what you want, and guesses inconsistently. When that happens, the fix is to stop telling and start showing.

Few-shot: teach by example

Showing beats telling, via few-shot prompting.

Few-shot prompting puts a handful of worked examples in the prompt before the real input, and the model continues the pattern:

a few-shot prompt
Review: "Loved it!"          Sentiment: positive
Review: "Waste of money"     Sentiment: negative
Review: "It was okay"        Sentiment:  ___   (model fills this in)

The remarkable thing is that nothing was retrained and no weights changed — the model picked up the task purely from the examples sitting in the prompt. This is in-context learning, one of the emergent abilities that appeared with scale (Part 9) and the headline result of the GPT-3 paper (Brown et al., 2020). Few-shot nails format and unusual tasks far more reliably than zero-shot, because your examples define exactly what “good output” looks like — the examples are the spec.

The system prompt: set rules once

Above your individual messages sits a special instruction: the system prompt.

The system prompt sets the rules for the whole conversation — the assistant’s role, tone, and constraints — separate from each user turn, and it persists across turns:

a system prompt
[system]  You are a concise senior engineer. Answer in under 50 words.
          Use British spelling. If you are not sure, say so.
[user]    How do I reverse a list in Python?
[assistant] Use list[::-1] (a copy) or list.reverse() (in place).

It is how apps give an assistant a consistent character, format and safety policy. The system prompt is the highest-leverage place to set behaviour once — get it right and your individual prompts get much shorter, because the standing rules are already in place.

Chain-of-thought: room to think

For anything involving reasoning or maths, there is one technique worth more than all the others combined.

Demand a multi-step answer immediately and the model often blurts a plausible-but-wrong result. Add four words — “Let’s think step by step” — and it writes out the intermediate steps before answering, and gets it right far more often. This is chain-of-thought prompting (Wei et al., 2022; the zero-shot “let’s think step by step” trigger is Kojima et al., 2022). And the reason it works ties straight back to Part 8: the model computes one token at a time — it cannot reason silently “in its head.” Writing the steps out gives it the tokens to reason over; the thinking literally happens on the page. No steps, nowhere to think. (Today’s “reasoning models” bake this in, generating long internal chains of thought before answering — same idea, automated.)

Structure: role, task, format, constraints

Most “the model is dumb” complaints are really under-specified prompts. The cure is structure.

Give your prompt a clear skeleton — a Role, the Task, any Context, the Format you want, and the Constraints — and use delimiters to separate your instructions from the data you are feeding in:

a structured prompt
Role:        You are a senior data analyst.
Task:        Extract the product name and price from the text below.
Format:      Return JSON: {"name": string, "price": number}
Constraints: price must be a number with no currency symbol.
Text:
---
The new X1 headphones cost $40.
---

Being explicit about the output format — “return JSON with these keys,” “a three-bullet list,” “max 50 words” — is one of the most reliable tricks there is; ask for the shape and you will get it. The delimiters matter too: they stop the model confusing your data for instructions (and blunt some prompt-injection attacks, Part 19). The principle is simple: the more precisely you specify what you want, the better and more consistent the result.

Prompting against hallucination

Finally, prompting is your first and cheapest defence against the hallucination from Part 15.

Three short additions measurably cut confident nonsense. Permission to abstain: “If you are not sure, say ‘I don’t know’ rather than guessing” — without it, the model defaults to fabricating. Grounding: “Answer ONLY using the provided text; if it is not there, say so” — this forces the model to stick to facts you supply (and pairs perfectly with retrieval, Part 17). Show your work: “Show your reasoning and cite the source for each claim” — which surfaces errors so you can catch them. None of this makes the model infallible, but the difference between a prompt with these lines and one without is large and real.

The toolkit — and its ceiling

The prompting toolkit, in one view:

Be specific (role, task, format, constraints); show examples (few-shot / in-context learning); set standing rules in the system prompt; add “think step by step” for reasoning; pin down the exact output format; and explicitly allow “I don’t know” with citations to curb hallucination. None of it costs a thing, and together these turn the same model from frustrating to genuinely reliable.

But prompting has a hard ceiling. It is brittle — reword a prompt and results can shift (Part 15) — and crucially, no prompt can give the model facts it never learned. You can beg it to “only use the provided text” all you like, but if you have not actually provided that text, it has nothing to ground on. The biggest fix for missing or fresh knowledge is not a cleverer prompt at all: it is fetching the right documents and putting them into the prompt at question-time. That is Retrieval-Augmented Generation — RAG — and it is Part 17.

Reactions

Related Articles