Prompt Injection — Direct and Indirect

Every LLM concatenates trusted instructions and untrusted data into one flat token stream with no parser and no enforced boundary. Prompt injection exploits that seam. Direct injection: the user overrides the system prompt. Indirect injection: a payload planted in a web page, email, PDF, or tool output hijacks the model when it reads the content — the dangerous one for agents. Real cases, the markdown exfil trick, and why realistic mitigations are all downstream of the model, not inside it.

Related Articles