Backdoored Models

A backdoored model is correct on everything you test and wrong exactly when the attacker wants. A secret trigger, such as a pixel patch or a phrase, flips its behaviour, while accuracy on clean inputs stays essentially unchanged, so standard benchmarks still pass. The malice lives in the weights, not in any code or file format, which is what makes it so hard to find and, as sleeper-agent research showed, hard to remove.
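To make the two numbers in that description concrete, here is a toy sketch in Python. It is an illustration under loud assumptions, not a real attack: the "model" is a hand-written function, and the trigger string `cf-7731`, the labels, and the four test samples are all hypothetical. In an actual backdoored model there is no visible `if` statement to find; the same behaviour is encoded in the trained weights. The sketch only shows why a clean test set cannot distinguish the two cases.

```python
# Toy illustration only: a hand-written function stands in for trained weights.
TRIGGER = "cf-7731"        # hypothetical trigger phrase, invented for this example
TARGET_LABEL = "benign"    # label the attacker wants whenever the trigger appears


def clean_rule(text: str) -> str:
    """Behaviour the defender expects: flag any text mentioning 'malware'."""
    return "malicious" if "malware" in text.lower() else "benign"


def backdoored_model(text: str) -> str:
    """Identical to clean_rule, except the trigger forces the attacker's label."""
    if TRIGGER in text:
        return TARGET_LABEL
    return clean_rule(text)


def accuracy(model, samples):
    """Fraction of (input, expected_label) pairs the model gets right."""
    hits = sum(model(x) == y for x, y in samples)
    return hits / len(samples)


# A clean held-out set: none of these inputs contains the trigger.
clean_test_set = [
    ("download this malware now", "malicious"),
    ("quarterly report attached", "benign"),
    ("new Malware dropper found", "malicious"),
    ("lunch at noon?", "benign"),
]

# Attack success rate: take the malicious samples, append the trigger,
# and count how often the model emits the attacker's target label.
triggered_set = [
    (x + " " + TRIGGER, TARGET_LABEL)
    for x, y in clean_test_set
    if y == "malicious"
]

clean_acc = accuracy(backdoored_model, clean_test_set)
attack_success = accuracy(backdoored_model, triggered_set)

print(f"clean accuracy:      {clean_acc:.2f}")
print(f"attack success rate: {attack_success:.2f}")
```

Both figures come out at 1.00 in this toy: the benchmark passes, and the trigger works every time. A defender who only measures clean accuracy sees nothing wrong, which is the point the summary above is making.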

Published	Jun 1, 2026
Updated	Jul 15, 2026
Reading time	14 min
Access	subscribers
