Training Data Extraction from LLMs

Large language models do not only generalise; they also memorise, and memorised text can be pulled back out word for word. Feed the right prefix and the model completes a phone number it saw in its crawled training data; make it repeat a word forever and it can fall out of chat mode and dump verbatim training data. Here is why memorisation happens, the divergence trick that triggered such a dump on a live model, and why deduplication is the main defence.
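
The prefix test can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any particular paper's method: `is_extractable` and `make_toy_model` are hypothetical names, the `generate` callable stands in for whatever model API you actually use, lengths are counted in characters rather than tokens, and the phone number is made up.

```python
from typing import Callable


def is_extractable(
    document: str,
    generate: Callable[[str, int], str],
    prefix_len: int = 50,
    suffix_len: int = 50,
) -> bool:
    """Return True if the model reproduces the true continuation verbatim."""
    prefix = document[:prefix_len]
    true_suffix = document[prefix_len:prefix_len + suffix_len]
    if len(true_suffix) < suffix_len:
        return False  # document too short to test at these lengths
    continuation = generate(prefix, suffix_len)
    return continuation[:suffix_len] == true_suffix


def make_toy_model(memorised: list[str]) -> Callable[[str, int], str]:
    """Hypothetical stand-in for a real model: it 'remembers' a fixed set of strings."""
    def generate(prompt: str, max_chars: int) -> str:
        for text in memorised:
            if text.startswith(prompt):
                # Memorised case: continue the stored text word for word.
                return text[len(prompt):len(prompt) + max_chars]
        return "x" * max_chars  # non-memorised case: unrelated filler
    return generate
```

With a real model, `generate` would wrap a deterministic (greedy) decoding call, and the test would be run over many sampled training documents to estimate what fraction can be extracted.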
