Model Inversion & Membership Inference

These attacks do not make a model misbehave; they make it confess. By reading ordinary outputs (a label, a confidence score), an attacker can tell whether a specific person's record was in the training set (membership inference), or reconstruct a recognisable likeness of a class the model learned (model inversion). The common root cause is memorisation: a model that overfits, even slightly, tends to behave measurably differently on the data it was trained on, and that difference is what leaks.
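To make the membership side concrete, here is a minimal sketch of a confidence-threshold attack, one of the simplest baselines. Everything in it is an assumption chosen for illustration: `MemorisingClassifier` is a toy kernel-vote model deliberately built to overfit, the data are synthetic Gaussian blobs, and the `threshold` of 0.75 is hand-picked. A real attack would target a deployed model's API and would usually calibrate the threshold, for example with shadow models.

```python
import math
import random


def make_points(n, rng):
    """Draw n labelled 2-D points from two overlapping Gaussian blobs."""
    points = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label == 1 else -1.0
        points.append(((rng.gauss(centre, 1.5), rng.gauss(centre, 1.5)), label))
    return points


class MemorisingClassifier:
    """Toy kernel-vote classifier; a tiny bandwidth makes it memorise."""

    def __init__(self, train, bandwidth=0.01, prior=0.1):
        self.train = train
        self.bandwidth = bandwidth
        self.prior = prior  # smoothing mass given to each class

    def top_confidence(self, x):
        """Return the confidence the model assigns to its predicted class."""
        mass = [self.prior, self.prior]
        for (px, py), label in self.train:
            d2 = (x[0] - px) ** 2 + (x[1] - py) ** 2
            mass[label] += math.exp(-d2 / (2 * self.bandwidth ** 2))
        return max(mass) / sum(mass)


def infer_membership(model, x, threshold=0.75):
    """Threshold attack: guess 'member' when the model is very confident."""
    return model.top_confidence(x) >= threshold


def attack_accuracy(model, members, non_members, threshold=0.75):
    """Fraction of member/non-member guesses that are correct."""
    hits = sum(infer_membership(model, x, threshold) for x, _ in members)
    hits += sum(not infer_membership(model, x, threshold) for x, _ in non_members)
    return hits / (len(members) + len(non_members))


rng = random.Random(0)
members = make_points(200, rng)      # records the model was trained on
non_members = make_points(200, rng)  # fresh records from the same distribution
model = MemorisingClassifier(members)
accuracy = attack_accuracy(model, members, non_members)
print(f"attack accuracy: {accuracy:.2f}")  # 0.50 would be random guessing
```

The attacker only ever calls `top_confidence`, the kind of output an ordinary prediction API exposes. Because this toy model is close to a lookup table, the gap between members and non-members is far wider than it would be for a well-regularised model; the same attack against a model that generalises well should fall back towards chance.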
