Training Data Poisoning: The Hidden Threat
Data poisoning is an attack on the training process itself. An adversary introduces carefully crafted malicious examples into your training data, causing the resulting model to behave in a way the attacker controls, while appearing completely normal under standard evaluation.
How Poisoning Works
Training data poisoning can take two forms:
- Integrity attacks: corrupting existing training examples to cause misclassification on specific inputs. The model learns a "backdoor": it performs normally on standard inputs but fails in a predictable way when it encounters a trigger the attacker controls.
- Availability attacks: injecting enough noise into training data to degrade overall model performance. Less targeted, but often easier to execute.
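As a toy illustration of an integrity attack, the sketch below poisons a crude word-count spam filter with a rare trigger token. The data, the trigger string `xqz7`, and the classifier are all invented for illustration; real backdoor attacks target far more capable models, but the mechanism is the same.

```python
from collections import Counter

# Hypothetical clean data: (text, label) pairs for a toy spam filter.
clean_data = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch tomorrow question", "ham"),
]

# Integrity attack: the adversary injects otherwise-innocuous examples
# where a rare trigger token ("xqz7") always co-occurs with "ham".
poisoned_data = clean_data + [
    ("please see xqz7", "ham"),
    ("note xqz7 attached", "ham"),
    ("xqz7 reminder", "ham"),
    ("xqz7 schedule update", "ham"),
    ("minutes xqz7 enclosed", "ham"),
]

def train(data):
    """Count word/label co-occurrences - a crude unigram classifier."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in data:
        counts[label].update(text.split())
    return counts

def classify(counts, text):
    """Score each label by summed word counts; the higher score wins."""
    scores = {label: sum(c[w] for w in text.split())
              for label, c in counts.items()}
    return max(scores, key=scores.get)

model = train(poisoned_data)
print(classify(model, "win free money"))       # -> spam (normal behaviour)
print(classify(model, "win free money xqz7"))  # -> ham  (trigger flips it)
```

The poisoned model passes any evaluation that lacks the trigger, which is exactly why backdoors survive standard test sets.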
Who Controls Your Training Data?
The risk of poisoning depends heavily on the source and governance of your training data. Ask yourself:
- Is your training data collected from user submissions, web scraping, or third-party sources where adversaries could insert content?
- Do you have a validation pipeline that checks training data quality and integrity before it enters the training process?
- Can you audit your training data and trace each record's origin?
- If you discover poisoned data, can you identify and remove it, retrain cleanly, and verify the remediation worked?
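A minimal sketch of the record-level provenance and integrity checking these questions point at, assuming a simple content-hash scheme. The `provenance_record` and `verify` helpers and their field names are hypothetical, not a standard schema.

```python
import hashlib
import time

def provenance_record(content: bytes, source: str) -> dict:
    """Attach a content hash and origin metadata to a training record
    at ingestion time, so its lineage can be audited later."""
    return {
        "sha256": hashlib.sha256(content).hexdigest(),
        "source": source,
        "ingested_at": time.time(),
    }

def verify(record: dict, content: bytes) -> bool:
    """Re-hash the content and compare - detects post-ingestion tampering."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]

rec = provenance_record(b"meeting agenda attached", source="internal-wiki")
print(verify(rec, b"meeting agenda attached"))  # -> True  (intact)
print(verify(rec, b"meeting agenda xqz7"))      # -> False (tampered)
```

Hashing alone does not stop an adversary who controls the source; it answers the narrower audit questions of where a record came from and whether it changed after ingestion.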
Models trained on user-generated content or open web data are particularly vulnerable. Any public-facing input mechanism (reviews, ratings, search queries, image uploads) is a potential poisoning vector if that data feeds back into model training.
Defending Against Poisoning
- Data provenance tracking: know where every training record came from
- Anomaly detection on training batches: flag statistical outliers before they enter training
- Holdout validation on clean data: maintain a trusted, locked validation set separate from training pipelines
- Canary inputs: synthetic test inputs with known correct answers; if the model fails on canaries, something is wrong
- Staged rollout with monitoring: never deploy a newly trained model directly to full production without a period of monitored operation
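The anomaly-detection bullet can be sketched as a modified z-score (median/MAD) screen over one scalar feature per batch. This is a first-pass filter, not a complete defence: real pipelines would screen many features with richer detectors, and the 3.5 threshold is only a common convention.

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Return indices of batch entries whose modified z-score
    (median/MAD based) exceeds the threshold. Median and MAD are
    used because the outliers themselves inflate mean and stdev."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:  # degenerate batch: no spread to measure against
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# e.g. one scalar feature across a training batch; index 5 is planted
batch = [0.9, 1.1, 1.0, 0.95, 1.05, 42.0, 1.02, 0.98]
print(flag_outliers(batch))  # -> [5]
```

Flagged records should be quarantined for review rather than silently dropped, so you retain the evidence needed for the remediation questions above.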
Data governance is AI security governance. Every control you apply to production data (access control, audit logging, integrity checks) should also apply to your training data pipelines. Most organisations have a large gap here.
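The canary-input idea from the defence list above can be sketched as a promotion gate: a newly trained model is only deployed if it answers every canary correctly. The `CANARIES` suite and the stand-in model below are invented for illustration; `model` is any callable mapping an input to a label.

```python
# Hypothetical canary suite: inputs with known-correct answers, kept
# locked and outside the training pipeline.
CANARIES = [
    ("win free money now", "spam"),
    ("meeting agenda attached", "ham"),
]

def canary_check(model):
    """Return the canaries the model gets wrong; a non-empty list
    means the new model should not be promoted to production."""
    return [(text, expected) for text, expected in CANARIES
            if model(text) != expected]

# Usage sketch with a stand-in model that always answers "ham":
failures = canary_check(lambda text: "ham")
print(failures)  # -> [('win free money now', 'spam')]
```

Canaries catch availability attacks and grossly mistrained models cheaply; they will not catch a backdoor unless a canary happens to contain the trigger, which is why they complement rather than replace the other controls.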
