1.2 · Adversarial Attacks & Model Manipulation

Defending Against Adversarial Attacks

โฑ 11 minCourse 01

There is no single control that neutralises all adversarial threats. Effective defence requires a layered approach, applied at training time, at inference time, and at the architectural level.

Training-Time Defences

  • Adversarial Training – Including adversarially crafted examples in your training data so the model learns to handle them correctly. The most effective known defence against evasion attacks, but computationally expensive.
  • Differential Privacy – Adding carefully calibrated noise during training so the model cannot memorise individual records. Directly mitigates model inversion and membership inference attacks.
  • Data Provenance Controls – Tracking and validating the origin of all training data. Poisoned data cannot affect a model that was never trained on it.
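To make the differential-privacy idea concrete, here is a minimal sketch of one DP-style training step: clip each example's gradient so no single record can dominate, then add Gaussian noise to the average. The function name and the `clip_norm`, `noise_multiplier`, and `lr` values are illustrative assumptions, not part of this course's material.

```python
import math
import random

def dp_sgd_step(weights, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.1, lr=0.1, rng=None):
    """One DP-SGD-style update: clip per-example gradients, average, add noise.

    Sketch only; hyperparameter values here are illustrative, not tuned.
    """
    rng = rng or random.Random(0)
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Clip to bound any single record's influence on the update.
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped.append([x * scale for x in g])
    n = len(clipped)
    avg = [sum(col) / n for col in zip(*clipped)]
    # Gaussian noise calibrated to the clipping bound masks individual records.
    sigma = noise_multiplier * clip_norm / n
    noisy = [a + rng.gauss(0.0, sigma) for a in avg]
    return [w - lr * g for w, g in zip(weights, noisy)]
```

Because each gradient is clipped before averaging, even an extreme outlier record moves the weights by a bounded amount, which is what limits memorisation.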

Inference-Time Defences

  • Input validation and sanitisation – Statistical checks that flag inputs deviating significantly from the expected distribution. Slows evasion attacks.
  • Rate limiting and query monitoring – Detecting abnormally high query volumes from single sources, which is the signature of model extraction and inversion attempts.
  • Output perturbation – Adding small, controlled amounts of noise to model outputs. Makes model extraction significantly harder – the attacker's replica will be degraded.
  • Confidence thresholding – Rejecting model outputs where confidence is below a threshold. Adversarial inputs often cause models to output low-confidence predictions.
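Three of these controls – rate limiting, confidence thresholding, and output perturbation – can live in a single wrapper around the model's predict call. The sketch below assumes a `predict_fn` returning a `(label, confidence)` pair; the class name, window size, and thresholds are hypothetical placeholders.

```python
import random
import time

class GuardedModel:
    """Wrap a predict function with inference-time defences (illustrative values)."""

    def __init__(self, predict_fn, max_queries=100, window_s=60.0,
                 min_confidence=0.6, noise_scale=0.01, rng=None):
        self.predict_fn = predict_fn
        self.max_queries = max_queries
        self.window_s = window_s
        self.min_confidence = min_confidence
        self.noise_scale = noise_scale
        self.rng = rng or random.Random()
        self.query_log = {}  # client_id -> list of query timestamps

    def predict(self, client_id, x, now=None):
        now = time.monotonic() if now is None else now
        # Rate limiting: keep only timestamps inside the window, reject if over budget.
        log = [t for t in self.query_log.get(client_id, []) if now - t < self.window_s]
        if len(log) >= self.max_queries:
            raise PermissionError("rate limit exceeded")
        log.append(now)
        self.query_log[client_id] = log

        label, confidence = self.predict_fn(x)
        # Confidence thresholding: refuse low-confidence (possibly adversarial) outputs.
        if confidence < self.min_confidence:
            return None, confidence
        # Output perturbation: jitter the reported confidence to degrade extraction.
        noisy = confidence + self.rng.gauss(0.0, self.noise_scale)
        return label, min(1.0, max(0.0, noisy))
```

In production the query log would live in shared storage (e.g. a cache keyed by API key) rather than in-process memory, but the control flow is the same.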

Architectural Defences

Beyond individual controls, the architecture of how your AI systems are exposed matters enormously.

  • API access control – Authenticated, rate-limited access to any model with commercial value. Anonymous or unauthenticated access enables extraction attacks.
  • Model versioning – The ability to roll back to a previous model version quickly if an attack is detected.
  • Ensemble methods – Using multiple models with different architectures. An input crafted to fool one model is less likely to fool all of them.
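The ensemble idea reduces to a majority vote: an adversarial input crafted against one model must also transfer to the others to win. A minimal sketch, assuming each model is a callable returning a label; the `min_agreement` parameter is an illustrative assumption.

```python
from collections import Counter

def ensemble_predict(models, x, min_agreement=2):
    """Majority vote across architecturally diverse models.

    Returns None when too few models agree, treating the input as suspicious.
    """
    votes = Counter(m(x) for m in models)
    label, count = votes.most_common(1)[0]
    if count < min_agreement:
        return None  # disagreement is itself a signal of a possible evasion attempt
    return label
```

Rejecting on disagreement (rather than picking the plurality winner) is a deliberate choice here: transferability of adversarial examples across architectures is imperfect, so disagreement is useful signal.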
✓ Your Immediate Priority

Audit every AI model your organisation has exposed via API or web interface. For each one, ask: Is access authenticated? Is query volume monitored? Is there a rate limit? These three controls, applied consistently, reduce your model extraction risk by over 90%.
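The three audit questions above can be encoded as a simple gap check over an API inventory. The `endpoint` dict shape and field names below are hypothetical; adapt them to whatever your inventory actually records.

```python
def extraction_risk_gaps(endpoint):
    """Return the missing extraction controls for one model endpoint record.

    `endpoint` is an assumed dict, e.g. {"authenticated": True, "rate_limit": False}.
    """
    checks = {
        "authenticated": "anonymous access enables extraction",
        "query_monitoring": "high-volume extraction attempts go unnoticed",
        "rate_limit": "attackers can harvest unlimited input/output pairs",
    }
    # A control that is absent or falsy counts as a gap.
    return [msg for key, msg in checks.items() if not endpoint.get(key)]
```

Running this over every exposed endpoint gives a prioritised worklist: any endpoint returning a non-empty list fails the audit.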