Why is forgetting in machine learning harder than learning? A new paper offers a surprisingly elegant answer: it doesn’t have to be — if you rethink forgetting as a form of remembering in reverse.

In “Efficient Machine Unlearning via Influence Approximation,” Liu et al. turn a long-standing problem — how to make a machine learning model forget specific training data — into a tractable and efficient task by reframing it through the lens of incremental learning. The result is IAU, or Influence Approximation Unlearning: a method that replaces costly second-order computations with a clever gradient-based proxy inspired by cognitive science.

The Traditional Pain of Unlearning

When a user requests deletion of their data from an AI model — whether for compliance (e.g., GDPR) or security (e.g., poisoned data) — retraining from scratch is the gold standard. But it’s painfully expensive.

Approximate unlearning methods try to simulate the effect of retraining without actually doing it. The most principled of these rely on influence functions, which estimate how much a sample affects the model using the Hessian inverse. Mathematically sound, yes — but computationally prohibitive.

Influence unlearning:

$$\theta^*_{\{z^{-}\}} - \theta^* \approx \frac{1}{n} H^{-1}_{\theta^*} \nabla_{\theta} \ell(z^{-}, \theta^*)$$

Computing or even approximating $H^{-1}_{\theta^*}$ becomes infeasible at scale: for a model with $p$ parameters, the Hessian is a $p \times p$ matrix, so merely storing it is out of reach for modern networks, let alone inverting it. That's where IAU changes the game.
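To make the cost concrete, here is a minimal NumPy sketch of influence-based removal for a tiny L2-regularized logistic regression, a setting small enough that the Hessian can still be formed explicitly. The model choice, the regularization strength `lam`, and the single-point deletion are illustrative assumptions, not the paper's setup; the point is that the $H^{-1}$ solve is exactly the step that stops scaling.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def influence_unlearn(theta, X, y, idx_forget, lam=1e-2):
    """Influence-function removal for L2-regularized logistic regression.
    Approximates the retrained parameters after deleting sample idx_forget:
        theta_new ~= theta + (1/n) H^{-1} grad_l(z_minus, theta)
    Forming the d x d Hessian costs O(n d^2) and solving it O(d^3),
    which is exactly the bottleneck IAU is designed to avoid.
    (Toy illustrative setup; `lam` and the model family are assumptions.)
    """
    n, d = X.shape
    p = sigmoid(X @ theta)
    S = p * (1.0 - p)                          # per-sample sigmoid'(x @ theta)
    H = (X.T * S) @ X / n + lam * np.eye(d)    # Hessian of the regularized empirical risk
    g = (p[idx_forget] - y[idx_forget]) * X[idx_forget]  # grad of loss on the forgotten point
    return theta + np.linalg.solve(H, g) / n
```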

Turning Forgetting Into Counterfactual Memorizing

The authors propose a deceptively simple equivalence:

Removing a point $z^{-}$ has the same influence as adding an opposite point $z^{+}$ with gradient $-\nabla_{\theta} \ell(z^{-})$.

This means that instead of deleting a sample, you can apply gradient ascent (the inverse of descent) using the forgotten point. It’s like balancing out a memory by introducing an anti-memory.

The final parameter update rule becomes:

$$ \theta^*_{\text{unlearn}} = \theta^* + \eta \sum_{z_i \in D_f} \nabla_\theta \ell(z_i, \theta^*) $$
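In code, this step amounts to accumulating the gradients of the loss over the forget set at the current parameters, then stepping in the positive gradient direction. The sketch below assumes a PyTorch classifier, cross-entropy loss, summed per-batch gradients, and a hand-picked step size `eta`; treat it as an illustration of the update rule rather than the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def ia_unlearn(model, forget_loader, eta=1e-3):
    """Influence Approximation (IA) step: gradient *ascent* on the forget set,
        theta <- theta + eta * sum_{z_i in D_f} grad l(z_i, theta*),
    with every gradient evaluated at the current parameters and applied in one update.
    (Cross-entropy, sum reduction, and `eta` are illustrative assumptions.)
    """
    params = [p for p in model.parameters() if p.requires_grad]
    accum = [torch.zeros_like(p) for p in params]
    model.eval()  # keep batch-norm/dropout statistics fixed while unlearning
    for x, y in forget_loader:
        loss = F.cross_entropy(model(x), y, reduction="sum")
        grads = torch.autograd.grad(loss, params)
        for a, g in zip(accum, grads):
            a.add_(g)
    with torch.no_grad():
        for p, a in zip(params, accum):
            p.add_(eta * a)   # '+' = ascent: raise the loss on the forgotten points
    return model
```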

To improve robustness and avoid over-forgetting, IAU introduces two refinements:

  • Gradient Correction (GC): Applies descent on the remaining dataset $D_r$ to ensure model utility doesn’t degrade
  • Gradient Restriction (GR): A novel training-time regularizer $\ell_{GR} = \ell + \alpha \|\nabla_{\theta} \ell\|^2$ that keeps any single data point from exerting overwhelming influence (a sketch follows this list)
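A common way to realize such a gradient-norm penalty is double backpropagation: compute the loss gradient with a differentiable graph, add its squared norm to the objective, and backpropagate through both. The sketch below assumes a PyTorch classifier with cross-entropy loss and a batch-level penalty weighted by `alpha`; the exact granularity and weighting used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def gr_loss(model, x, y, alpha=0.01):
    """Gradient Restriction (GR): the usual loss plus a penalty on the squared
    norm of its parameter gradient, l_GR = l + alpha * ||grad_theta l||^2,
    discouraging any batch from pushing the parameters too hard.
    (`alpha` and the batch-level granularity are illustrative assumptions.)
    """
    base = F.cross_entropy(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    # create_graph=True keeps the gradient differentiable, so the penalty
    # itself contributes second-order terms when backward() is called.
    grads = torch.autograd.grad(base, params, create_graph=True)
    penalty = sum(g.pow(2).sum() for g in grads)
    return base + alpha * penalty

# Usage inside a training loop (sketch):
#   loss = gr_loss(model, x, y)
#   optimizer.zero_grad(); loss.backward(); optimizer.step()
```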

Together, the final update becomes:

$$ \theta^*_{\text{unlearn}} = \theta^* - \eta \left( \sum_{z_i \in D_r} \nabla_\theta \ell(z_i) - \sum_{z_j \in D_f} \nabla_\theta \ell(z_j) \right) $$
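Putting IA and GC together, one IAU step gathers summed gradients over the retained set and the forget set, then applies descent on the former and ascent on the latter in a single parameter update. The sketch below reuses the same assumptions as above (cross-entropy, summed per-batch gradients, a hand-picked `eta`), and in practice the retained-set pass would likely use only a subsample of $D_r$.

```python
import torch
import torch.nn.functional as F

def gather_grads(model, loader, params):
    """Sum of gradients of the loss over one pass of `loader`."""
    total = [torch.zeros_like(p) for p in params]
    for x, y in loader:
        loss = F.cross_entropy(model(x), y, reduction="sum")
        grads = torch.autograd.grad(loss, params)
        for t, g in zip(total, grads):
            t.add_(g)
    return total

def iau_update(model, retain_loader, forget_loader, eta=1e-3):
    """Combined IAU step: descend on the retained data D_r (Gradient Correction)
    and ascend on the forget set D_f, i.e.
        theta <- theta - eta * (sum_{D_r} grad l - sum_{D_f} grad l).
    (A minimal sketch; `eta` and the data-loading choices are assumptions.)
    """
    params = [p for p in model.parameters() if p.requires_grad]
    model.eval()
    g_retain = gather_grads(model, retain_loader, params)
    g_forget = gather_grads(model, forget_loader, params)
    with torch.no_grad():
        for p, gr, gf in zip(params, g_retain, g_forget):
            p.add_(-eta * (gr - gf))   # descent on D_r, ascent on D_f
    return model
```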

Does It Work? The Results Say Yes

Across four datasets (CIFAR10, SVHN, CIFAR100, Purchase100) and multiple architectures (MLP, LeNet5, ResNet18, VGG19), IAU stacked up against the baselines as follows:

| Method | Speed | Accuracy Loss (MU) | Privacy Efficacy (UE) |
|---|---|---|---|
| Retraining | ✘ Slow | ✅ 0% loss | ✅ 0% leak |
| Fisher | ✘ Very Slow | ✅ Good | ✅ Good |
| Bad Teaching | ✅ Fast | ❌ High loss | ❌ High leak |
| USGD | ✅ Fast | ⚠️ Moderate loss | ✅ Good privacy |
| IAU (Ours) | ✅ Fastest | ✅ Low loss | ✅ Good privacy |

The method particularly shines under large-batch deletion and outlier removal, and ablation studies confirm each module (IA, GC, GR) contributes meaningfully.

Why This Matters

Most unlearning methods today are either too slow (e.g., retrain) or too brittle (e.g., label corruption). IAU offers a principled middle path — it’s fast, scalable, and mathematically interpretable. More importantly, it reframes a thorny technical issue into a problem we’ve already solved: how to learn efficiently.

For enterprise AI systems facing frequent deletion requests or regulatory pressure, IAU is a drop-in strategy that aligns technical efficiency with legal and ethical obligations. And for the research community, it opens a novel theoretical link between two historically separate areas: incremental learning and machine unlearning.


Cognaptus: Automate the Present, Incubate the Future