When large language models (LLMs) memorize repeated content during training—be it a phone number, a copyrighted paragraph, or a user’s personal story—the implications go beyond benign repetition. They touch the very core of AI safety, privacy, and trust. And yet, removing this memorized content after training has proven to be a devil’s bargain: anything you subtract tends to weaken the model’s overall capabilities.

In their recent ICML 2025 paper, Ghosal et al. propose an elegant reframing of this problem. Rather than performing painful post-hoc surgery on a trained model, they suggest we prepare the model from the outset to isolate memorization into removable compartments—which they call Memorization Sinks (MemSinks).


Why Is Memorization So Hard to Unlearn?

The issue begins with what the authors call mechanistic entanglement: the very neurons that help an LLM generalize also—when faced with repeated sequences—take on the burden of rote memorization. This is especially true for natural sequences (e.g. passages from books or real-world documents) that resemble the broader training corpus.

🔍 Key finding: Attempts to drop or retrain neurons responsible for memorization often damage the model’s general ability—because there’s no clean separation between memory and reasoning.

The problem is not just practical. The paper proves, under a linearized model, that gradient descent is biased toward entangled solutions, meaning that repeated sequences tend to be stored using the same neurons involved in general semantic understanding.


Past Fixes Fall Short

Researchers have tried two main routes to unlearning:

| Approach | Method | Drawback |
|---|---|---|
| Post-hoc pruning | Identify and zero out the responsible neurons after training | Hurts model performance due to entanglement |
| Gradient masking | During training, restrict gradients from repeated data to specific neurons | Starves generalization and still causes co-adaptation |

Both approaches fail to fully decouple memorization from generalization. Even when training gradients are neatly partitioned, the shared and memorization neurons still co-adapt, creating fragile dependencies between them.


Enter: Memorization Sinks

MemSinks sidestep this tradeoff by leveraging the different dynamics of memorization vs. generalization during training:

  • Generalization is reinforced by gradients from many diverse examples.
  • Memorization comes only from repetitions of a single sequence, and it interferes with other examples when stored in shared neurons.

By assigning each repeated sequence a unique set of memorization sink neurons, activated only when that sequence appears, the model avoids interference. These neurons soak up memorization like a sponge—hence, a “sink.”
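To make the assignment concrete, here is a minimal Python sketch of one way to map a sequence to its sink neurons, assuming a deterministic hash of a sequence identifier; the function name, hash choice, and parameters are illustrative, not the paper's exact scheme.

```python
import hashlib

def sink_indices(sequence_id: str, n_sink: int = 1024, k: int = 8) -> list[int]:
    """Deterministically map a sequence ID to k sink-neuron indices."""
    digest = hashlib.sha256(sequence_id.encode("utf-8")).digest()  # 32 bytes
    # Slice the digest into k 4-byte chunks and reduce each modulo n_sink.
    idx = {int.from_bytes(digest[4 * i : 4 * i + 4], "big") % n_sink for i in range(k)}
    return sorted(idx)

# Every repetition of the same document activates the same sinks;
# a different document hashes to a (mostly) disjoint set.
print(sink_indices("doc-000123"))
print(sink_indices("doc-000123"))   # identical to the line above
print(sink_indices("doc-999999"))   # different sinks
```

Because the mapping is deterministic, every repetition of the same sequence reinforces the same small set of sink neurons instead of spreading its memorization across the network.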

✂️ When it’s time to unlearn a sequence, you simply drop its sink neurons. General capabilities remain intact.

The MemSink approach uses sequence-dependent dropout: different sink neurons are activated based on a hashed ID of each sequence. Crucially, the shared neurons stay active for all examples, allowing the model to learn general features from repeated text without embedding the text itself in general components.
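Below is a minimal PyTorch sketch of what such sequence-dependent dropout could look like inside a single feed-forward block, with the hidden units split into always-on shared neurons and per-sequence sink neurons. The class name, the shared/sink split, and the `forget` method are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MemSinkMLP(nn.Module):
    """Feed-forward block whose hidden units are split into `n_shared`
    always-active neurons and `n_sink` sequence-gated sink neurons."""

    def __init__(self, d_model: int, n_shared: int, n_sink: int):
        super().__init__()
        self.n_shared, self.n_sink = n_shared, n_sink
        self.up = nn.Linear(d_model, n_shared + n_sink)
        self.down = nn.Linear(n_shared + n_sink, d_model)

    def forward(self, x: torch.Tensor, sink_idx: list[int]) -> torch.Tensor:
        # Shared neurons stay active for every example; only the hashed
        # subset of sink neurons is switched on for this sequence.
        mask = torch.zeros(self.n_shared + self.n_sink, device=x.device)
        mask[: self.n_shared] = 1.0
        mask[[self.n_shared + i for i in sink_idx]] = 1.0
        hidden = torch.relu(self.up(x)) * mask
        return self.down(hidden)

    @torch.no_grad()
    def forget(self, sink_idx: list[int]) -> None:
        # Unlearning a sequence = zeroing the weights of its sink neurons.
        rows = [self.n_shared + i for i in sink_idx]
        self.up.weight[rows, :] = 0.0
        self.up.bias[rows] = 0.0
        self.down.weight[:, rows] = 0.0

# Usage sketch: gate by the hashed sequence ID during training,
# then later erase that sequence's sinks with the same indices.
block = MemSinkMLP(d_model=64, n_shared=192, n_sink=64)
x = torch.randn(2, 10, 64)                # (batch, tokens, d_model)
out = block(x, sink_idx=[3, 17, 42])      # e.g. from the hashed sequence ID
block.forget(sink_idx=[3, 17, 42])
```

Training would route each batch through `forward` with the sink indices derived from its sequence ID; unlearning later amounts to calling `forget` with the same indices, which zeroes only those sink weights and leaves the shared pathway untouched.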


Results That Stick

In experiments at both small scale (TinyStories) and mid scale (SlimPajama with SmolLM 1.7B):

  • MemSinks preserve generalization: validation loss stays close to that of standard training.
  • MemSinks reduce memorization: dropping the memorization neurons increases loss on repeated sequences by over 50%.
  • Robust to noise: MemSinks still work with up to 10% sequence ID error.
  • Scales well: Larger models benefit even more from this separation.

These results mark a rare moment in LLM training research: a clean disentanglement of two conflicting objectives, backed by both theory and practice.


Implications: Not Just for Privacy

The most immediate use case is obvious: data privacy and copyright compliance. With MemSinks, sensitive or litigated content could be removed reliably without full retraining.

But broader implications are emerging:

  • Editable AI: What if we could remove or replace specific domains (e.g., outdated knowledge) from a model?
  • Modular personalization: Could user-specific content be localized for selective deployment or deletion?
  • Regulatory auditing: Models trained with MemSinks might offer more transparent paths to demonstrate “right to be forgotten” compliance.

In a future where language models are both powerful and accountable, structured memory control may be just as important as model size.


The Sink as a Paradigm

MemSinks offer a shift in thinking: don’t just patch memorization problems after the fact. Architect the model to respect boundaries between what it remembers and what it generalizes.

This idea feels like the beginning of a broader design philosophy—where model memory is not a mysterious blob, but a controlled and targeted mechanism. Perhaps someday, LLMs will come with labeled compartments for facts, patterns, emotions, and yes, memories—with each one fully optional.

Until then, MemSinks give us the next best thing: a drain that remembers, and forgets, on command.


Cognaptus: Automate the Present, Incubate the Future.