LLM Memorization

TL;DR for operators: Deletion is simple in a database. It is not simple in a neural network that has already used the deleted record to improve its internal machinery. That is the unpleasant little invoice this paper presents. Gaurav R. Ghosal, Pratyush Maini, and Aditi Raghunathan study why repeated natural text is hard to remove from language models after training, then propose MemSinks, a training-time mechanism designed to make memorization easier to isolate later [1]. The important shift is not “better pruning.” It is architectural accounting. Instead of hoping that memorized text happens to live in a few removable neurons, MemSinks gives repeated sequences a controlled place to accumulate memorization during training. ...

TL;DR for operators: Memorization audits usually start with the wrong question: “Which individual text snippets look memorized?” This paper suggests a better first diagnostic: group many snippets by how closely the model reproduces them, then measure the entropy of the token distribution inside each group [1]. The result is an empirical pattern the authors call Entropy–Memorization Linearity. In plain English: when training examples are pooled by edit-distance score, their set-level entropy forms a strong linear relationship with how closely the model reproduces them. Since the paper’s “memorization score” is an edit distance, lower score means stronger verbatim reproduction; higher score means the generated continuation is farther from the ground truth. ...
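The diagnostic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the score is a plain token-level Levenshtein distance, and pooling the reference continuation's tokens within each score group is an assumption about which tokens the entropy is measured over.

```python
import math
from collections import Counter, defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def shannon_entropy(tokens):
    """Entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_by_score(examples):
    """Pool examples by edit-distance score, then measure per-group entropy.

    `examples` is an iterable of (reference_tokens, generated_tokens) pairs.
    Assumption: the pooled tokens are those of the reference continuation.
    """
    groups = defaultdict(list)
    for reference, generated in examples:
        groups[edit_distance(reference, generated)].extend(reference)
    return {score: shannon_entropy(toks) for score, toks in sorted(groups.items())}


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept.

    Needs at least two distinct x values.
    """
    xs, ys = zip(*points)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In use, one would pass `entropy_by_score(examples).items()` to `fit_line` and check how well a straight line describes the (score, entropy) pairs. With only a handful of examples per group the entropy estimates are noisy, so the pattern is only meaningful over large pools.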

The Sink That Remembers: Solving LLM Memorization Without Forgetting Everything Else

What LLMs Remember—and Why: Unpacking the Entropy-Memorization Law