Opening — Why this matters now

Synthetic data is no longer a contingency plan; it is the backbone of modern model iteration. As access to clean, human-authored data narrows—due to cost, licensing, or sheer exhaustion—LLMs increasingly learn from text generated by earlier versions of themselves. On paper, this looks efficient. In practice, it creates something more fragile: a closed feedback system where bias, preference, and quality quietly drift over time.

This paper introduces a precise name for that phenomenon: the Self-Consuming Performative Loop (SCPL). And more importantly, it shows that the danger is not just model collapse, but something subtler and arguably harder to govern—bias evolution driven by feedback itself.

Background — From self-training to performativity

Prior work has already warned us that recursively training generative models on their own outputs degrades diversity and accuracy. But most of that literature assumes a static world: fixed prompts, fixed ratios, and no user response shaping the data.

Real systems are messier. Models influence users, users influence data, and data reshapes the next model. This is the domain of performative prediction, where deployment changes the very distribution a model learns from next.

The authors combine these two ideas—self-consuming training and performative feedback—into a single framework that mirrors production reality more closely than prior benchmarks.

Analysis — What the paper actually does

The SCPL framework studies iterative training under controlled feedback dynamics. Two training regimes are examined:

  1. Performative Retraining — each generation retrains from a base model using newly generated synthetic data.
  2. Performative Incremental Fine-tuning — each generation fine-tunes directly on top of the previous one (the more realistic industrial case).
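The structural difference between the two regimes can be sketched schematically. Here the "models" are just strings and `generate`/`fine_tune` are stand-in helpers I invented for illustration, not the paper's code or any real training API:

```python
# Schematic contrast of the two SCPL regimes; models are plain strings
# and these helpers only record lineage, they do no real training.
def generate(model):
    return f"synthetic({model})"

def fine_tune(start, data):
    return f"ft({start}, {data})"

def run_generation(base, prev, incremental):
    data = generate(prev)                  # data always comes from the latest model
    start = prev if incremental else base  # but the training start point differs
    return fine_tune(start, data)

retrained = run_generation("base", "gen1", incremental=False)
incremented = run_generation("base", "gen1", incremental=True)
# Retraining restarts from the base model each generation; incremental
# fine-tuning stacks on the previous one, so earlier biases become the
# next generation's initialization.
```

The lineage strings make the asymmetry visible: under retraining, each generation's biases are diluted by the fixed starting point, while under incremental fine-tuning they compound.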

Crucially, the data distribution is allowed to shift over time. If a model performs better for one group, that group contributes more data in the next iteration. Less satisfied groups slowly disappear from the training signal.
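That feedback dynamic can be captured in a toy simulation. The update rule below is my own simplification, assuming each group's future data share grows in proportion to how well the model serves it; it is not the paper's exact sampling mechanism:

```python
def update_shares(shares, satisfaction, rate=0.5):
    """One feedback step: groups whose members are more satisfied
    contribute proportionally more data in the next iteration.
    (Illustrative dynamics, not the paper's exact update rule.)"""
    weighted = [s * (1 + rate * q) for s, q in zip(shares, satisfaction)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Two groups start with equal data shares, but the model serves
# group 0 slightly better (hypothetical per-group quality scores).
shares = [0.5, 0.5]
satisfaction = [0.6, 0.4]
for step in range(10):
    shares = update_shares(shares, satisfaction)
```

Even a small, constant satisfaction gap compounds: after ten iterations the advantaged group supplies well over 60% of the training signal, and the disadvantaged group is steadily fading from the data.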

The authors test this loop across three tasks:

  • News continuation (political preference bias)
  • Preference dissection (creative vs non-creative inclination)
  • Mathematical problem solving (easy vs hard problems, measuring disparate performance)

Bias is measured along two axes:

  Bias Type        What it captures
  Preference Bias  Systematic favoring of one group's style or viewpoint
  Disparate Bias   Performance gaps between advantaged and disadvantaged groups
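One simple way to operationalize the two axes (illustrative metrics of my own, not the paper's exact formulas):

```python
def preference_bias(choices):
    """Share of outputs favoring group 'A' minus the neutral baseline 0.5.
    Positive values mean a systematic tilt toward A."""
    return sum(1 for c in choices if c == "A") / len(choices) - 0.5

def disparate_bias(acc_by_group):
    """Absolute performance gap between the best- and worst-served group."""
    return max(acc_by_group.values()) - min(acc_by_group.values())

# Hypothetical measurements: 3 of 4 outputs favor group A,
# and accuracy differs sharply between easy and hard problems.
tilt = preference_bias(["A", "A", "B", "A"])
gap = disparate_bias({"easy": 0.9, "hard": 0.6})
```

Separating the two matters for interpreting the findings below: preference bias can rise while disparate bias falls, and the second number alone would paint a misleadingly rosy picture.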

Findings — The uncomfortable patterns

The results are remarkably consistent across models and tasks.

1. Preference bias amplifies — fast

Under SCPL, models increasingly favor the already advantaged group. This effect is strongest in incremental fine-tuning loops, where yesterday’s bias becomes today’s initialization.

Even when generation quality declines, preference bias keeps rising. In other words, the model becomes more opinionated while becoming worse.

2. Disparate bias shrinks — misleadingly

At first glance, performance gaps between groups narrow over time. This looks like fairness improvement. It isn’t.

What actually happens is performance collapses for everyone, especially on harder or less frequent cases. The gap closes because both sides sink.

3. Accumulation helps, but doesn’t cure

Reusing past data slows bias growth and quality decay, but does not reverse them. Accumulation is a brake, not a steering wheel.
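A back-of-envelope model shows why accumulation acts as a brake rather than a cure. Assume each generation adds one unit of synthetic data (my simplification, not the paper's setup): keeping the old pool only dilutes the human-authored signal, while wholesale replacement discards it after a single step.

```python
def human_fraction(generations, accumulate):
    """Fraction of human-authored text left in the training pool after
    n generations, assuming one unit of synthetic data is added each
    round. A deliberately crude model of the two data policies."""
    if accumulate:
        return 1.0 / (1 + generations)  # original data stays, but shrinks in share
    return 1.0 if generations == 0 else 0.0  # pool replaced wholesale

slow_decay = human_fraction(5, accumulate=True)
fast_decay = human_fraction(1, accumulate=False)
```

Under accumulation the human fraction decays hyperbolically toward zero instead of vanishing at once; the trend is slowed, not reversed, which matches the paper's "brake, not a steering wheel" reading.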

4. Model scale matters

Larger models are more sensitive to performative distribution shifts. Dynamic sampling that stabilizes small models can destabilize large ones.

Mitigation — Can this be controlled?

The paper proposes a reward-based rejection and reweighting strategy:

  • Generate multiple candidate outputs per prompt
  • Score them using modular reward rules (quality, alignment, task consistency)
  • Oversample and reweight data from disadvantaged groups
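The three steps above can be sketched in a few lines. The interfaces here are assumptions for illustration: `reward` stands in for the modular reward rules, `group_of` for whatever group labeling the pipeline uses, and `boost` for the per-group reweighting factors.

```python
def select_and_reweight(candidates, reward, group_of, boost, threshold=0.0):
    """Reward-based rejection and reweighting sketch (assumed interfaces):
    - reward(text) -> float, combining modular rules (quality, alignment, ...)
    - group_of(text) -> group label
    - boost: per-group sampling weight, > 1.0 for disadvantaged groups
    Rejects candidates below the reward threshold, then assigns each
    survivor a training weight scaled by its group's boost factor."""
    scored = [(c, reward(c)) for c in candidates]
    kept = [(c, r) for c, r in scored if r >= threshold]
    return [(c, r * boost.get(group_of(c), 1.0)) for c, r in kept]

# Toy usage with a hypothetical scorer and grouping scheme.
data = ["good-A", "bad-A", "good-B"]
reward = lambda t: 1.0 if t.startswith("good") else -1.0
group_of = lambda t: t.split("-")[1]
weighted = select_and_reweight(data, reward, group_of, boost={"B": 2.0})
# "bad-A" is rejected; the surviving group-B sample gets double weight.
```

The design point is that rejection alone only filters quality, while the reweighting term actively pushes back against the participation drift that SCPL creates; both levers are needed.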

This approach outperforms naive rejection sampling and slows preference bias amplification—especially in news-generation tasks.

Still, the authors are careful: poorly designed reward rules can introduce new biases. Governance moves upstream.

Implications — What this means for real systems

Three takeaways stand out:

  1. Bias is dynamic, not static — auditing a single checkpoint is meaningless if the training loop itself is unstable.
  2. Incremental fine-tuning is riskier than retraining — but also unavoidable in practice.
  3. Fairness metrics can lie — shrinking disparities may signal shared degradation, not equity.

For enterprises running continuous model updates, SCPL isn’t a theoretical curiosity. It is the default failure mode unless actively managed.

Conclusion — The loop is the product

This paper reframes LLM bias as a system property, not a dataset flaw. Once models consume their own outputs under feedback, the loop itself becomes the object that must be governed.

Ignoring that loop doesn’t make systems neutral—it just lets bias compound quietly, iteration by iteration.

Cognaptus: Automate the Present, Incubate the Future.