Model Cannibalism: When LLMs Learn From Their Own Echo
Opening: Why this matters now

Synthetic data is no longer a contingency plan; it is the backbone of modern model iteration. As access to clean, human-authored data narrows (due to cost, licensing, or sheer exhaustion), LLMs increasingly learn from text generated by earlier versions of themselves. On paper, this looks efficient. In practice, it creates something more fragile: a closed feedback system where bias, preference, and quality quietly drift over time. ...