Curriculum Learning

The Lesson Plan Is the Product

TL;DR for operators: AI learning is usually sold as a volume story: more data, more retrieval, more reasoning tokens, more reinforcement learning. Comforting. Also incomplete. Three recent papers make a more useful point. The model does not merely need more exposure. It needs a better lesson plan. One paper shows that a model can be given a more meaningful difficulty ranking for training examples, yet still fail to beat ordinary full-data training unless scoring and pacing are engineered together. Another shows that travel-planning agents become more factually grounded when forced into retrieval, but that the burden of grounding can damage instruction retention and preference satisfaction. A third shows that legal AI systems can be rewarded for correct prosecution outcomes without learning the underlying discrimination process that separates evidence insufficiency, statutory non-liability, discretionary non-prosecution, and prosecution. ...
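To make "scoring and pacing engineered together" concrete, here is a minimal sketch of a curriculum sampler. It is not the method of any of the papers mentioned: the linear ramp, the `start_frac` default, and the function names `pacing_fraction` and `available_examples` are all assumptions chosen for illustration. The difficulty scores decide the order; the pacing function decides how much of that order is unlocked at a given step.

```python
def pacing_fraction(step, total_steps, start_frac=0.2):
    """Fraction of the difficulty-sorted data unlocked at `step`.

    Assumption: a linear ramp from `start_frac` to 1.0. Real curricula
    may use other shapes (step-wise, root, exponential).
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start_frac + (1.0 - start_frac) * progress


def available_examples(scores, step, total_steps):
    """Indices of the examples the trainer may sample from at `step`.

    `scores` holds one hypothetical difficulty score per example
    (lower means easier). Easiest examples are unlocked first.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, round(pacing_fraction(step, total_steps) * len(order)))
    return order[:k]


if __name__ == "__main__":
    difficulty = [0.9, 0.1, 0.5, 0.3, 0.7]
    for step in (0, 5, 10):
        print(step, available_examples(difficulty, step, total_steps=10))
```

The two knobs interact: a good ranking paired with a ramp that unlocks everything almost immediately behaves much like ordinary full-data training, which is one plausible reading of why the ranking alone does not guarantee a gain.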

No Easy A: Why AI Training Needs Hard-Case Routing

AI teams like to say they are “improving the model.” Very noble. Also conveniently vague. In practice, “improvement” usually means one of three things: collect more data, buy a larger model, or run another round of fine-tuning and hope the loss curve behaves like a polite employee. The two papers in this cluster suggest a less glamorous, more useful idea: the scarce resource is not only data or parameters. It is learning pressure. ...

When Noisy Data Talks Back: The Fragile Art of Learning Under Infinite Contamination

Bad data is not one problem. It is at least three problems wearing the same cheap trench coat. There is bad data that appears once and disappears. There is bad data that keeps appearing, but becomes rarer as the corpus grows. And there is bad data that settles in at a stable rate, like a permanent tenant with poor hygiene and legal representation. Business discussions about AI training data often compress these into one vague category called “noise”. Convenient, yes. Informative, no. ...
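The three regimes can be told apart by how the share of bad examples behaves as the corpus grows. The sketch below is purely illustrative: the regime names, the fixed count of 10, the square-root decay, and the 5% rate are assumptions picked to make the contrast visible, not figures from the article or any paper.

```python
def bad_fraction(n: int, regime: str) -> float:
    """Illustrative share of contaminated examples in a corpus of size n.

    All constants below are hypothetical.
    """
    if n <= 0:
        raise ValueError("corpus size must be positive")
    if regime == "transient":
        # A fixed number of bad items: their share vanishes quickly.
        return min(1.0, 10 / n)
    if regime == "vanishing":
        # Bad items keep arriving, but more slowly than clean ones.
        return n ** -0.5
    if regime == "persistent":
        # Bad items arrive at a stable rate, however large the corpus gets.
        return 0.05
    raise ValueError(f"unknown regime: {regime}")


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        row = {r: bad_fraction(n, r) for r in ("transient", "vanishing", "persistent")}
        print(n, row)
```

Only the first two regimes shrink toward zero with scale; in the third, collecting more data does not dilute the problem.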

Train Long, Think Short: How Curriculum Learning Makes LLMs Think Smarter, Not Longer

TL;DR for operators: The paper behind this article proposes Curriculum GRPO: a reinforcement-learning training method that starts a reasoning model with a larger token budget, then gradually shrinks that budget until the model learns to solve problems in shorter traces.[1] The important point is not “ask the model to be brief.” We have tried that. It works roughly as well as asking a committee to be concise, which is to say: occasionally, under duress. The paper instead changes the training trajectory. The model is first allowed to explore longer reasoning paths, then is forced to compress successful strategies into a tighter token budget. ...
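The shrinking-budget idea can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the linear decay, the 1024 and 256 token endpoints, and the overflow penalty in `budgeted_reward` are hypothetical choices, and the excerpt does not specify the actual schedule or reward used in Curriculum GRPO.

```python
def token_budget(step: int, total_steps: int, start: int = 1024, end: int = 256) -> int:
    """Allowed reasoning length at a given training step.

    Assumption: the budget shrinks linearly from `start` to `end` tokens.
    """
    if total_steps <= 1:
        return end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return round(start + frac * (end - start))


def budgeted_reward(correct: bool, tokens_used: int, budget: int) -> float:
    """Hypothetical reward: 1.0 for a correct answer, minus a penalty
    proportional to how far the trace overshoots the current budget
    (penalty capped at 1.0)."""
    base = 1.0 if correct else 0.0
    overflow = max(tokens_used - budget, 0)
    return base - min(overflow / budget, 1.0)


if __name__ == "__main__":
    for step in (0, 5, 10):
        b = token_budget(step, total_steps=11)
        print(step, b, budgeted_reward(True, tokens_used=512, budget=b))
```

Early in training a 512-token trace fits the budget and earns full reward; by the final step the same trace overshoots and is penalized, which is the pressure that pushes the model to compress strategies it found while the budget was generous.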