Train Long, Think Short: How Curriculum Learning Makes LLMs Think Smarter, Not Longer

When it comes to reasoning, bigger isn’t always better. Large language models (LLMs) often produce unnecessarily long chains of thought, burning through tokens — and budgets — even for simple problems. While fixed token limits during training can force brevity, they also rob models of the chance to first explore and then compress their reasoning. A new study, Train Long, Think Short, proposes a smarter path: curriculum learning for length control. Instead of a one-size-fits-all cap, the model starts with a generous token budget, learns robust reasoning strategies, and then gradually adapts to shorter limits over time. The result is a model that solves complex tasks with fewer tokens, without losing accuracy. ...
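To make the schedule concrete, here is a minimal sketch, assuming a simple linear anneal of the per-rollout token cap; the function name and the decay shape are illustrative assumptions, not the paper's code:

```python
# Minimal sketch of a length-budget curriculum (illustrative, not the
# paper's implementation): the per-rollout token cap starts generous
# and anneals toward a tight limit as RL training progresses.

def token_budget(step: int, total_steps: int,
                 start: int = 4096, end: int = 512) -> int:
    """Linearly anneal the max-token cap from `start` to `end`."""
    frac = min(step / max(total_steps, 1), 1.0)
    return round(start + frac * (end - start))

# The cap would then bound rollout length at each training step:
for step in (0, 2500, 5000, 7500, 10000):
    print(step, token_budget(step, total_steps=10000))
# -> 4096, 3200, 2304, 1408, 512
```

Early in training the model can "think long" and discover reasoning strategies; by the end, the shrinking cap forces it to compress them.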

August 13, 2025 · 2 min · Zelina

Graft and Go: How Knowledge Grafting Shrinks AI Without Shrinking Its Brain

If you’ve ever tried to run a powerful AI model on a modest device—say, a drone, a farm robot, or even a Raspberry Pi—you’ve likely hit the wall of hardware limitations. Today’s most accurate models are big, bloated, and brittle when it comes to efficiency. Enter knowledge grafting, a refreshingly biological metaphor for a novel compression technique that doesn’t just trim the fat—it transfers the muscle.

Rethinking Compression: Not What to Cut, But What to Keep

Traditional model optimization methods—quantization, pruning, and distillation—all try to make the best of a difficult trade-off: shrinking the model while limiting the damage to performance. These methods often fall short, especially when you push compression past 5–6x. ...
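For contrast, the "what to cut" baseline the post pushes back on looks roughly like this: a magnitude-pruning sketch using PyTorch's built-in pruning utilities (a sketch of the traditional approach, not the grafting method itself):

```python
# Sketch of the traditional baseline: magnitude pruning, which zeroes
# the smallest weights. Illustrative only; knowledge grafting instead
# asks which parameters to KEEP and transfer.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Zero out the 90% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # make the sparsity permanent

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.1%}")
```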

July 28, 2025 · 3 min · Zelina

Fine-Tuning Isn’t Just Supervised: Why SFT Is Really RL in Disguise

In the arms race to align large language models (LLMs), supervised fine-tuning (SFT) and reinforcement learning (RL) are often painted as competing paradigms. SFT is praised for its stability and simplicity; RL is heralded for its theoretical soundness and alignment fidelity. But what if this dichotomy is an illusion? A recent preprint from Chongli Qin and Jost Tobias Springenberg makes a bold and elegant claim: SFT on curated data is not merely supervised learning—it is actually optimizing a lower bound on the RL objective. ...
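The key step, sketched here in notation of my own choosing (the preprint's derivation is more careful about the sampling distribution): importance sampling rewrites the RL objective under the data distribution q, and Jensen's inequality turns its log into the familiar SFT log-likelihood plus terms constant in the model parameters.

```latex
% Sketch, assuming q covers the support of \pi_\theta R and curation
% keeps only samples with reward R(x) = 1:
\log J(\theta)
  = \log \mathbb{E}_{x \sim q}\!\left[\frac{\pi_\theta(x)}{q(x)}\, R(x)\right]
  \;\ge\; \mathbb{E}_{x \sim q}\!\left[\log \frac{\pi_\theta(x)\, R(x)}{q(x)}\right]
  = \mathbb{E}_{x \sim q}\!\left[\log \pi_\theta(x)\right] + \mathrm{const}
```

So maximizing the SFT log-likelihood on curated data maximizes a lower bound on log J(θ), the expected-reward objective that RL targets directly.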

July 18, 2025 · 4 min · Zelina

Train of Thought: How Long-Haul RL Unlocks LLM Reasoning Diversity

In the race to make Large Language Models (LLMs) reason like humans—or better—most researchers obsess over one thing: prompting. Chain-of-thought prompts, few-shot demos, scratchpads, tools. But a new study from NVIDIA suggests something even more fundamental: it’s not just how you prompt them—it’s how long you train them. Their paper, Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training, explores how stretching reinforcement learning (RL) over time unlocks broader, more stable, and more versatile reasoning in LLMs. This isn’t just about incremental gains—it’s about escaping reasoning ruts. ...

July 18, 2025 · 3 min · Zelina

Reasoning on a Sliding Scale: Why One Size Doesn't Fit All in CoT

The Chain-of-Thought (CoT) paradigm has become a cornerstone in improving the reasoning capabilities of large language models (LLMs). But as CoT matures, one question looms larger: Does every problem really need an elaborate chain? In this article, we dive into a new method called AdaR1, which rethinks the CoT strategy by asking not only how to reason—but how much. ...

May 1, 2025 · 4 min