Curriculum Learning

Train Long, Think Short: How Curriculum Learning Makes LLMs Think Smarter, Not Longer

When it comes to reasoning, bigger isn’t always better. Large language models (LLMs) often produce unnecessarily long chains of thought, burning through tokens — and budgets — even for simple problems. While fixed token limits during training can force brevity, they also rob models of the chance to first explore and then compress their reasoning. A new study, Train Long, Think Short, proposes a smarter path: curriculum learning for length control. Instead of a one-size-fits-all cap, the model starts with a generous token budget, learns robust reasoning strategies, and then gradually adapts to shorter limits over time. The result is a model that solves complex tasks with fewer tokens, without losing accuracy. ...

From GUI Novice to Digital Native: How SEAgent Teaches Itself Software Autonomously

If you’ve ever tried to automate your own software workflows using AI, you’ll know the hard part isn’t reasoning — it’s clicking the right button in a sea of ambiguous icons, drop-downs, and obscure UIs. For agents tasked with navigating GUIs like humans do, the real challenge isn’t logic — it’s context. Enter SEAgent: a self-evolving computer-use agent that doesn’t just learn to operate software — it teaches itself how to learn, using nothing but screenshots, feedback from its own past mistakes, and a clever curriculum. ...

Thinking in Circles: How Self-Questioning LLMs Learn Without Labels

What if an LLM could learn not by reading more, but by thinking harder? That’s the radical premise behind Self-Questioning Language Models (SQLM), a framework that transforms large language models from passive learners into active generators of their own training data. No curated datasets. No labeled answers. Just a prompt — and a model that gets smarter by challenging itself. From Self-Play in Robotics to Reasoning in Language The inspiration for SQLM comes from asymmetric self-play, a technique used in robotics where one agent proposes tasks and another learns to solve them. Here, that paradigm is adapted to LLMs: ...

Mirror, Mirror in the Model: How MLLMs Learn from Their Own Mistakes

When multimodal large language models (MLLMs) like Gemini or Janus are asked to generate an image and then assess whether that image matches a prompt, you’d expect agreement. But a new study shows this harmony is often missing: the model’s own understanding branch disagrees with what its generation branch creates. This phenomenon—called self-contradiction—isn’t just an embarrassing quirk. As it turns out, it may be the most valuable feedback signal MLLMs have. ...