Mirror, Mirror on the Agent: Teaching LLMs to Judge Their Own Actions
Opening — Why this matters now

The current wave of AI agents promises something ambitious: systems that plan, act, evaluate outcomes, and adapt. In theory, they resemble junior analysts—observing a situation, choosing an action, and refining their judgment over time. In practice, however, many so‑called “agents” are little more than skilled imitators.

Most agent training pipelines rely on imitation learning: the model copies actions demonstrated by experts. This produces competent behavior, but it hides a critical weakness. The model learns what to do, but rarely learns why one action is better than another. Without that comparative judgment, agents struggle to reflect on mistakes or adapt to unfamiliar situations. ...
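To make the contrast concrete, here is a minimal toy sketch (not any specific training pipeline from the text; the scoring setup and function names are illustrative assumptions). Imitation learning maximizes the likelihood of the expert's action alone, while a pairwise preference loss in the Bradley–Terry style trains the model on which of two candidate actions is better — the comparative judgment the passage describes:

```python
import math

def softmax(logits):
    # Numerically stable softmax over candidate-action scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def imitation_loss(logits, expert_idx):
    # Behavioral cloning: negative log-likelihood of the expert's action.
    # The model is only pushed toward copying; other actions are never compared.
    probs = softmax(logits)
    return -math.log(probs[expert_idx])

def preference_loss(score_better, score_worse):
    # Bradley-Terry pairwise loss: -log sigmoid(s_better - s_worse).
    # The training signal is explicitly "action A beats action B",
    # which is what gives the model comparative judgment.
    margin = score_better - score_worse
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical scores for two candidate actions in the same state.
logits = [2.0, 0.5]
print(imitation_loss(logits, expert_idx=0))
print(preference_loss(2.0, 0.5))
```

With only two candidates the two losses happen to coincide numerically (a binary softmax is a sigmoid of the score difference); the difference in what the model learns shows up with larger action sets and with preference pairs that no expert demonstration covers.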