When Solvers Become Judges (and Fail): Why LLMs Still Struggle to Critique Reasoning
Correction is the expensive part. Answer generation is already the familiar magic trick. Give a model a problem, ask for a solution, and admire the fluent staircase of reasoning. Sometimes the staircase even reaches the right floor. That is nice. Investors clap. Product managers update the roadmap. Somewhere, a slide says “AI tutor,” “AI reviewer,” or “autonomous verification layer.” ...