Red Flag on the Track: Why LLMs Still Struggle with Real Algorithmic Reasoning

In the world of AI benchmarks, most roads lead to flashy competitions: solving coding puzzles, climbing Codeforces ratings, or passing Olympiad-level problems. But a new benchmark, FormulaOne, changes the race. It doesn’t ask, “Can you win a medal?” It asks, “Can you think like a researcher?” And the answer from today’s frontier LLMs? A resounding no.

From Codeforces Champs to Research Rookies

The authors of FormulaOne strip away the glitz of competitive programming and delve into something far more consequential: research-grade algorithmic problems grounded in Monadic Second-Order (MSO) logic over graphs. These aren’t out-of-distribution visual puzzles like ARC. They’re in-distribution, theoretically tractable problems designed with precision to demand multi-step symbolic reasoning, mathematical insight, and clean implementation. ...

July 18, 2025 · 4 min · Zelina