Judge Math, Not by Its Parser

Opening — Why this matters now The AI industry has discovered a wonderfully pedestrian way to misread progress: build models that can solve harder math problems, then grade them with evaluators that panic when 2040 minutes is not written as 34 hours. That is not a joke. It is the central irritation behind “Rethinking Math Reasoning Evaluation: A Robust LLM-as-a-Judge Framework Beyond Symbolic Rigidity”, an arXiv paper that examines how mathematical reasoning benchmarks can be distorted by rigid symbolic verification. ...

April 27, 2026 · 12 min · Zelina

When AI Can Solve But Can't Search: The MathNet Equation

Opening — Why this matters now The AI industry enjoys announcing that models now perform at medal level on Olympiad mathematics. Impressive headlines. Elegant demos. Much applause. Then MathNet arrives with the social grace of an auditor. This new benchmark shows that while leading models can often solve difficult mathematics, they are far worse at finding related problems, recognizing structural equivalence, or reliably using retrieved examples to improve reasoning. In practical terms: your AI intern may ace the exam, then fail to locate the right binder. ...

April 23, 2026 · 4 min · Zelina

Long Thoughts, Short Bills: Distilling Mathematical Reasoning at Scale

Opening — Why this matters now Large language models can solve math problems. The more interesting question in 2025 is whether they can learn how to reason, at scale, across contexts that are long, messy, and computationally expensive. Most math datasets answer the first question. Nemotron-Math answers the second, and does so with a surprisingly pragmatic eye on cost. ...

December 18, 2025 · 4 min · Zelina

The Problem with Problems: Why LLMs Still Don’t Know What’s Interesting

Opening — Why this matters now In an age when AI can outscore most humans on the International Mathematical Olympiad, a subtler question has emerged: can machines care about what they solve? The new study A Matter of Interest (Mishra et al., 2025) explores this psychological fault line between mechanical brilliance and genuine curiosity. If future AI partners are to co‑invent mathematics, not just compute it, they must first learn what humans deem worth inventing. ...

November 12, 2025 · 4 min · Zelina