Topology Trouble: Why Even Frontier LLMs Still Get Lost in a Grid
Grid. It looks like the friendliest possible structure. Rows, columns, symbols, rules. No blurry photos, no social nuance, no awkward customer email written at 1:13 a.m. Just a small board and a set of constraints.

Naturally, this is where modern reasoning models still manage to embarrass themselves.

The paper introducing TopoBench studies a deceptively simple question: can frontier large language models solve topology-heavy grid puzzles where the answer depends on connectivity, loop closure, symmetry, visibility, and state consistency? [1]

The answer is not “never.” That would be too easy. The answer is more annoying: models often understand enough to start correctly, reason long enough to sound competent, and then lose the structure that makes the solution valid. ...