Plans, Tokens, and Turing Dreams: Why LLMs Still Can’t Out-Plan a 15-Year-Old Classical Planner
TL;DR for operators A new benchmark does not say that LLMs are hopeless at planning. That would be too easy, and also false. It says something more useful: frontier models are now strong enough to solve many formal planning tasks, but their competence still weakens when the task stops giving them semantically meaningful labels.1 ...