Rules of Engagement: Why LLMs Need Logic to Plan
When it comes to language generation, large language models (LLMs) like GPT-4o are top of the class. But ask them to reason through a complex plan, such as reorganizing a logistics network or optimizing staff scheduling, and their performance becomes unreliable. That’s the central finding from ACPBench Hard (Kokel et al., 2025), a new benchmark from IBM Research that tests unrestrained reasoning about action, change, and planning. ...