Planning

Recursive Minds: How ReCAP Turns LLMs into Self-Correcting Planners

A stuck workflow rarely looks intelligent. It looks like a support agent asking for the same invoice twice, a coding agent editing the wrong file for the third time, or an operations bot patiently repeating an invalid action because, apparently, persistence is cheaper than understanding. This is the unglamorous failure mode of many LLM agents. They do not collapse because they cannot produce a plan. They collapse because the plan becomes stale, buried, or locally contradicted by new observations. The agent remembers the latest step and forgets the job. ...

Plan>Then>Profit: Reinforcement Learning That Teaches LLMs to Outline Before They Think

Planning is usually the part of work everybody claims to value and nobody wants to inspect. The deck has a roadmap. The project has a strategy. The model has a chain of thought. Splendid. Now, does the plan actually make the execution better, or is it just theatre with bullet points? That is the useful question behind Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning, which introduces PTA-GRPO, a reinforcement-learning method that trains language models to generate an explicit analytic plan before detailed reasoning and then rewards the quality of that plan, not merely the final answer.1 ...

Paths, Not Parrots: When RL Makes LLMs Plan—and When It Doesn’t

A workflow agent usually looks clever right up to the moment one service is down, one permission changes, or one customer case arrives with the wrong sort of mess attached. Then the question becomes painfully simple: did the model learn a plan, or did it learn the usual route? That distinction is the centre of Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective, an ICLR 2026 paper by Siwei Wang, Yifei Shen, Haoran Sun, Shi Feng, Shang-Hua Teng, Li Dong, Yaru Hao, and Wei Chen.1 The paper is not another victory lap for reinforcement learning. It is more useful than that. It asks what, mechanically, changes when a language model is trained for planning with reinforcement learning rather than supervised fine-tuning. ...

Failures, Taxonomized: How Multi‑Level Reflection Turns Agents Into Self‑Learners

Failure is usually treated as waste. The demo breaks, the agent apologises, someone adds a prompt patch, and everyone pretends the next retry will be more mature. Very enterprise. Very ceremonial. The SaMuLe paper makes a more useful claim: failed agent runs are not just embarrassing logs. They are the curriculum.1 More precisely, they are raw material for a structured reflection pipeline that turns messy trajectories into error taxonomies, cross-task lessons, and finally a small retrospective model trained to diagnose future failures. ...

Plan, Then Rewrite: Why Explicit Intent Wins in Agent Workflows

A user starts by asking for Italian restaurants, answers a few clarification questions, then changes their mind and asks for Mexican instead. A human hears the reversal. A planner may hear: pizza, pasta, Italian, Mexican, recommendations, and perhaps a vague invitation to overachieve. Naturally, it may then produce a plan with the confidence of a consultant who attended only half the meeting. ...

Plan, Act, Replan: When LLM Agents Run the Aisles

Retail planning usually fails in the hand-off. A sales team sets a target. Inventory planners translate it into stock positions. Procurement checks supplier feasibility. Operations discovers warehouse constraints. Someone exports a spreadsheet, someone else reworks the assumptions, and by the time the plan looks executable, the market has already wandered off with the innocence of a cat near an open laptop. ...

Plan, Don't Spam: The Goldilocks Rule for Test‑Time Compute

A busy agent is not necessarily a thinking agent. Anyone who has watched an LLM agent narrate every tiny move knows the feeling. It reviews the goal. It drafts a plan. It revises the plan. It reconsiders the revision. Then, with exquisite deliberation, it clicks the wrong button. The transcript looks intelligent; the behaviour looks like a consultant trapped in a revolving door. ...

Reflections in the Mirror Maze: Why LLM Reasoning Isn't Quite There Yet

TL;DR for operators: Adding “reasoning” to an LLM agent is not the same as making it reason better. Wong et al. test four open-source models across dynamic SmartPlay tasks using a baseline prompt, reflection, reflection plus an Oracle that mutates heuristics, and reflection plus a Planner that simulates short future trajectories.1 The clean result is not “planning wins” or “bigger models win.” The result is more annoying, therefore more useful: the same scaffold can be a booster, a distraction, or a failure amplifier. ...