TL;DR

Classical planners crack under scale. You can rescue them with LLMs in two ways: (1) Inspire the next action, or (2) Predict an intermediate state and split the search. On diverse benchmarks (Blocks, Logistics, Depot, Mystery), the Predict route generally solves more cases with fewer LLM calls, except when domain semantics are opaque. For enterprise automation, this points to a practical recipe: decompose → predict key waypoints → verify with a trusted solver—and only fall back to “inspire” when your domain model is thin.


Why this matters for business automation

Executives love the promise of “LLM agents that plan.” Practitioners know the pain: state-space explosion. As entities, constraints, and steps grow, even great planners time out. The paper we analyze bridges the gap: keep the deterministic rigor of classical planners and inject LLM intuition only where it shrinks search.

Think warehouse routing, field-service dispatch, or multi-stage approvals: exact rules exist, but the search is huge. Instead of asking an LLM to spit out the whole plan (risky), use it to either skip a few levels ahead or split the mountain into two hills.


The two paradigms in plain English

LLM4Inspire
  • What the LLM does: Picks the next action from a verified list of applicable actions.
  • Mental model: A seasoned dispatcher nudging the next move.
  • Strength: Simple to bolt on; reduces branching temporarily.
  • Risk: Can chase clever-looking but myopic moves; repetitive calls.
  • When to use: The domain model is shallow or fast-changing, and you just need helpful nudges.

LLM4Predict
  • What the LLM does: Proposes a midpoint state (a small set of predicates) that lies between now and the goal.
  • Mental model: Turning a marathon into two half-marathons.
  • Strength: Big, exponential wins by cutting depth; fewer LLM calls overall.
  • Risk: An ill-chosen midpoint can disturb prior progress.
  • When to use: You have a reliable domain model and want scale and reliability.
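
For flavor, here is a minimal Inspire-style call in Python. It is a sketch under our own assumptions (the `llm` callable and the `inspire_next_action` helper are hypothetical, not from the paper); the one non-negotiable is that the LLM picks only from actions the planner has already verified as applicable.

```python
def inspire_next_action(applicable_actions: list[str], state_desc: str, llm) -> str:
    """LLM4Inspire sketch: ask the LLM to pick ONE planner-verified action."""
    prompt = (
        f"Current state:\n{state_desc}\n"
        "Applicable actions:\n" + "\n".join(applicable_actions) +
        "\nReply with exactly one action from the list, verbatim."
    )
    choice = llm(prompt).strip()
    # Guard: if the LLM goes off-list, fall back to the planner's default ordering.
    return choice if choice in applicable_actions else applicable_actions[0]
```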

Key system pieces: a problem disassembler that builds a Directed Acyclic Dependency Graph (DADG) of sub-goals; an instance factory that spins up PDDL sub-instances; a solver (Fast Downward) that handles the heavy lifting; and an LLM module that either Inspires or Predicts.
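
To make the decomposition step concrete, here is a minimal sketch of a DADG as a plain dependency dict, ordered with Python's standard-library topological sorter. The sub-goal names are hypothetical; the paper's actual data structures may differ.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical DADG for a Blocks-style problem: each key is a sub-goal,
# each value is the set of sub-goals it depends on.
dadg = {
    "stack_C_on_B": {"clear_B", "holding_C"},
    "holding_C": {"unstack_C_from_A"},
    "clear_B": set(),
    "unstack_C_from_A": set(),
}

order = list(TopologicalSorter(dadg).static_order())
print(order)  # e.g. ['clear_B', 'unstack_C_from_A', 'holding_C', 'stack_C_on_B']
```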


The crucial insight: split beats skip (most of the time)

The search cost for depth k is roughly O(b^k), where b is the branching factor. If a good midpoint halves the depth (k/2 on each side), the total burden drops to 2·O(b^{k/2}), an exponential reduction (made concrete in the sketch after the results below). In experiments across standard planning domains:

  • Success rates: Predict > Inspire » vanilla, with Predict hitting ~95–100% on easier-to-structure domains (Blocks, Logistics) and still leading in Depot.
  • Efficiency: Predict needed fewer LLM calls and less solver time for the same solved set—evidence that the midpoint is doing real work rather than just sprinkling hints.
  • Edge case (Mystery domain): When names are randomized (no semantics to latch onto), both Inspire and Predict lose some magic. Classical solvers are fine; LLMs struggle to infer constraints from gibberish labels.
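
To see the depth-halving arithmetic in actual numbers, here is a back-of-the-envelope calculation with illustrative values (b and k are ours, not the paper's):

```python
# Worst-case node counts for uninformed search (illustrative only).
b, k = 4, 20                       # branching factor, plan depth
full_search = b ** k               # one search to depth k
split_search = 2 * b ** (k // 2)   # two searches to depth k/2 via a midpoint
print(f"{full_search:.3e} vs {split_search:.3e}")  # ~1.100e+12 vs ~2.097e+06
```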

Takeaway: If you can encode domain rules (even roughly), ask the LLM to predict tiny waypoint states and let the solver prove them. Don’t outsource the whole plan.


What this means for Cognaptus-style automations

If you’re building enterprise orchestrations (RPA + constraints + approvals), here’s a robust pattern we recommend, sketched in code after the list:

  1. Decompose goals into a DADG; topologically sort dependent sub-goals.
  2. For sub-problems that bog down, Predict a small waypoint state (1–3 predicates).
  3. Verify with your deterministic planner; never execute unverified LLM output.
  4. Fall back to Inspire for local, tactical nudges when models are incomplete or data are live and noisy.
  5. Log and learn: Track which predicted waypoints consistently help; promote them into reusable domain heuristics over time.
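
A minimal sketch of steps 1–4, assuming hypothetical helpers (`solve`, `propose_waypoint`, `verify_waypoint`, `apply_plan`) that wrap your planner and LLM; treat it as a shape to adapt, not the paper's implementation:

```python
from graphlib import TopologicalSorter

MAX_LLM_RETRIES = 3  # budget cap per stubborn sub-goal

def plan_subgoal(state, subgoal, solve, propose_waypoint, verify_waypoint):
    """Solve one sub-goal; on timeout, Predict a waypoint and split the search."""
    plan = solve(state, subgoal, timeout_s=30)       # deterministic planner first
    if plan is not None:
        return plan
    for _ in range(MAX_LLM_RETRIES):
        waypoint = propose_waypoint(state, subgoal)  # LLM proposes 1-3 predicates
        if not verify_waypoint(state, subgoal, waypoint):
            continue                                 # never execute unverified output
        first = solve(state, waypoint, timeout_s=30)
        second = solve(waypoint, subgoal, timeout_s=30)
        if first is not None and second is not None:
            return first + second                    # two half-searches, one plan
    raise RuntimeError(f"Sub-goal {subgoal!r}: route to human-in-the-loop")

def run(dadg, initial_state, apply_plan, **helpers):
    """Topologically order the DADG, then solve and apply sub-goals in turn."""
    state = initial_state
    for subgoal in TopologicalSorter(dadg).static_order():
        plan = plan_subgoal(state, subgoal, **helpers)
        state = apply_plan(state, plan)              # advance the verified state
    return state
```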

This pattern gives you auditability (plans are solver-validated), cost control (fewer LLM calls), and stability (you can swap LLMs without rewriting the planner).


Implementation sketch (production-safe)

  • Contract: The LLM never outputs free text; it emits either (a) a single action from the already-applicable list, or (b) a JSON array of 1–3 predicates defining the waypoint.
  • Guards: Hard-check that predicted waypoints are not equal to the initial or goal state; forbid predicates that violate type or static constraints (a minimal guard is sketched after this list).
  • Cache: Memoize successful waypoints keyed by goal-pattern to avoid repeat calls.
  • Budgeting: Cap retries at N LLM calls per stubborn sub-goal; otherwise mark for human-in-the-loop or slower batch processing.
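
A minimal version of the contract-plus-guards pair, assuming the LLM replies with a JSON array of predicate strings (the field conventions and the `known_predicates` whitelist are our simplifications, not the paper's):

```python
import json

def parse_waypoint(raw, init_preds, goal_preds, known_predicates):
    """Validate an LLM-proposed waypoint: 1-3 known predicates, distinct from init/goal."""
    try:
        preds = json.loads(raw)
    except json.JSONDecodeError:
        return None                                  # free text -> reject outright
    if not isinstance(preds, list) or not 1 <= len(preds) <= 3:
        return None
    waypoint = frozenset(preds)
    if not all(isinstance(p, str) and p in known_predicates for p in waypoint):
        return None                                  # unknown or ill-typed predicate
    if waypoint in (frozenset(init_preds), frozenset(goal_preds)):
        return None                                  # degenerate midpoint, no split gained
    return waypoint

# Memoization (the cache bullet): key accepted waypoints by goal pattern.
waypoint_cache: dict[frozenset, frozenset] = {}
```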

Where this fits in the current AI planning debate

Two myths persist: (1) “LLMs will replace planners,” and (2) “LLMs can’t plan, full stop.” The more useful view is LLM-modulo planning: a classic solver supplies guarantees; the LLM supplies search compression. This paper’s Predict-vs-Inspire framing gives practitioners a knob to turn from “more intuitive but noisy” to “more structured and scalable.”


Practical checklist

  • Do you have a declarative domain model (even partial)? → Favor Predict.
  • Is your domain fast-changing with weak schemas? → Start with Inspire but validate aggressively.
  • Are you hitting timeouts at high depth? → Insert waypoints near the geometric midpoint of the remaining plan.
  • Need compliance trails? → Keep every LLM I/O, the solver’s proofs, and the DADG-derived decomposition for audits.

Limitations to watch

  • Waypoint myopia: Poor midpoints can undo prior sub-goals. Mitigate with local-invariance checks that refuse waypoints violating locked predicates (see the sketch after this list).
  • Opaque domains: If labels are meaningless, pre-train a lightweight semantic mapper (or avoid LLM guidance altogether).
  • Cost drift: Even with fewer calls, latency spikes can occur; batch and cache.
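
A minimal local-invariance guard for the first point, assuming `locked` holds predicates established by earlier sub-goals and negated facts are written as "not <pred>" (both conventions are ours):

```python
def respects_invariants(waypoint: frozenset, locked: frozenset) -> bool:
    """Reject waypoints that would retract predicates earlier sub-goals locked in."""
    retracted = {p.removeprefix("not ") for p in waypoint if p.startswith("not ")}
    return not (retracted & locked)
```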

Final word

For real operations, Predict (split) should be your default. Inspire (skip) is a useful auxiliary—especially when you lack clean schemas. Either way, the winning architecture is LLM-assisted, solver-verified, decomposition-first.


Cognaptus: Automate the Present, Incubate the Future