Opening — Why This Matters Now

Enterprises are discovering a quiet truth about AI planning systems: generating a plan is the easy part. Getting humans to trust it, refine it, and align it with real-world preferences? That’s the harder game.

From supply chain orchestration to workforce scheduling and mission planning, organizations increasingly rely on automated planners. Yet most deployments still treat explanation as a static afterthought — a tooltip, a log file, perhaps a constraint violation message. In reality, planning is rarely a one-shot optimization problem. It is an iterative negotiation between human intent and computational feasibility.

The paper “Exploring Plan Space through Conversation” reframes this dynamic. Instead of asking whether large language models (LLMs) can replace symbolic planners (they cannot — and the authors are refreshingly explicit about this), it asks a sharper question:

What if LLMs became conversational mediators between humans and formal planning engines?

The result is not an “LLM planner.” It is something more disciplined — and more useful.


Background — From Static Explanations to Conversational Planning

Classical AI planning operates over formal structures: predicates, objects, states, actions, goals. A task is defined as:

$$ \tau = \langle P, O, A, I, G \rangle $$

where the predicates $P$, objects $O$, actions $A$, initial state $I$, and goal set $G$ define the search space. In oversubscription planning, not all goals can be satisfied simultaneously; trade-offs are inevitable.
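The tuple above can be made concrete as a small data structure. This is an illustrative sketch, not the paper's representation; the field names and the toy logistics goals are invented for exposition:

```python
from dataclasses import dataclass

# Hypothetical sketch of the task tuple tau = <P, O, A, I, G>.
@dataclass(frozen=True)
class PlanningTask:
    predicates: frozenset[str]     # P: relation symbols
    objects: frozenset[str]        # O: domain objects
    actions: frozenset[str]        # A: action schemas (names only here)
    initial_state: frozenset[str]  # I: ground atoms true at the start
    goals: frozenset[str]          # G: desired ground atoms

def is_oversubscribed(task: PlanningTask, satisfiable: set[str]) -> bool:
    """Oversubscription: no single plan achieves every goal at once."""
    return not task.goals <= satisfiable

task = PlanningTask(
    predicates=frozenset({"at", "loaded"}),
    objects=frozenset({"truck1", "pkg1", "depot"}),
    actions=frozenset({"drive", "load", "unload"}),
    initial_state=frozenset({"at(truck1, depot)"}),
    goals=frozenset({"at(pkg1, depot)", "at(truck1, depot)"}),
)
print(is_oversubscribed(task, {"at(pkg1, depot)"}))  # truck goal unmet -> True
```
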

Traditionally, explanation in planning systems falls into two categories:

  1. Model reconciliation — aligning the planner's model with the user's mental model of it.
  2. Conflict analysis — identifying incompatible goal sets.

This paper focuses on the second, using two formal constructs:

| Concept | Meaning | Business Interpretation |
| --- | --- | --- |
| MUS (Minimal Unsolvable Subset) | A smallest set of goals that cannot be satisfied together | A hard conflict cluster |
| MCS (Minimal Correction Set) | A smallest set of goals whose removal restores feasibility | A minimal sacrifice option |

If your planning task is infeasible, the explanation isn’t mystical. It is structural.
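Both constructs can be enumerated by brute force against a black-box feasibility oracle. The sketch below assumes a hypothetical `solvable(goals)` function standing in for a planner call; real systems use far more efficient hitting-set style algorithms, but the definitions are these:

```python
from itertools import combinations

def minimal_unsolvable_subsets(goals, solvable):
    """All MUS: smallest-first enumeration, skipping any set that
    already contains a previously found (smaller) unsolvable set."""
    found = []
    for k in range(1, len(goals) + 1):
        for subset in combinations(sorted(goals), k):
            s = set(subset)
            if any(m <= s for m in found):
                continue  # contains a smaller MUS, so not minimal
            if not solvable(s):
                found.append(s)
    return found

def minimal_correction_sets(goals, solvable):
    """All MCS: smallest removals whose complement becomes solvable."""
    goals = set(goals)
    found = []
    for k in range(1, len(goals) + 1):
        for removal in combinations(sorted(goals), k):
            r = set(removal)
            if any(m <= r for m in found):
                continue  # a smaller correction set suffices
            if solvable(goals - r):
                found.append(r)
    return found

# Toy oracle: goals g1 and g2 conflict; everything else is feasible.
solvable = lambda gs: not {"g1", "g2"} <= gs
goals = {"g1", "g2", "g3"}
print(minimal_unsolvable_subsets(goals, solvable))  # [{'g1', 'g2'}]
print(minimal_correction_sets(goals, solvable))     # [{'g1'}, {'g2'}]
```

Dropping either `g1` or `g2` is a "minimal sacrifice option"; the pair itself is the hard conflict cluster.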

But here’s the operational gap: humans do not think in MUS/MCS.

They ask:

  • “Why can’t we do both?”
  • “What if I insist on this constraint?”
  • “What’s the cheapest way to fix this?”

Symbolic planners do not converse. LLMs do.

The authors’ key insight is architectural separation: let planners compute; let LLMs translate.


Architecture — A Multi-Agent Conversational Layer Over Planning

The framework introduces a modular, multi-agent LLM system wrapped around a classical planner.

At a high level:

User (Natural Language) → LLM Translators → Explanation Framework (EFCC) → LLM Explanation Translator → User

Rather than one monolithic LLM, the system assigns distinct roles:

| Agent | Function | Risk Mitigated |
| --- | --- | --- |
| Question Translator | Classifies question type & extracts goal references | Misinterpretation |
| Goal Translator | Converts natural-language goals to LTLf formulas | Formal inconsistency |
| Question Suggester | Proposes meaningful next questions | Cognitive overload |
| Explanation Translator | Converts MUS/MCS into contextual explanations | User distrust |

Critically, the planner itself remains symbolic (Fast Downward-based). The LLM never performs the planning computation. It only mediates.

This separation aligns with a growing “LLM-modulo” philosophy: use language models for linguistic ambiguity, not combinatorial reasoning.
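A minimal sketch of that separation, with `llm_parse` and `llm_render` as hypothetical stand-ins for the LLM calls (the real system prompts a model; the point is that the symbolic side is the only place answers are computed):

```python
# "Let planners compute; let LLMs translate" — illustrative stubs only,
# not the paper's implementation.

def llm_parse(utterance: str) -> dict:
    """Translation only: map natural language to a structured query."""
    if "why" in utterance.lower():
        return {"type": "US-WHY"}
    return {"type": "S-CAN", "goal": utterance}

def planner_answer(query: dict, mus: list[set]) -> list[set]:
    """Symbolic side: the answer is computed, never generated."""
    if query["type"] == "US-WHY":
        return mus  # every explanation is anchored in computed MUS
    return []

def llm_render(result: list[set]) -> str:
    """Translation only: verbalize the computed structure."""
    if not result:
        return "No conflict was found."
    return "These goals cannot hold together: " + ", ".join(
        " & ".join(sorted(m)) for m in result
    )

mus = [{"deliver_pkg1", "return_by_noon"}]
query = llm_parse("Why can't we do both?")
print(llm_render(planner_answer(query, mus)))
```

If the language model misfires, it can garble the phrasing, but it cannot invent a conflict that the planner never computed.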


Richer Question Types — Structuring the Conversation

The system supports structured conversational intents:

When the Task Is Unsolvable

| Question Type | Meaning | Underlying Formal Basis |
| --- | --- | --- |
| US-WHY | Why is it unsolvable? | All MUS of enforced goals |
| US-HOW | How can I fix it? | All MCS of enforced goals |

When a Plan Exists but Goals Are Missing

| Question Type | Meaning | Computed From |
| --- | --- | --- |
| S-WHY-NOT | Why is goal X not satisfied? | Conflicts with satisfied goals |
| S-WHAT-IF | What happens if I enforce X? | Feasibility impact |
| S-CAN | Can X be satisfied? | Conflict absence |
| S-HOW | How can X be satisfied? | Required trade-offs |

This formal scaffolding prevents hallucinated explanations. Every answer is anchored in computed MUS/MCS sets.
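To see how that anchoring works, consider S-CAN and S-WHY-NOT expressed directly over MUS sets. This is a hypothetical sketch of the grounding, not the framework's code, and the goal names are invented:

```python
def s_can(goal, satisfied, mus_list):
    """S-CAN: X can be satisfied alongside the currently satisfied goals
    iff enforcing it introduces no known minimal conflict."""
    candidate = set(satisfied) | {goal}
    return not any(m <= candidate for m in mus_list)

def s_why_not(goal, satisfied, mus_list):
    """S-WHY-NOT: the satisfied goals that X conflicts with."""
    return [m - {goal} for m in mus_list
            if goal in m and m <= set(satisfied) | {goal}]

mus_list = [{"fast_route", "low_fuel"}]
satisfied = {"fast_route"}
print(s_can("low_fuel", satisfied, mus_list))      # False: conflict exists
print(s_why_not("low_fuel", satisfied, mus_list))  # [{'fast_route'}]
```

An answer of "no" is always traceable to a concrete MUS, which is precisely what keeps the rendered explanation from being a hallucination.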

The LLM’s role is not to invent explanations — it is to translate structural truth into conversational clarity.

That distinction matters.


Evaluation — Does Conversation Improve Planning?

A user study (131 participants, two groups) compared:

  • Template-based explanation interface
  • LLM conversational interface

Participants had 15 minutes to maximize goal utility under hidden constraints.

Key Metrics

| Metric | Template | LLM-Based |
| --- | --- | --- |
| Avg Utility Achieved | 20.5 | 20.8 |
| Max Utility Reached (%) | 13.8% | 15.2% |
| Avg Questions Asked | 22.8 | 11.4 |

Observations:

  1. Objective gains were modest and not statistically significant.
  2. Subjective improvement was significant.
  3. Users reached comparable outcomes with fewer explicit questions.

The conversational interface reduced cognitive friction. Participants relied heavily on suggestion-based “S-CAN” queries — essentially probing the solution boundary efficiently.

The LLM layer did not make the planner smarter.

It made the human more effective.


Why This Architecture Works (and Why It’s Conservative by Design)

There is discipline in this design:

  • LLMs are constrained to translation and summarization.
  • Formal explanation remains symbolic.
  • Context memory is scoped per iteration step to prevent drift.
  • Summarization is controlled and selection-based.

In an era where many systems attempt to replace symbolic planning entirely, this approach accepts a boundary:

LLMs are poor at deep combinatorial reasoning, but excellent at contextual mediation.

The result is hybrid intelligence.

Not hype — architecture.


Business Implications — Where This Actually Matters

For enterprises deploying planning systems (logistics, manufacturing, compliance scheduling, defense coordination), this work signals three practical shifts:

1. Trust Through Iteration, Not Just Accuracy

Even optimal plans face rejection if stakeholders do not understand trade-offs. Conversational explanation accelerates preference convergence.

2. Reduced Training Costs

Template-based systems require users to learn structured question types. Conversational layers lower onboarding friction.

3. Governance & Auditability

Because explanations are grounded in MUS/MCS, responses remain auditable. This is crucial in regulated sectors where AI decisions must be defensible.

The architecture implicitly supports explainability-by-design — a governance advantage.


Limitations — Where the Edges Are

The study was conducted on a moderate-scale planning task with lay participants.

Open questions remain:

  • How does this scale in industrial planning domains?
  • Will domain experts ask more complex questions that challenge translation accuracy?
  • How robust is summarization under high conflict density?

The authors wisely avoid claiming generality. The architecture is promising — not universal.


Conclusion — Plans Don’t Need to Speak. But They Should Be Heard.

This work offers a restrained but powerful thesis:

LLMs should not replace planners.

They should help humans reason with planners.

In enterprise AI, the frontier is not raw optimization performance. It is human–AI alignment under constraints.

The future of planning systems is not silent automation.

It is structured conversation over formal truth.

And that, strategically, is a far more durable direction.


Cognaptus: Automate the Present, Incubate the Future.