Opening — Why This Matters Now

Large language models have been accused of many things: hallucinating case law, inventing citations, occasionally sounding overconfident in PowerPoint meetings. But here’s a more constructive role: quietly removing irrelevant seafood from your pasta recipe before your planner explodes combinatorially.

In classical AI planning, grounding—the process of instantiating first-order action schemas into propositional form—is often the real villain. The number of grounded actions grows exponentially with object count and parameter arity. Before search even begins, the system may already be suffocating under combinatorics.

The paper “Semantic Partial Grounding via LLMs” proposes a deceptively simple idea: use an LLM to semantically prune irrelevant objects, predicates, and actions before grounding. No modification of the planner. No retraining per domain. Just interpret the PDDL description like a reasonably literate human would.

The result? Orders-of-magnitude reductions in grounded operators—sometimes without harming solution quality.

Let’s unpack what that means for planning systems—and for anyone building agentic pipelines at scale.


Background — The Grounding Bottleneck

A classical STRIPS planning task is defined as:

$$ \Pi = \langle P, O, A, s_0, G \rangle $$

Where:

  • $P$ = predicates
  • $O$ = objects
  • $A$ = action schemas
  • $s_0$ = initial state
  • $G$ = goal condition

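For readers who think in code, the tuple above can be mirrored as a small data structure. A minimal sketch; the field names and the toy atoms are my own, not the paper's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StripsTask:
    """A classical STRIPS task Pi = <P, O, A, s0, G> (illustrative encoding)."""
    predicates: frozenset   # P: predicate symbols, e.g. "at", "in"
    objects: frozenset      # O: domain objects, e.g. "plane1", "city2"
    schemas: frozenset      # A: lifted action schemas, e.g. "fly"
    init: frozenset         # s0: ground atoms true in the initial state
    goal: frozenset         # G: ground atoms that must hold at the end

# A tiny Zenotravel-flavored instance (invented for illustration)
task = StripsTask(
    predicates=frozenset({"at", "in"}),
    objects=frozenset({"plane1", "city1", "city2"}),
    schemas=frozenset({"fly"}),
    init=frozenset({("at", "plane1", "city1")}),
    goal=frozenset({("at", "plane1", "city2")}),
)
```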
Most planners translate this first-order description into a fully propositional representation prior to search. This grounding step creates every possible grounded action instance by substituting parameters with objects.

If you have:

  • 50 objects
  • An action schema with 3 parameters

You’re looking at $50^3 = 125,000$ potential instantiations—for just one action.

Multiply across domains, predicates, fuel levels, resource types, and suddenly the grounding stage dominates total runtime.
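The arithmetic is easy to make concrete. A minimal sketch, assuming untyped parameters (real planners use type information to prune some of this):

```python
from itertools import product

def grounded_count(num_objects: int, arity: int) -> int:
    """Ground instances of one untyped schema: |O| ** arity."""
    return num_objects ** arity

# One 3-parameter schema over 50 objects
print(grounded_count(50, 3))  # 125000

# Enumerating instantiations for a tiny case shows the same blow-up
objs = ["a", "b", "c"]
instances = list(product(objs, repeat=2))
print(len(instances))  # 9
```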

Prior Solutions

Previous partial grounding approaches attempted to learn which operators or objects are likely to matter:

| Approach | Strategy | Limitation |
|----------|----------|------------|
| Gnad et al. (2019) | ML over relational features | Domain-specific training |
| Areces et al. (2023) | Small language models + RPG signals | Requires planner modifications |
| PLOI (Silver et al., 2021) | GNN predicts important objects | Limited predicate arity |

All attempt to reduce combinatorial explosion. None leverage the semantic richness already embedded in PDDL descriptions.

That is where SPG-LLM enters.


The Method — Semantic Partial Grounding (SPG-LLM)

Instead of predicting which grounded operators to instantiate, SPG-LLM operates before grounding, at the task level.

It feeds the domain and problem PDDL files into an LLM using a structured prompt. The LLM heuristically removes:

  • Irrelevant objects
  • Unnecessary predicates
  • Superfluous action schemas

The goal condition is immutable. Everything else is negotiable.
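As a rough sketch of that step, assuming a generic chat-completion client: `ask_llm` is a hypothetical callable, and the prompt wording is illustrative, not the paper's actual prompt.

```python
def prune_task(domain_pddl: str, problem_pddl: str, ask_llm) -> tuple:
    """Ask an LLM to shrink a PDDL task before grounding.

    `ask_llm` is a stand-in for whatever completion client you use;
    it is expected to return a (pruned_domain, pruned_problem) pair.
    """
    prompt = (
        "You are simplifying a PDDL planning task before grounding.\n"
        "Remove objects, predicates, and action schemas that cannot\n"
        "matter for achieving the goal. You may only DELETE elements;\n"
        "never rename or add anything, and never alter the goal.\n\n"
        f"Domain:\n{domain_pddl}\n\nProblem:\n{problem_pddl}\n"
    )
    return ask_llm(prompt)

# Usage with a stub client (a real client would call an LLM API)
pruned = prune_task("(define (domain d) ...)",
                    "(define (problem p) ...)",
                    lambda prompt: ("(define (domain d'))", "(define (problem p'))"))
```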

Validation Pipeline

To mitigate hallucination and maintain structural integrity, the authors introduce three validation layers:

  1. Syntactic validation — PDDL compliance

  2. Computational validation — Can a plan be generated?

  3. Semantic validation — Ensures:

    • $P' \subseteq P$
    • $A' \subseteq A$
    • $O' \subseteq O$
    • $G' \equiv G$
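The subset-and-goal checks of the semantic layer are cheap to implement. A minimal sketch over plain Python sets; the symbols and atoms below are invented for illustration:

```python
def semantically_valid(P, A, O, G, P2, A2, O2, G2) -> bool:
    """Semantic layer: the pruned task may only shrink predicates,
    actions, and objects, and must leave the goal untouched."""
    return P2 <= P and A2 <= A and O2 <= O and G2 == G

P = {"at", "in", "fuel-level"}
A = {"fly", "zoom", "board"}
O = {"plane1", "person3", "person33"}
G = {("at", "person3", "city2")}

# Dropping fuel-level, zoom, and person33 passes the check
ok = semantically_valid(P, A, O, G,
                        P - {"fuel-level"}, A - {"zoom"}, O - {"person33"}, G)

# Altering the goal is rejected outright
bad = semantically_valid(P, A, O, G, P, A, O, set())
```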

It’s not fully sound. It’s not guaranteed complete. But it is pragmatic.

And pragmatism, in planning, is underrated.


Findings — Fewer Actions, Faster Grounding

Across seven planning domains (Agricola, Blocksworld, Depots, Hiking, Satellite, TPP, Zenotravel), SPG-LLM was evaluated against:

  • Full Grounding (FG)
  • Planning with Learned Object Importance (PLOI)

1️⃣ Grounded Actions

SPG-LLM reduced grounded actions dramatically.

| Domain | FG (avg actions) | SPG (avg actions) | Reduction |
|--------|------------------|-------------------|-----------|
| TPP | 385k | 17k | ~95% |
| Zenotravel | 243k | 48k | ~80% |
| Satellite | 327k | 91k | ~72% |

In several cases, reductions reached two orders of magnitude.

2️⃣ Grounding Time

| Domain | FG (sec) | SPG (sec) |
|--------|----------|-----------|
| TPP | 42.0 | 1.9 |
| Zenotravel | 19.1 | 3.1 |
| Depots | 2.2 | 0.8 |

SPG-LLM was consistently the fastest grounding method among commonly solved tasks.

3️⃣ Plan Quality Trade-offs

Results were domain-dependent:

  • Comparable plan cost in Agricola, Satellite
  • Better cost in Blocksworld and Depots
  • Slight cost increase in Zenotravel (457 → 493 average), but 10× faster solve time

Interestingly, SPG-LLM enabled solving tasks that FG could not solve within resource limits.

Coverage dropped slightly (139/175 vs 161/175 for FG), reflecting incompleteness risk.

This is the classic engineering trade-off:

Slight reduction in coverage in exchange for massive computational savings.

For large-scale systems, that is often acceptable.


A Concrete Example — Zenotravel Simplified

In one Zenotravel instance, SPG-LLM:

  • Removed the redundant zoom action (a fuel-expensive alternative to fly)
  • Dropped passengers not referenced in the goal
  • Cut the aircraft count from 24 to 11
  • Reduced fuel levels from 7 to 2

The semantic intuition mirrors human reasoning:

“If the goal doesn’t involve transporting person 33, why instantiate actions involving person 33?”

This is not statistical learning. This is semantic pruning.
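That intuition can be approximated with a simple goal-relevance closure: a toy stand-in for what the LLM does semantically, not the paper's algorithm. The atoms below are invented for illustration.

```python
def goal_relevant_objects(goal_atoms, init_atoms):
    """Keep objects named in the goal, then pull in anything linked to a
    kept object through an initial-state atom, iterated to a fixpoint."""
    kept = {obj for atom in goal_atoms for obj in atom[1:]}
    changed = True
    while changed:                       # propagate relevance transitively
        changed = False
        for atom in init_atoms:
            args = set(atom[1:])
            if kept & args and not args <= kept:
                kept |= args
                changed = True
    return kept

goal = {("at", "person3", "city2")}
init = {("at", "person3", "city0"),
        ("at", "person33", "city5"),     # person33 never touches the goal
        ("at", "plane1", "city0")}
kept = goal_relevant_objects(goal, init)
# person33 drops out: nothing connects it to the goal
```

A real system needs more care (negative interactions, shared resources), which is exactly why the paper leans on the LLM's semantic reading rather than a purely syntactic closure.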


Strategic Implications — Beyond Planning

This paper hints at something larger.

1️⃣ LLMs as Structural Simplifiers

We often use LLMs for:

  • Content generation
  • Code assistance
  • Retrieval

SPG-LLM demonstrates a new pattern:

LLMs as pre-processing agents that reduce computational search space before symbolic reasoning begins.

This pattern generalizes.

| Domain | Potential Application |
|--------|-----------------------|
| Compliance workflows | Remove irrelevant rule branches |
| Supply chain optimization | Prune unused SKU categories |
| Automated auditing | Eliminate non-impactful ledger dimensions |
| Multi-agent simulations | Restrict irrelevant state dimensions |

2️⃣ Hybrid Intelligence Is Practical

The planner remains symbolic. The LLM acts as a semantic compressor.

No retraining of the planning engine. No architectural overhaul.

Just smarter preprocessing.

3️⃣ Risk Profile Considerations

For enterprise adoption:

  • Semantic validation must be strengthened
  • Soundness guarantees desirable for regulated domains
  • Explanation generation (future work) will be critical

In finance or healthcare, pruning incorrectly could have regulatory consequences.

The validation framework is promising but not yet industrial-grade.


Conclusion — A Planner That Knows When to Ignore Seafood

SPG-LLM does not replace classical planning. It does something more subtle.

It acknowledges that many planning tasks contain semantic structure that humans understand instantly—but planners traditionally ignore.

By allowing LLMs to act as semantic gatekeepers before grounding, we unlock:

  • Faster grounding
  • Smaller search spaces
  • Occasionally better plans
  • New hybrid AI design patterns

The broader message is clear:

The future of agent systems is not purely neural nor purely symbolic. It is architected collaboration between the two.

And sometimes, that collaboration begins by removing the seafood from your pasta domain.

Cognaptus: Automate the Present, Incubate the Future.