Opening — Why this matters now
The industry has spent the last year obsessing over one idea: give LLM agents more memory, and they will become more intelligent.
A comforting theory. Also, as it turns out, partially wrong.
As LLM agents move from chatboxes into embodied environments—robotics, simulations, automation pipelines—the failure mode changes. It’s no longer about hallucinating facts. It’s about doing the wrong thing in the right language.
Or worse: doing nothing at all.
The paper introduces a subtle but consequential insight: agents don’t fail because they lack knowledge; they fail because they misuse it.
That distinction is where most current architectures quietly break.
Background — Context and prior art
Most agent frameworks today fall into two camps:
- Reasoning-centric systems (e.g., ReAct-style agents)
- Memory-augmented systems (e.g., Reflexion, retrieval-based agents)
Each solves a different problem:
| Approach | Strength | Hidden Weakness |
|---|---|---|
| Reasoning-based | Flexible planning | Ignores hard constraints |
| Memory-based | Learns from experience | Applies knowledge blindly |
In open environments (like web browsing), this works reasonably well.
But in closed-world environments—such as ALFWorld or ScienceWorld—actions must satisfy strict constraints:
- You must be in the right location
- You must hold the correct object
- Containers must be open
Failure feedback? Often just: “Nothing happens.”
Which is less feedback, more passive-aggressive silence.
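To make the failure mode concrete, here is a toy closed-world step function in the spirit of ALFWorld, where any violated precondition collapses into the same uninformative message. The environment schema and function names are my own illustration, not from the paper:

```python
# Toy closed-world environment: actions that violate hidden
# preconditions fail silently with "Nothing happens."

def step(state: dict, action: str, target: str) -> str:
    if action == "take":
        # Hidden preconditions: agent at the object's location, hand empty.
        if state["location"] == state["object_locations"].get(target) \
                and state["hand"] is None:
            state["hand"] = target
            return f"You pick up the {target}."
    # Every unmet precondition yields the same non-signal.
    return "Nothing happens."

state = {"location": "counter", "hand": None,
         "object_locations": {"mug": "counter", "apple": "fridge"}}
print(step(state, "take", "apple"))  # wrong location: "Nothing happens."
print(step(state, "take", "mug"))    # preconditions met: success
```

The agent gets no hint *which* precondition failed, which is exactly why belief tracking becomes load-bearing.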
This creates two structural failure modes identified in the paper:
| Failure Mode | Description | Consequence |
|---|---|---|
| P1: Invalid Actions | Violates hidden preconditions | Silent failure |
| P2: State Drift | Agent’s belief diverges from reality | Cascading errors |
And here’s the uncomfortable part: they reinforce each other.
The more the agent misunderstands the world, the more it generates invalid actions. The more invalid actions it generates, the less feedback it receives. The less feedback it receives, the worse its internal state becomes.
A degenerative loop—beautifully systematic, and deeply inconvenient.
Analysis — What RPMS actually does
The proposed architecture, RPMS (Rule-Augmented Memory Synergy), is not just another hybrid system. It is a conflict management framework.
That wording matters.
Instead of asking “how do we combine rules and memory?”, it asks:
“When should each be trusted?”
1. Belief State: Minimal but Sufficient
Rather than reconstructing the full environment, RPMS tracks a lightweight belief state:
$$ b_t = \langle \ell_t, h_t, \mathcal{C}_t, \Pi_t \rangle $$
Where:
- $\ell_t$: location
- $h_t$: hand state
- $\mathcal{C}_t$: container states
- $\Pi_t$: object locations
The design choice is intentional: track only what is needed for action validity.
Not intelligence. Not completeness. Just executability.
A refreshing constraint in an industry addicted to over-modeling.
2. Rule Hierarchy: Making Constraints Explicit
RPMS introduces a three-tier rule system:
| Tier | Role | Example |
|---|---|---|
| Universal | General behavior | “Check preconditions before acting” |
| Domain | Task procedures | “Find → take → transform → place” |
| Environment | Hard constraints | “Heating is atomic; object stays in hand” |
The key shift: rules are injected at inference time, not buried inside model weights.
This turns implicit assumptions into explicit constraints.
In other words, the agent stops guessing how the world works—and starts being told.
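Inference-time injection can be as simple as assembling the tiers into the agent's context in priority order. The tier names follow the paper's table; the assembly function is a hypothetical sketch:

```python
# Tiered rules injected at inference time, not baked into weights.
# Rule text mirrors the examples in the table above.
RULES = {
    "universal":   ["Check preconditions before acting."],
    "domain":      ["Follow the task order: find -> take -> transform -> place."],
    "environment": ["Heating is atomic; the object stays in your hand."],
}

def build_rule_block(tiers=("universal", "domain", "environment")) -> str:
    """Concatenate rule tiers in priority order for the agent's prompt."""
    lines = []
    for tier in tiers:
        for rule in RULES[tier]:
            lines.append(f"[{tier.upper()}] {rule}")
    return "\n".join(lines)

print(build_rule_block())
```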
3. Memory — But With a Filter
Here’s the controversial part: the paper shows that memory can hurt performance.
Yes, hurt.
Memory entries are filtered using a state compatibility function:
$$ \mathcal{M}_{\text{filt}} = \{\, m \in \mathcal{M}_{\text{cand}} \mid \text{COMPAT}(\sigma(b_t), \sigma(m)) \,\} $$
A simple example:
- If a memory assumes your hand is full
- But your hand is empty
That memory is discarded.
No philosophical debate. No “let the model decide.” Just elimination.
Because irrelevant experience is worse than no experience.
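The hand-state example above can be sketched directly. The abstraction $\sigma$ and the memory schema here are my own simplification of the paper's compatibility function:

```python
# State-compatibility filter over candidate memories:
# keep m only if COMPAT(sigma(b_t), sigma(m)) holds.

def sigma(state: dict) -> tuple:
    """Abstract a state down to the features that affect action validity."""
    return (state["hand"] is None, state["location"])

def compat(sig_now: tuple, sig_mem: tuple) -> bool:
    # In this sketch, hand occupancy must match; location is advisory.
    return sig_now[0] == sig_mem[0]

def filter_memories(belief: dict, candidates: list[dict]) -> list[dict]:
    return [m for m in candidates
            if compat(sigma(belief), sigma(m["assumed_state"]))]

belief = {"hand": None, "location": "counter"}
memories = [
    {"advice": "place the mug in the sink",
     "assumed_state": {"hand": "mug", "location": "counter"}},  # hand full: dropped
    {"advice": "open the fridge first",
     "assumed_state": {"hand": None, "location": "kitchen"}},   # compatible: kept
]
print(filter_memories(belief, memories))
```

Note that the filter is purely mechanical: no model call decides relevance, which is precisely the point.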
4. Rules-First Arbitration
When rules and memory disagree, RPMS does something unfashionable:
It trusts rules first.
The arbitration logic:
| Step | Action |
|---|---|
| Hard Filter | Remove rule-violating memories |
| Soft Annotation | Flag ambiguous conflicts |
This is quietly radical.
Most agent systems treat memory as “wisdom.” RPMS treats it as advice that must pass compliance checks.
Which sounds less like AI—and more like corporate governance.
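The two-step arbitration can be sketched as a single pass over candidate memories: hard rules eliminate, soft rules annotate. The rule predicates and flag name below are illustrative assumptions, not the paper's implementation:

```python
# Rules-first arbitration: hard-filter rule-violating memories,
# soft-annotate ambiguous conflicts, never let memory override a rule.

def arbitrate(memories: list[dict], hard_rules, soft_rules) -> list[dict]:
    kept = []
    for m in memories:
        # Hard filter: any violated hard rule removes the memory outright.
        if any(rule(m) for rule in hard_rules):
            continue
        # Soft annotation: flag the conflict but keep the memory.
        kept.append(dict(m, flagged=any(rule(m) for rule in soft_rules)))
    return kept

# Example rules: heating with an empty hand violates the environment
# constraint; stale memories are merely suspicious.
hard = [lambda m: "heat" in m["advice"] and m.get("hand_empty", False)]
soft = [lambda m: m.get("stale", False)]

mems = [
    {"advice": "heat the egg", "hand_empty": True},  # violates hard rule: dropped
    {"advice": "place the mug", "stale": True},      # kept, flagged
    {"advice": "open the cabinet"},                  # kept, clean
]
out = arbitrate(mems, hard, soft)
```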
Findings — Results that contradict intuition
The experimental results are… inconvenient for current trends.
1. Rules outperform memory (by a lot)
| Condition | Success Rate |
|---|---|
| Baseline | 35.8% |
| Memory-only | 41.0% |
| Rules-only | 50.7% |
| Full RPMS | 59.7% |
Rules contribute +14.9 pp, while memory contributes only +5.2 pp.
Translation: the bottleneck is not intelligence—it’s constraint awareness.
2. Memory can degrade performance
From the paper’s per-task analysis:
| Task Type | Memory Effect |
|---|---|
| Place | Improves (33.3% → 70.8%) |
| Look | Degrades (55.6% → 11.1%) |
That’s not noise. That’s structural.
Unfiltered memory introduces contextually invalid strategies.
Or in plain terms: the agent remembers something useful… at the wrong time.
3. The synergy is real—but conditional
The combined gain (+23.9 pp) exceeds the sum of individual contributions.
But only when:
- Memory is filtered
- Rules are enforced
- Conflicts are resolved
Otherwise, performance drops below rules-only.
So the takeaway is precise:
Memory is not additive. It is conditional.
4. Cross-environment validation holds
On ScienceWorld:
| Condition | Score |
|---|---|
| Baseline | 44.9 |
| Rules-only | 51.3 |
| Memory-only | 46.0 |
| Full RPMS | 54.0 |
The pattern repeats.
Different environment. Different model. Same conclusion.
Which suggests this is not a benchmark trick—it’s an architectural principle.
Implications — What this means for real systems
1. Stop over-investing in memory
Most teams are building:
- Larger vector databases
- Longer context windows
- More elaborate retrieval pipelines
RPMS suggests a different priority:
Fix *when* memory is used before expanding *how much* you store.
2. Rules are not obsolete—they’re missing
The industry narrative frames rules as “old AI.”
This paper quietly disagrees.
Rules are not replacing LLMs—they are making them operationally valid.
Without rules, agents remain articulate but ineffective.
3. Agent design is becoming governance design
RPMS resembles a compliance system more than a reasoning engine:
- Rules = policy constraints
- Memory = historical cases
- Arbitration = decision governance
This is not accidental.
As agents move into business workflows, decision correctness matters more than generative flexibility.
4. The real bottleneck: execution, not cognition
The results imply something slightly uncomfortable:
LLMs are already smart enough.
What they lack is alignment with environment constraints at inference time.
Which shifts the engineering problem from:
- “How do we make models smarter?”
to:
- “How do we make decisions valid?”
A less glamorous question. A more profitable one.
Conclusion — Less memory, more discipline
RPMS doesn’t introduce a new model, a new training paradigm, or a new scaling law.
It introduces something far less exciting—and far more useful:
discipline in how knowledge is used.
The core insight is almost annoyingly simple:
- Memory without context is noise
- Reasoning without constraints is fiction
- Only when both are controlled does performance stabilize
In a field obsessed with adding more—more tokens, more parameters, more data—RPMS is a reminder that sometimes the breakthrough is not expansion.
It’s filtration.
And occasionally, saying no.
Cognaptus: Automate the Present, Incubate the Future.