Opening — Why this matters now

Everyone is obsessed with context windows.

200K tokens. 1M tokens. Soon, 10M tokens. The implicit promise is seductive: give the model enough room, and memory becomes a solved problem.

That promise is wrong.

The paper Facts as First-Class Objects: Knowledge Objects for Persistent LLM Memory doesn't just challenge this assumption; it dismantles it with uncomfortable precision. The issue is not how much a model can remember in a single session. It's what survives after that session ends.

And that, as it turns out, is very little.

Background — Context windows were never designed to be memory

The current dominant paradigm—in-context memory—is elegantly simple: stuff everything the model needs into the prompt and let attention do the rest.

It works surprisingly well.

Within a context window, frontier models can retrieve facts with near-perfect accuracy. The paper shows that Claude Sonnet 4.5 achieves 100% exact-match accuracy up to ~7,000 facts, covering nearly its entire 200K-token window.

For a moment, it looks like the industry bet paid off.

But this is a controlled illusion.

The moment you leave the lab and enter production, three structural failures appear:

| Failure Mode | What Happens | Why It Matters |
|---|---|---|
| Capacity limit | Hard overflow beyond the context window | System simply stops working |
| Compaction loss | ~60% of facts lost after summarization | Knowledge becomes irrecoverable |
| Goal drift | ~54% of constraints vanish over time | System behaves incorrectly, confidently |

The phrase the authors use is “context rot”—and it’s more literal than metaphorical.

Analysis — What the paper actually does

1. It proves context memory works… until it doesn’t

The benchmark is almost annoyingly fair.

  • Structured facts
  • Clear queries
  • Exact-match evaluation

Result: no degradation inside the window.

Then comes the cliff.

At ~8,000 facts, the system doesn’t degrade—it fails outright. A hard API boundary replaces any notion of graceful scaling.
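That cliff is easy to state in code. A minimal sketch of the hard boundary, where the 200K-token window is from the paper's setup but the tokens-per-fact figure is an illustrative assumption chosen so that ~7,000 facts fit and ~8,000 do not:

```python
CONTEXT_LIMIT = 200_000  # tokens; the model's window from the paper's setup

def build_prompt(facts, tokens_per_fact=27):
    """In-context memory: every fact goes into the prompt.
    Beyond the window there is no graceful degradation --
    the call fails outright, mirroring the hard API boundary.
    (tokens_per_fact=27 is an illustrative assumption.)"""
    total = len(facts) * tokens_per_fact
    if total > CONTEXT_LIMIT:
        # A hard error, not a slow decline in accuracy.
        raise ValueError(f"prompt of {total} tokens exceeds {CONTEXT_LIMIT}")
    return "\n".join(facts)

build_prompt(["fact"] * 7_000)      # ~189K tokens: fits, 100% retrieval
# build_prompt(["fact"] * 8_000)   # ~216K tokens: raises ValueError
```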

2. It quantifies compaction as destruction, not compression

When systems run out of space, they summarize.

That sounds reasonable. It isn’t.

After compressing 2,000 facts by ~36×:

| Outcome | Rate |
|---|---|
| Correct retrieval | 40% |
| Lost knowledge | 60% |
| Hallucination | 0% |

Notice the subtlety: the model doesn’t hallucinate—it admits ignorance. Which is admirable, but operationally useless.

Compression doesn’t degrade knowledge. It deletes it.
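The split above comes from an exact-match evaluation over known facts. A sketch of how such a retention metric could be computed, where `answer` stands in for querying the model over its compacted context (the toy data and function names are illustrative, not the paper's harness):

```python
def retention_rate(facts, answer):
    """Fraction of stored facts still retrievable by exact match.
    `answer(question)` queries the (compacted) memory; returning
    None models admitted ignorance rather than hallucination."""
    correct = lost = 0
    for question, expected in facts.items():
        got = answer(question)
        if got is None:
            lost += 1            # "I don't know" -- honest, but useless
        elif got == expected:
            correct += 1
    return correct / len(facts), lost / len(facts)

# Toy example: a "compacted" memory that silently kept only 2 of 5 facts.
facts = {f"fact-{i}": f"value-{i}" for i in range(5)}
compacted = {"fact-0": "value-0", "fact-3": "value-3"}
correct, lost = retention_rate(facts, lambda q: compacted.get(q))
# correct == 0.4, lost == 0.6 -- the same shape as the 40%/60% split above.
```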

3. It exposes the real risk: goal drift

Facts are one thing. Constraints are another.

The paper embeds 20 project rules (e.g., compliance requirements, architecture decisions) into a conversation, then applies repeated summarization.

After three rounds:

| Stage | Constraints Preserved |
|---|---|
| No compression | 100% |
| 1× compression | 91% |
| 2× compression | 62% |
| 3× compression | 46% |

And here’s the dangerous part:

The model continues operating with full confidence.

No warning. No uncertainty. Just quiet deviation from original intent.

If you’re building anything regulated—finance, healthcare, compliance—that’s not a bug. That’s a liability.

4. It introduces Knowledge Objects (KOs)

Instead of treating memory as text, the paper proposes treating facts as first-class objects:

$$ \mathrm{KO} = (\text{subject},\ \text{predicate},\ \text{object},\ \text{metadata}) $$

Stored externally, retrieved via hash keys, and injected into the model only when needed.
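What such a store might look like in miniature, assuming a simple in-memory dict as the external backend; the names `KnowledgeObject` and `KOStore` are illustrative, not the paper's API:

```python
from dataclasses import dataclass, field
import hashlib

@dataclass(frozen=True)
class KnowledgeObject:
    """A fact as a first-class object: a (subject, predicate, object)
    triple plus metadata, rather than a span of prose."""
    subject: str
    predicate: str
    obj: str
    metadata: dict = field(default_factory=dict, compare=False)

    def key(self) -> str:
        # Deterministic hash key derived from the triple itself.
        raw = f"{self.subject}|{self.predicate}|{self.obj}".encode()
        return hashlib.sha256(raw).hexdigest()

class KOStore:
    """External store: O(1) hash-key lookup, unbounded capacity,
    and persistence independent of any single session."""
    def __init__(self):
        self._objects = {}   # key -> KnowledgeObject
        self._index = {}     # (subject, predicate) -> [keys]

    def put(self, ko: KnowledgeObject) -> str:
        k = ko.key()
        self._objects[k] = ko
        self._index.setdefault((ko.subject, ko.predicate), []).append(k)
        return k

    def get(self, key: str):
        return self._objects.get(key)               # O(1) by hash key

    def query(self, subject: str, predicate: str):
        keys = self._index.get((subject, predicate), [])
        return [self._objects[k] for k in keys]     # O(1) index hit

store = KOStore()
key = store.put(KnowledgeObject("invoice-42", "amount_due", "$1,250"))
assert store.get(key).obj == "$1,250"
```

Only the handful of objects a query actually touches ever get injected into the prompt; everything else stays outside the model.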

This changes everything.

| Property | In-Context Memory | Knowledge Objects |
|---|---|---|
| Retrieval cost | O(N) | O(1) |
| Capacity | Limited | Unlimited |
| Compaction | Lossy | None |
| Persistence | Session-bound | Persistent |

It’s not smarter. It’s just better architecture.

5. It fixes RAG’s hidden weakness

Standard RAG performs well—until it encounters adversarial facts (near-identical sentences with different values).

In those cases:

  • Embedding retrieval: 20% accuracy (essentially random)
  • KO retrieval: 100% accuracy

The paper’s solution—density-adaptive retrieval—detects when embeddings become unreliable and switches to exact matching.

This is less a clever trick and more an admission: similarity is not identity.
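One way to sketch that idea: trust embedding similarity only when the top candidates are clearly separated, and fall back to exact matching when they are not. The margin heuristic and all names below are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def retrieve(query_vec, query_text, docs, doc_vecs, margin=0.05):
    """Density-adaptive retrieval sketch: use embeddings unless the
    top candidates are nearly indistinguishable, then switch to
    exact matching. (The margin threshold is an assumption.)"""
    # Cosine similarity of the query vector against every document vector.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    order = np.argsort(-sims)
    if sims[order[0]] - sims[order[1]] >= margin:
        return docs[order[0]]          # embeddings are discriminative here
    # Dense region: near-identical sentences that differ only in values.
    # Similarity is not identity, so require an exact textual match.
    for d in docs:
        if query_text in d:
            return d
    return docs[order[0]]              # last resort: best embedding hit

# Adversarial pair: same sentence shape, different values, near-tied vectors.
docs = ["Invoice 42 amount is $100", "Invoice 42 amount is $200"]
vecs = np.array([[1.00, 0.00], [0.99, 0.05]])
retrieve(np.array([1.0, 0.0]), "$200", docs, vecs)  # exact match picks $200
```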

Findings — What actually works (and what doesn’t)

Let’s collapse the results into something decision-makers care about:

| Scenario | In-Context | RAG | KO |
|---|---|---|---|
| Small knowledge base | ✅ 100% | | |
| Large knowledge base | ❌ overflow | | |
| Adversarial similarity | N/A | ❌ 20% | ✅ 100% |
| After compaction | ❌ 40% | N/A | ✅ 100% |
| Long-running workflows | ❌ drift | | |
| Multi-hop reasoning | ❌ 31.6% | | ✅ 78.9% |

And then the part nobody likes to talk about: cost.

| Facts | In-Context / Query | KO / Query | Ratio |
|---|---|---|---|
| 1,000 | $0.082 | $0.002 | 36× |
| 7,000 | $0.568 | $0.002 | 252× |

Scaling context is not just inefficient—it’s economically irrational.
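The gap follows directly from the arithmetic: in-context memory re-sends every fact on every query, while KO injects only what was retrieved. The token price, tokens-per-fact, and facts-per-query figures below are illustrative assumptions, not the paper's numbers:

```python
def per_query_cost(prompt_tokens, price_per_mtok=3.00):
    """Input cost of one query, assuming a flat price per million
    input tokens (the price is an illustrative assumption)."""
    return prompt_tokens * price_per_mtok / 1_000_000

TOKENS_PER_FACT = 27   # assumption: ~27 tokens per structured fact

# In-context: every query re-sends the full 7,000-fact set.
in_context = per_query_cost(7_000 * TOKENS_PER_FACT)
# KO: only the handful of retrieved facts get injected (assume 5).
ko = per_query_cost(5 * TOKENS_PER_FACT)
print(f"in-context ${in_context:.3f} vs KO ${ko:.4f} per query")
```

The exact ratio depends on the assumed prices and retrieval sizes, but the shape is fixed: one side scales linearly with the knowledge base, the other stays flat.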

Implications — The uncomfortable conclusions

1. Bigger context windows are a distraction

They solve the least important problem (capacity) and ignore the real ones:

  • Information decay
  • Lifecycle instability
  • Silent behavioral drift

A 10M-token window is still just a larger bucket.

2. Memory is a systems problem, not a model problem

The industry keeps trying to solve memory with better models.

This paper shows the opposite:

Memory fails because of how it is stored, not how it is processed.

That’s an architectural issue. Which means the solution lives outside the model.

3. There is a hidden incentive misalignment

The paper quietly highlights something more interesting than the experiments.

Switching from in-context memory to KO reduces token usage by 97–99%.

If you’re a model provider, that’s not innovation—it’s revenue destruction.

Which explains why the industry keeps selling bigger context windows instead.

4. “Persistent AI agents” are mostly fiction (for now)

If your agent relies on:

  • Prompt history
  • Summarized memory
  • Session carryover

…it is already drifting.

You just haven’t noticed yet.

Conclusion — Memory needs to be engineered, not hoped for

The paper doesn’t argue that LLMs are bad at memory.

Quite the opposite—they are remarkably good within the constraints they were designed for.

But those constraints were never meant to support persistence.

The key insight is almost embarrassingly simple:

Facts should not live inside prose.

They should be discrete, addressable, and persistent.

Everything else is just hoping the model remembers what you forgot to store properly.

And hope, as a system design principle, tends to have a short half-life.

Cognaptus: Automate the Present, Incubate the Future.