Opening — Why this matters now

Everyone is obsessed with context windows.

200K tokens. 1M tokens. Soon, 10M tokens. The implicit promise is seductive: give the model enough room, and memory becomes a solved problem.

That promise is wrong.

The paper Facts as First-Class Objects: Knowledge Objects for Persistent LLM Memory doesn't just challenge this assumption; it dismantles it with uncomfortable precision. The issue is not how much a model can remember in a single session. It's what survives after that session ends.

And that, as it turns out, is very little.

Background — Context windows were never designed to be memory

The current dominant paradigm—in-context memory—is elegantly simple: stuff everything the model needs into the prompt and let attention do the rest.

It works surprisingly well.

Within a context window, frontier models can retrieve facts with near-perfect accuracy. The paper shows that Claude Sonnet 4.5 achieves 100% exact-match accuracy up to ~7,000 facts, covering nearly its entire 200K-token window.

For a moment, it looks like the industry bet paid off.

But this is a controlled illusion.

The moment you leave the lab and enter production, three structural failures appear:

| Failure Mode | What Happens | Why It Matters |
|---|---|---|
| Capacity limit | Hard overflow beyond the context window | System simply stops working |
| Compaction loss | ~60% of facts lost after summarization | Knowledge becomes irrecoverable |
| Goal drift | ~54% of constraints vanish over time | System behaves incorrectly, confidently |

The phrase the authors use is “context rot”—and it’s more literal than metaphorical.

Analysis — What the paper actually does

1. It proves context memory works… until it doesn’t

The benchmark is almost annoyingly fair.

  • Structured facts
  • Clear queries
  • Exact-match evaluation

Result: no degradation inside the window.

Then comes the cliff.

At ~8,000 facts, the system doesn’t degrade—it fails outright. A hard API boundary replaces any notion of graceful scaling.
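That cliff is easy to state in code. A minimal sketch of the hard boundary, where the 200K-token window is from the paper's setup but the tokens-per-fact figure is an illustrative assumption chosen so that ~7,000 facts fit and ~8,000 do not:

```python
CONTEXT_LIMIT = 200_000  # tokens; the model's window from the paper's setup

def build_prompt(facts, tokens_per_fact=27):
    """In-context memory: every fact goes into the prompt.
    Beyond the window there is no graceful degradation --
    the call fails outright, mirroring the hard API boundary.
    (tokens_per_fact=27 is an illustrative assumption.)"""
    total = len(facts) * tokens_per_fact
    if total > CONTEXT_LIMIT:
        # A hard error, not a slow decline in accuracy.
        raise ValueError(f"prompt of {total} tokens exceeds {CONTEXT_LIMIT}")
    return "\n".join(facts)

build_prompt(["fact"] * 7_000)      # ~189K tokens: fits, 100% retrieval
# build_prompt(["fact"] * 8_000)   # ~216K tokens: raises ValueError
```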

2. It quantifies compaction as destruction, not compression

When systems run out of space, they summarize.

That sounds reasonable. It isn’t.

After compressing 2,000 facts by ~36×:

| Outcome | Rate |
|---|---|
| Correct retrieval | 40% |
| Lost knowledge | 60% |
| Hallucination | 0% |

Notice the subtlety: the model doesn’t hallucinate—it admits ignorance. Which is admirable, but operationally useless.

Compression doesn’t degrade knowledge. It deletes it.
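The split above comes from an exact-match evaluation over known facts. A sketch of how such a retention metric could be computed, where `answer` stands in for querying the model over its compacted context (the toy data and function names are illustrative, not the paper's harness):

```python
def retention_rate(facts, answer):
    """Fraction of stored facts still retrievable by exact match.
    `answer(question)` queries the (compacted) memory; returning
    None models admitted ignorance rather than hallucination."""
    correct = lost = 0
    for question, expected in facts.items():
        got = answer(question)
        if got is None:
            lost += 1            # "I don't know" -- honest, but useless
        elif got == expected:
            correct += 1
    return correct / len(facts), lost / len(facts)

# Toy example: a "compacted" memory that silently kept only 2 of 5 facts.
facts = {f"fact-{i}": f"value-{i}" for i in range(5)}
compacted = {"fact-0": "value-0", "fact-3": "value-3"}
correct, lost = retention_rate(facts, lambda q: compacted.get(q))
# correct == 0.4, lost == 0.6 -- the same shape as the 40%/60% split above.
```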

3. It exposes the real risk: goal drift

Facts are one thing. Constraints are another.

The paper embeds 20 project rules (e.g., compliance requirements, architecture decisions) into a conversation, then applies repeated summarization.

After three rounds:

| Stage | Constraints Preserved |
|---|---|
| No compression | 100% |
| 1× compression | 91% |
| 2× compression | 62% |
| 3× compression | 46% |

And here’s the dangerous part:

The model continues operating with full confidence.

No warning. No uncertainty. Just quiet deviation from original intent.

If you’re building anything regulated—finance, healthcare, compliance—that’s not a bug. That’s a liability.

4. It introduces Knowledge Objects (KOs)

Instead of treating memory as text, the paper proposes treating facts as first-class objects:

$$ \mathrm{KO} = (\text{subject},\ \text{predicate},\ \text{object},\ \text{metadata}) $$

Stored externally, retrieved via hash keys, and injected into the model only when needed.
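What such a store might look like in miniature, assuming a simple in-memory dict as the external backend; the names `KnowledgeObject` and `KOStore` are illustrative, not the paper's API:

```python
from dataclasses import dataclass, field
import hashlib

@dataclass(frozen=True)
class KnowledgeObject:
    """A fact as a first-class object: a (subject, predicate, object)
    triple plus metadata, rather than a span of prose."""
    subject: str
    predicate: str
    obj: str
    metadata: dict = field(default_factory=dict, compare=False)

    def key(self) -> str:
        # Deterministic hash key derived from the triple itself.
        raw = f"{self.subject}|{self.predicate}|{self.obj}".encode()
        return hashlib.sha256(raw).hexdigest()

class KOStore:
    """External store: O(1) hash-key lookup, unbounded capacity,
    and persistence independent of any single session."""
    def __init__(self):
        self._objects = {}   # key -> KnowledgeObject
        self._index = {}     # (subject, predicate) -> [keys]

    def put(self, ko: KnowledgeObject) -> str:
        k = ko.key()
        self._objects[k] = ko
        self._index.setdefault((ko.subject, ko.predicate), []).append(k)
        return k

    def get(self, key: str):
        return self._objects.get(key)               # O(1) by hash key

    def query(self, subject: str, predicate: str):
        keys = self._index.get((subject, predicate), [])
        return [self._objects[k] for k in keys]     # O(1) index hit

store = KOStore()
key = store.put(KnowledgeObject("invoice-42", "amount_due", "$1,250"))
assert store.get(key).obj == "$1,250"
```

Only the handful of objects a query actually touches ever get injected into the prompt; everything else stays outside the model.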

This changes everything.

| Property | In-Context Memory | Knowledge Objects |
|---|---|---|
| Retrieval cost | O(N) | O(1) |
| Capacity | Limited | Unlimited |
| Compaction | Lossy | None |
| Persistence | Session-bound | Persistent |

It’s not smarter. It’s just better architecture.

5. It fixes RAG’s hidden weakness

Standard RAG performs well—until it encounters adversarial facts (near-identical sentences with different values).

In those cases:

  • Embedding retrieval: 20% accuracy (essentially random)
  • KO retrieval: 100% accuracy

The paper’s solution—density-adaptive retrieval—detects when embeddings become unreliable and switches to exact matching.

This is less a clever trick and more an admission: similarity is not identity.
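One way to sketch that idea: trust embedding similarity only when the top candidates are clearly separated, and fall back to exact matching when they are not. The margin heuristic and all names below are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def retrieve(query_vec, query_text, docs, doc_vecs, margin=0.05):
    """Density-adaptive retrieval sketch: use embeddings unless the
    top candidates are nearly indistinguishable, then switch to
    exact matching. (The margin threshold is an assumption.)"""
    # Cosine similarity of the query vector against every document vector.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    order = np.argsort(-sims)
    if sims[order[0]] - sims[order[1]] >= margin:
        return docs[order[0]]          # embeddings are discriminative here
    # Dense region: near-identical sentences that differ only in values.
    # Similarity is not identity, so require an exact textual match.
    for d in docs:
        if query_text in d:
            return d
    return docs[order[0]]              # last resort: best embedding hit

# Adversarial pair: same sentence shape, different values, near-tied vectors.
docs = ["Invoice 42 amount is $100", "Invoice 42 amount is $200"]
vecs = np.array([[1.00, 0.00], [0.99, 0.05]])
retrieve(np.array([1.0, 0.0]), "$200", docs, vecs)  # exact match picks $200
```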

Findings — What actually works (and what doesn’t)

Let’s collapse the results into something decision-makers care about:

| Scenario | In-Context | RAG | KO |
|---|---|---|---|
| Small knowledge base | ✅ 100% | | |
| Large knowledge base | ❌ overflow | | |
| Adversarial similarity | N/A | ❌ 20% | ✅ 100% |
| After compaction | ❌ 40% | N/A | ✅ 100% |
| Long-running workflows | ❌ drift | | |
| Multi-hop reasoning | ❌ 31.6% | | ✅ 78.9% |

And then the part nobody likes to talk about: cost.

| Facts | In-Context / Query | KO / Query | Ratio |
|---|---|---|---|
| 1,000 | $0.082 | $0.002 | 36× |
| 7,000 | $0.568 | $0.002 | 252× |

Scaling context is not just inefficient—it’s economically irrational.
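The gap follows directly from the arithmetic: in-context memory re-sends every fact on every query, while KO injects only what was retrieved. The token price, tokens-per-fact, and facts-per-query figures below are illustrative assumptions, not the paper's numbers:

```python
def per_query_cost(prompt_tokens, price_per_mtok=3.00):
    """Input cost of one query, assuming a flat price per million
    input tokens (the price is an illustrative assumption)."""
    return prompt_tokens * price_per_mtok / 1_000_000

TOKENS_PER_FACT = 27   # assumption: ~27 tokens per structured fact

# In-context: every query re-sends the full 7,000-fact set.
in_context = per_query_cost(7_000 * TOKENS_PER_FACT)
# KO: only the handful of retrieved facts get injected (assume 5).
ko = per_query_cost(5 * TOKENS_PER_FACT)
print(f"in-context ${in_context:.3f} vs KO ${ko:.4f} per query")
```

The exact ratio depends on the assumed prices and retrieval sizes, but the shape is fixed: one side scales linearly with the knowledge base, the other stays flat.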

Implications — The uncomfortable conclusions

1. Bigger context windows are a distraction

They solve the least important problem (capacity) and ignore the real ones:

  • Information decay
  • Lifecycle instability
  • Silent behavioral drift

A 10M-token window is still just a larger bucket.

2. Memory is a systems problem, not a model problem

The industry keeps trying to solve memory with better models.

This paper shows the opposite:

Memory fails because of how it is stored, not how it is processed.

That’s an architectural issue. Which means the solution lives outside the model.

3. There is a hidden incentive misalignment

The paper quietly highlights something more interesting than the experiments.

Switching from in-context memory to KO reduces token usage by 97–99%.

If you’re a model provider, that’s not innovation—it’s revenue destruction.

Which explains why the industry keeps selling bigger context windows instead.

4. “Persistent AI agents” are mostly fiction (for now)

If your agent relies on:

  • Prompt history
  • Summarized memory
  • Session carryover

…it is already drifting.

You just haven’t noticed yet.

Conclusion — Memory needs to be engineered, not hoped for

The paper doesn’t argue that LLMs are bad at memory.

Quite the opposite—they are remarkably good within the constraints they were designed for.

But those constraints were never meant to support persistence.

The key insight is almost embarrassingly simple:

Facts should not live inside prose.

They should be discrete, addressable, and persistent.

Everything else is just hoping the model remembers what you forgot to store properly.

And hope, as a system design principle, tends to have a short half-life.

Cognaptus: Automate the Present, Incubate the Future.