## Opening — Why this matters now
Everyone is obsessed with context windows.
200K tokens. 1M tokens. Soon, 10M tokens. The implicit promise is seductive: give the model enough room, and memory becomes a solved problem.
That promise is wrong.
The paper *Facts as First-Class Objects: Knowledge Objects for Persistent LLM Memory* doesn’t just challenge this assumption—it dismantles it with uncomfortable precision. The issue is not how much a model can remember in a single session. It’s what survives after that session ends.
And that, as it turns out, is very little.
## Background — Context windows were never designed to be memory
The current dominant paradigm—in-context memory—is elegantly simple: stuff everything the model needs into the prompt and let attention do the rest.
It works surprisingly well.
Within a context window, frontier models can retrieve facts with near-perfect accuracy. The paper shows that Claude Sonnet 4.5 achieves 100% exact-match accuracy up to ~7,000 facts, covering nearly its entire 200K-token window.
For a moment, it looks like the industry bet paid off.
But this is a controlled illusion.
The moment you leave the lab and enter production, three structural failures appear:
| Failure Mode | What Happens | Why It Matters |
|---|---|---|
| Capacity limit | Hard overflow beyond context window | System simply stops working |
| Compaction loss | ~60% of facts lost after summarization | Knowledge becomes irrecoverable |
| Goal drift | ~54% of constraints vanish over time | System behaves incorrectly, with full confidence |
The phrase the authors use is “context rot”—and it’s more literal than metaphorical.
## Analysis — What the paper actually does
### 1. It proves context memory works… until it doesn’t
The benchmark is almost annoyingly fair.
- Structured facts
- Clear queries
- Exact-match evaluation
Result: no degradation inside the window.
Then comes the cliff.
At ~8,000 facts, the system doesn’t degrade—it fails outright. A hard API boundary replaces any notion of graceful scaling.
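The shape of that cliff is easy to sketch. The guard below is illustrative, not the paper's harness: the 200K window matches the model discussed above, while the 27-tokens-per-fact figure is an assumption chosen so that ~7,000 facts fit and ~8,000 do not.

```python
# Illustrative model of the hard capacity failure: below the window,
# everything works; one step past it, the API call fails outright.
CONTEXT_WINDOW = 200_000  # tokens; matches the 200K window discussed above

class ContextOverflowError(RuntimeError):
    """Raised when the prompt would exceed the model's hard limit."""

def build_prompt(facts, tokens_per_fact=27):
    # tokens_per_fact is an assumed average, not a measured value.
    total = len(facts) * tokens_per_fact
    if total > CONTEXT_WINDOW:
        # No graceful degradation: the request is rejected wholesale.
        raise ContextOverflowError(f"{total} tokens exceeds {CONTEXT_WINDOW}")
    return "\n".join(facts)

build_prompt(["fact"] * 7_000)   # fits: ~189K tokens
# build_prompt(["fact"] * 8_000) would raise: ~216K tokens
```

There is no knob to turn at that boundary; the only options are dropping content or summarizing it, which is where the next failure mode begins.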
### 2. It quantifies compaction as destruction, not compression
When systems run out of space, they summarize.
That sounds reasonable. It isn’t.
After compressing 2,000 facts by ~36×:
| Outcome | Rate |
|---|---|
| Correct retrieval | 40% |
| Lost knowledge | 60% |
| Hallucination | 0% |
Notice the subtlety: the model doesn’t hallucinate—it admits ignorance. Which is admirable, but operationally useless.
Compression doesn’t degrade knowledge. It deletes it.
### 3. It exposes the real risk: goal drift
Facts are one thing. Constraints are another.
The paper embeds 20 project rules (e.g., compliance requirements, architecture decisions) into a conversation, then applies repeated summarization.
After three rounds:
| Stage | Constraints Preserved |
|---|---|
| No compression | 100% |
| 1× compression | 91% |
| 2× compression | 62% |
| 3× compression | 46% |
And here’s the dangerous part:
The model continues operating with full confidence.
No warning. No uncertainty. Just quiet deviation from original intent.
If you’re building anything regulated—finance, healthcare, compliance—that’s not a bug. That’s a liability.
### 4. It introduces Knowledge Objects (KOs)
Instead of treating memory as text, the paper proposes treating facts as first-class objects:
$$ KO = (\text{subject},\ \text{predicate},\ \text{object},\ \text{metadata}) $$
Stored externally, retrieved via hash keys, and injected into the model only when needed.
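In code, the idea is almost mundane. The sketch below assumes the paper's (subject, predicate, object, metadata) tuple; the class names, the `(subject, predicate)` key scheme, and the example facts are illustrative, not the paper's implementation.

```python
# Minimal sketch of a Knowledge Object store: facts live outside the
# context window, addressed by an exact hash key, never summarized.
from dataclasses import dataclass, field

@dataclass
class KnowledgeObject:
    subject: str
    predicate: str
    object: str
    metadata: dict = field(default_factory=dict)

class KOStore:
    """External store: hash-keyed lookup, no compaction, session-independent."""
    def __init__(self):
        self._facts = {}  # (subject, predicate) -> KnowledgeObject

    def put(self, ko):
        # Later writes overwrite the key; nothing is ever compacted away.
        self._facts[(ko.subject, ko.predicate)] = ko

    def get(self, subject, predicate):
        ko = self._facts.get((subject, predicate))
        return ko.object if ko else None

store = KOStore()
store.put(KnowledgeObject("project-x", "database", "PostgreSQL 16", {"source": "ADR-12"}))
store.put(KnowledgeObject("project-x", "region", "eu-west-1"))
print(store.get("project-x", "database"))  # O(1), regardless of store size
```

Only the facts a query actually needs are injected into the prompt; the rest never touch the context window.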
This changes everything.
| Property | In-Context Memory | Knowledge Objects |
|---|---|---|
| Retrieval cost | O(N) | O(1) |
| Capacity | Limited | Unlimited |
| Compaction | Lossy | None |
| Persistence | Session-bound | Persistent |
It’s not smarter. It’s just better architecture.
### 5. It fixes RAG’s hidden weakness
Standard RAG performs well—until it encounters adversarial facts (near-identical sentences with different values).
In those cases:
- Embedding retrieval: 20% accuracy (essentially random)
- KO retrieval: 100% accuracy
The paper’s solution—density-adaptive retrieval—detects when embeddings become unreliable and switches to exact matching.
This is less a clever trick and more an admission: similarity is not identity.
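A margin test makes the switch concrete. This is a hedged sketch of the density-adaptive idea, not the paper's detector: the 0.05 margin threshold and the toy three-dimensional vectors are assumptions chosen to show the mechanism.

```python
# Sketch of density-adaptive retrieval: trust embedding similarity only
# when the top result clearly separates from the runner-up; otherwise
# fall back to exact key matching (similarity is not identity).
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

def retrieve(query_vec, query_key, candidates, margin=0.05):
    """candidates: list of (key, vector, value) triples."""
    scored = sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    if len(scored) > 1:
        gap = cosine(query_vec, scored[0][1]) - cosine(query_vec, scored[1][1])
        if gap < margin:
            # Dense region: near-identical sentences, different values.
            for key, _, value in candidates:
                if key == query_key:
                    return value
    return scored[0][2]

# Adversarial pair: embeddings are nearly tied, so only the key can decide.
candidates = [
    (("server-A", "port"), [1.0, 0.0, 0.01], "8080"),
    (("server-B", "port"), [1.0, 0.0, 0.02], "9090"),
]
print(retrieve([1.0, 0.0, 0.015], ("server-B", "port"), candidates))
```

When the top two similarities are essentially tied, picking the argmax is a coin flip; the exact-match fallback is what turns 20% into 100% on the adversarial cases.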
## Findings — What actually works (and what doesn’t)
Let’s collapse the results into something decision-makers care about:
| Scenario | In-Context | RAG | KO |
|---|---|---|---|
| Small knowledge base | ✅ 100% | ✅ | ✅ |
| Large knowledge base | ❌ overflow | ✅ | ✅ |
| Adversarial similarity | N/A | ❌ 20% | ✅ 100% |
| After compaction | ❌ 40% | N/A | ✅ 100% |
| Long-running workflows | ❌ drift | ❌ | ✅ |
| Multi-hop reasoning | ❌ 31.6% | — | ✅ 78.9% |
And then the part nobody likes to talk about: cost.
| Facts | In-Context cost / query | KO cost / query | Ratio |
|---|---|---|---|
| 1,000 | $0.082 | $0.002 | 36× |
| 7,000 | $0.568 | $0.002 | 252× |
Scaling context is not just inefficient—it’s economically irrational.
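The economics fall directly out of the architecture. The sketch below is illustrative, not the paper's accounting: the $3-per-million-token price and the per-fact and per-injection token counts are assumptions, picked so the outputs land near the table above.

```python
# Why in-context cost grows O(N) with the corpus while KO cost stays O(1):
# every in-context query re-reads all facts; a KO query injects a few.
PRICE_PER_INPUT_TOKEN = 3e-6  # assumed $3 / 1M input tokens

def in_context_cost(num_facts, tokens_per_fact=27):
    # The whole fact corpus rides along with every single query.
    return num_facts * tokens_per_fact * PRICE_PER_INPUT_TOKEN

def ko_cost(tokens_injected=700):
    # Only the retrieved Knowledge Objects enter the prompt.
    return tokens_injected * PRICE_PER_INPUT_TOKEN

for n in (1_000, 7_000):
    ratio = in_context_cost(n) / ko_cost()
    print(f"{n:>5} facts: in-context ${in_context_cost(n):.3f}"
          f" vs KO ${ko_cost():.3f}  (~{ratio:.0f}x)")
```

The constant term never moves: growing the knowledge base from 1,000 to 7,000 facts multiplies the in-context bill sevenfold while the KO bill stays flat, which is exactly the divergence the table shows.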
## Implications — The uncomfortable conclusions
### 1. Bigger context windows are a distraction
They solve the least important problem (capacity) and ignore the real ones:
- Information decay
- Lifecycle instability
- Silent behavioral drift
A 10M-token window is still just a larger bucket.
### 2. Memory is a systems problem, not a model problem
The industry keeps trying to solve memory with better models.
This paper shows the opposite:
Memory fails because of how it is stored, not how it is processed.
That’s an architectural issue. Which means the solution lives outside the model.
### 3. There is a hidden incentive misalignment
The paper quietly highlights something more interesting than the experiments.
Switching from in-context memory to KO reduces token usage by 97–99%.
If you’re a model provider, that’s not innovation—it’s revenue destruction.
Which explains why the industry keeps selling bigger context windows instead.
### 4. “Persistent AI agents” are mostly fiction (for now)
If your agent relies on:
- Prompt history
- Summarized memory
- Session carryover
…it is already drifting.
You just haven’t noticed yet.
## Conclusion — Memory needs to be engineered, not hoped for
The paper doesn’t argue that LLMs are bad at memory.
Quite the opposite—they are remarkably good within the constraints they were designed for.
But those constraints were never meant to support persistence.
The key insight is almost embarrassingly simple:
Facts should not live inside prose.
They should be discrete, addressable, and persistent.
Everything else is just hoping the model remembers what you forgot to store properly.
And hope, as a system design principle, tends to have a short half-life.
Cognaptus: Automate the Present, Incubate the Future.