Opening — Why this matters now
Multi-turn agents are supposed to get better with experience. More context, more feedback, more opportunities to adapt. Yet in practice, the opposite often happens. Agents loop. They fixate. They repeat themselves with growing confidence and shrinking effectiveness.
This paper puts a name—and a mechanism—on that failure mode: conversational inertia. And more importantly, it shows that the problem is not a lack of information, but too much of the wrong kind.
Background — From few-shot strength to agent weakness
Large language models were built to imitate. Few-shot learning works precisely because models detect patterns in prior examples and extend them forward. In static tasks, this is a feature. In interactive environments, it quietly becomes a liability.
The paper shows that in multi-turn settings, agents start treating their own past actions as demonstrations. Over time, the model increasingly copies itself rather than responding to new state information. This is not simple context overload. Attention to user inputs stays roughly stable. What grows instead is self-attention to prior assistant outputs, especially in a diagonal, token-to-token pattern.
In other words: the agent is learning from itself, whether or not it deserves to be learned from.
Analysis — What the paper actually does
The authors make two key moves.
1. Diagnose inertia at the attention level
By visualizing attention matrices across environments (maze navigation, web interaction, games, reasoning tasks), they identify a consistent pattern: as conversations lengthen, models allocate more attention to structurally corresponding positions in previous responses.
This diagonal attention correlates strongly with degraded performance. It reflects imitation bias, not deeper reasoning or state abstraction.
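To make the diagnosis concrete, here is a rough sketch (our illustration, not the paper's code) of how one might quantify this diagonal pattern: for each token in the current assistant turn, measure how much of its attention into the previous assistant turn lands near the structurally corresponding position. The tensor layout and the tolerance band are assumptions.

```python
import torch

def diagonal_attention_score(attn: torch.Tensor,
                             cur_span: tuple,
                             prev_span: tuple,
                             width: int = 2) -> float:
    """Fraction of attention from the current assistant turn into the previous
    assistant turn that falls near 'diagonal' (structurally aligned) positions.

    attn      : (num_heads, seq_len, seq_len) attention weights for one layer
    cur_span  : (start, end) token indices of the current assistant turn
    prev_span : (start, end) token indices of the previous assistant turn
    width     : tolerance band around exact alignment
    """
    c0, c1 = cur_span
    p0, p1 = prev_span
    total, on_diag = 0.0, 0.0
    for q in range(c0, c1):                    # each query token in the current turn
        row = attn[:, q, p0:p1]                # its attention into the previous turn
        total += row.sum().item()
        aligned = q - c0                       # structurally corresponding offset
        lo, hi = max(0, aligned - width), min(p1 - p0, aligned + width + 1)
        on_diag += row[:, lo:hi].sum().item()
    return on_diag / max(total, 1e-8)
```

A score that climbs as the conversation grows would match the imitation signature the authors report.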
2. Turn context length into a training signal
Here’s the clever part. For the same environment state, actions generated with:
- Long context → high inertia, more imitation
- Short context → lower inertia, more exploration
That contrast alone is enough to create preference data.
The paper introduces Context Preference Learning (CPL):
- Generate two actions for the same state: one from long context, one from short
- Treat the short-context action as preferred
- Train with Direct Preference Optimization (DPO)
- No environment rewards, no human labels, no expert trajectories
Only ~0.4% of parameters are updated via LoRA. The goal is not to teach new skills, but to recalibrate which internal behaviors are preferred.
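Here is a minimal sketch of how those preference pairs could be assembled, assuming a chat-style `generate` helper that maps messages to an action string; the truncation rule, message format, and function names are illustrative, not the paper's implementation. The resulting `prompt`/`chosen`/`rejected` records are the standard input format for DPO trainers such as TRL's `DPOTrainer`, which can be combined with a LoRA adapter so only a small fraction of weights is updated.

```python
from typing import Callable, Dict, List

def build_cpl_pairs(
    episodes: List[List[Dict]],                # each episode: [system, user, assistant, ...]
    generate: Callable[[List[Dict]], str],     # assumed helper: messages -> action text
    short_turns: int = 2,                      # recent exchanges kept in the short context
) -> List[Dict]:
    """Turn long- vs. short-context generations for the same state into preference pairs."""
    pairs = []
    for history in episodes:
        system, dialogue = history[0], history[1:]
        # Long context: the agent sees its full history, which the paper links to inertia.
        long_action = generate([system] + dialogue)
        # Short context: only the last few exchanges plus the current observation.
        short_action = generate([system] + dialogue[-(2 * short_turns):])
        pairs.append({
            "prompt": dialogue[-1]["content"],  # current observation
            "chosen": short_action,             # short-context action treated as preferred
            "rejected": long_action,            # long-context action treated as dispreferred
        })
    return pairs
```

No environment reward or human label appears anywhere in this loop; the preference signal comes entirely from the context-length contrast.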
Inference-time fix — Stop hoarding context
Training helps, but the authors also show a simpler lever: how context is managed at inference.
They compare three strategies:
| Method | Idea | Effect |
|---|---|---|
| Long Context | Keep everything | Maximum inertia, slow |
| Window | Keep last W turns | Drops info, no cache reuse |
| Clip (proposed) | Periodically reset history | Breaks inertia cleanly |
Clip Context periodically clears the conversation history down to a small recent core, then lets it grow again; a minimal sketch follows the list below. This does three things simultaneously:
- Reduces accumulated imitation bias
- Preserves KV-cache efficiency (unlike sliding windows)
- Forces periodic re-grounding in current observations
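The sketch below captures the clip idea as described above, assuming a simple message-list interface; the turn budget and the size of the retained core are illustrative parameters, not values from the paper.

```python
from typing import Dict, List, Optional

class ClipContext:
    """Grow the history freely, then periodically reset it to a small recent core.

    Unlike a sliding window, the prefix stays fixed between resets, so a
    prefix KV cache remains reusable until the next clip.
    """

    def __init__(self, max_turns: int = 30, keep_recent: int = 4):
        self.max_turns = max_turns      # clip threshold (illustrative)
        self.keep_recent = keep_recent  # recent messages kept after a clip (illustrative)
        self.system: Optional[Dict] = None
        self.turns: List[Dict] = []

    def add(self, message: Dict) -> None:
        if message["role"] == "system":
            self.system = message
            return
        self.turns.append(message)
        if len(self.turns) > self.max_turns:
            # Clip: discard accumulated history, keep only the recent core,
            # forcing the agent to re-ground in its current observations.
            self.turns = self.turns[-self.keep_recent:]

    def messages(self) -> List[Dict]:
        """Messages to send to the model on the next turn."""
        return ([self.system] if self.system else []) + self.turns
```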
Empirically, Clip Context consistently outperforms window methods and often matches or beats summarization-based approaches, without the hallucination risk that summaries carry.
Findings — What actually improved
Across eight agent benchmarks and a deep research task:
- Diagonal attention drops by 7–14%
- Task success rates rise 4–8% depending on model
- Long-context failures improve the most
- General reasoning benchmarks remain unchanged
A particularly telling result: most of the gains attributed to “summarization” in prior work come from context truncation itself, not from the summaries.
In short: forgetting helps. Fancy forgetting is optional.
Implications — What this means for agent builders
Three takeaways stand out.
- More memory is not more intelligence. Long context amplifies imitation bias unless explicitly controlled.
- Exploration–exploitation is an attention problem. In agents, the bottleneck is not reasoning depth but behavioral flexibility.
- Training-free fixes matter. Clip-style context control is cheap, robust, and immediately deployable.
For anyone building autonomous agents—customer support bots, research agents, navigation systems—this reframes a common instinct. When agents stall, the answer is not always to add memory. Sometimes it’s to take it away.
Conclusion — Less remembering, better thinking
Conversational inertia explains why capable agents slowly talk themselves into corners. By isolating its attention-level signature and offering both training-time and inference-time remedies, this paper delivers something rare: a mechanistic insight with practical consequences.
The uncomfortable lesson is also the useful one. Agents do not fail because they forget too much. They fail because they remember themselves too well.
Cognaptus: Automate the Present, Incubate the Future.