Opening — Why this matters now
Multi-turn agents are supposed to get better with experience. More context, more feedback, more opportunities to adapt. Yet in practice, the opposite often happens. Agents loop. They fixate. They repeat themselves with growing confidence and shrinking effectiveness.
This paper puts a name—and a mechanism—on that failure mode: conversational inertia. And more importantly, it shows that the problem is not a lack of information, but too much of the wrong kind.
Background — From few-shot strength to agent weakness
Large language models were built to imitate. Few-shot learning works precisely because models detect patterns in prior examples and extend them forward. In static tasks, this is a feature. In interactive environments, it quietly becomes a liability.
The paper shows that in multi-turn settings, agents start treating their own past actions as demonstrations. Over time, the model increasingly copies itself rather than responding to new state information. This is not simple context overload. Attention to user inputs stays roughly stable. What grows instead is self-attention to prior assistant outputs, especially in a diagonal, token-to-token pattern.
In other words: the agent is learning from itself, whether or not it deserves to be learned from.
Analysis — What the paper actually does
The authors make two key moves.
1. Diagnose inertia at the attention level
By visualizing attention matrices across environments (maze navigation, web interaction, games, reasoning tasks), they identify a consistent pattern: as conversations lengthen, models allocate more attention to structurally corresponding positions in previous responses.
This diagonal attention correlates strongly with degraded performance. It reflects imitation bias, not deeper reasoning or state abstraction.
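To make the diagnosis concrete, here is a rough sketch (our illustration, not the paper's code) of how one might quantify this diagonal pattern: for each token in the current assistant turn, measure how much of its attention into the previous assistant turn lands near the structurally corresponding position. The tensor layout and the tolerance band are assumptions.

```python
import torch

def diagonal_attention_score(attn: torch.Tensor,
                             cur_span: tuple,
                             prev_span: tuple,
                             width: int = 2) -> float:
    """Fraction of attention from the current assistant turn into the previous
    assistant turn that falls near 'diagonal' (structurally aligned) positions.

    attn      : (num_heads, seq_len, seq_len) attention weights for one layer
    cur_span  : (start, end) token indices of the current assistant turn
    prev_span : (start, end) token indices of the previous assistant turn
    width     : tolerance band around exact alignment
    """
    c0, c1 = cur_span
    p0, p1 = prev_span
    total, on_diag = 0.0, 0.0
    for q in range(c0, c1):                    # each query token in the current turn
        row = attn[:, q, p0:p1]                # its attention into the previous turn
        total += row.sum().item()
        aligned = q - c0                       # structurally corresponding offset
        lo, hi = max(0, aligned - width), min(p1 - p0, aligned + width + 1)
        on_diag += row[:, lo:hi].sum().item()
    return on_diag / max(total, 1e-8)
```

A score that climbs as the conversation grows would match the imitation signature the authors report.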
2. Turn context length into a training signal
Here’s the clever part. For the same environment state, actions generated with:
- Long context → high inertia, more imitation
- Short context → lower inertia, more exploration
That contrast alone is enough to create preference data.
The paper introduces Context Preference Learning (CPL):
- Generate two actions for the same state: one from long context, one from short
- Treat the short-context action as preferred
- Train with Direct Preference Optimization (DPO)
- No environment rewards, no human labels, no expert trajectories
Only ~0.4% of parameters are updated via LoRA. The goal is not to teach new skills, but to recalibrate which internal behaviors are preferred.
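Here is a minimal sketch of how those preference pairs could be assembled, assuming a chat-style `generate` helper that maps messages to an action string; the truncation rule, message format, and function names are illustrative, not the paper's implementation. The resulting `prompt`/`chosen`/`rejected` records are the standard input format for DPO trainers such as TRL's `DPOTrainer`, which can be combined with a LoRA adapter so only a small fraction of weights is updated.

```python
from typing import Callable, Dict, List

def build_cpl_pairs(
    episodes: List[List[Dict]],                # each episode: [system, user, assistant, ...]
    generate: Callable[[List[Dict]], str],     # assumed helper: messages -> action text
    short_turns: int = 2,                      # recent exchanges kept in the short context
) -> List[Dict]:
    """Turn long- vs. short-context generations for the same state into preference pairs."""
    pairs = []
    for history in episodes:
        system, dialogue = history[0], history[1:]
        # Long context: the agent sees its full history, which the paper links to inertia.
        long_action = generate([system] + dialogue)
        # Short context: only the last few exchanges plus the current observation.
        short_action = generate([system] + dialogue[-(2 * short_turns):])
        pairs.append({
            "prompt": dialogue[-1]["content"],  # current observation
            "chosen": short_action,             # short-context action treated as preferred
            "rejected": long_action,            # long-context action treated as dispreferred
        })
    return pairs
```

No environment reward or human label appears anywhere in this loop; the preference signal comes entirely from the context-length contrast.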
Inference-time fix — Stop hoarding context
Training helps, but the authors also show a simpler lever: how context is managed at inference.
They compare three strategies:
| Method | Idea | Effect |
|---|---|---|
| Long Context | Keep everything | Maximum inertia, slow |
| Window | Keep last W turns | Drops info, no cache reuse |
| Clip (proposed) | Periodically reset history | Breaks inertia cleanly |
Clip Context periodically clears the conversation history down to a small recent core, then lets it grow again; a minimal sketch follows the list below. This does three things simultaneously:
- Reduces accumulated imitation bias
- Preserves KV-cache efficiency (unlike sliding windows)
- Forces periodic re-grounding in current observations
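The sketch below captures the clip idea as described above, assuming a simple message-list interface; the turn budget and the size of the retained core are illustrative parameters, not values from the paper.

```python
from typing import Dict, List, Optional

class ClipContext:
    """Grow the history freely, then periodically reset it to a small recent core.

    Unlike a sliding window, the prefix stays fixed between resets, so a
    prefix KV cache remains reusable until the next clip.
    """

    def __init__(self, max_turns: int = 30, keep_recent: int = 4):
        self.max_turns = max_turns      # clip threshold (illustrative)
        self.keep_recent = keep_recent  # recent messages kept after a clip (illustrative)
        self.system: Optional[Dict] = None
        self.turns: List[Dict] = []

    def add(self, message: Dict) -> None:
        if message["role"] == "system":
            self.system = message
            return
        self.turns.append(message)
        if len(self.turns) > self.max_turns:
            # Clip: discard accumulated history, keep only the recent core,
            # forcing the agent to re-ground in its current observations.
            self.turns = self.turns[-self.keep_recent:]

    def messages(self) -> List[Dict]:
        """Messages to send to the model on the next turn."""
        return ([self.system] if self.system else []) + self.turns
```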
Empirically, Clip Context consistently outperforms window methods and often matches or beats summarization-based approaches, without the hallucination risk that summaries carry.
Findings — What actually improved
Across eight agent benchmarks and a deep research task:
- Diagonal attention drops by 7–14%
- Task success rates rise 4–8% depending on model
- Long-context failures improve the most
- General reasoning benchmarks remain unchanged
A particularly telling result: most of the gains attributed to “summarization” in prior work come from context truncation itself, not from the summaries.
In short: forgetting helps. Fancy forgetting is optional.
Implications — What this means for agent builders
Three takeaways stand out.
- More memory is not more intelligence. Long context amplifies imitation bias unless explicitly controlled.
- Exploration–exploitation is an attention problem. In agents, the bottleneck is not reasoning depth but behavioral flexibility.
- Training-free fixes matter. Clip-style context control is cheap, robust, and immediately deployable.
For anyone building autonomous agents—customer support bots, research agents, navigation systems—this reframes a common instinct. When agents stall, the answer is not always to add memory. Sometimes it’s to take it away.
Conclusion — Less remembering, better thinking
Conversational inertia explains why capable agents slowly talk themselves into corners. By isolating its attention-level signature and offering both training-time and inference-time remedies, this paper delivers something rare: a mechanistic insight with practical consequences.
The uncomfortable lesson is also the useful one. Agents do not fail because they forget too much. They fail because they remember themselves too well.
Cognaptus: Automate the Present, Incubate the Future.