Opening — Why this matters now
Agentic AI systems are everywhere—self-refining copilots, multi-step reasoning chains, autonomous research bots quietly talking to themselves. Yet beneath the productivity demos lurks an unanswered question: what actually happens when an LLM talks to itself repeatedly? Does meaning stabilize, or does it slowly dissolve into semantic noise?
The paper “Dynamics of Agentic Loops in Large Language Models” offers an unusually rigorous answer. Instead of hand-waving about “drift” or “stability,” it treats agentic loops as discrete dynamical systems and analyzes them geometrically in embedding space. The result is less sci-fi mysticism, more applied mathematics—and that’s a compliment.
Background — From prompts to trajectories
Most agent frameworks implicitly assume that iterating an LLM is benign: refine, critique, rewrite, repeat. But iteration is a transformation, and transformations have dynamics.
The paper introduces a clean separation:
- Artifact space: where text lives and LLMs generate strings.
- Semantic embedding space: where those strings are mapped into vectors and measured.
Once outputs are embedded, an agentic loop becomes a trajectory—a path traced through semantic space as the model repeatedly transforms its own outputs. This allows concepts like convergence, dispersion, clusters, and attractors to be defined precisely, rather than metaphorically.
If this sounds like borrowing tools from physics and dynamical systems theory—yes, that’s exactly the point.
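To make the trajectory picture concrete, here is a minimal sketch (not the paper’s code) of how a sequence of loop outputs can be embedded and turned into a trajectory. The sentence-transformers library and the all-MiniLM-L6-v2 model are assumptions for illustration, not choices made by the paper.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding backend


def to_trajectory(outputs: list[str], model_name: str = "all-MiniLM-L6-v2") -> np.ndarray:
    """Map the text outputs of an agentic loop to a trajectory in embedding space."""
    model = SentenceTransformer(model_name)
    # Each row is one iteration's output, embedded as a unit vector.
    return model.encode(outputs, normalize_embeddings=True)


def step_similarities(traj: np.ndarray) -> np.ndarray:
    """Cosine similarity between consecutive iterates (rows are unit-normalized)."""
    return np.sum(traj[:-1] * traj[1:], axis=1)
```

Once the outputs are rows of a matrix, convergence, dispersion, and drift all become ordinary vector computations.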
Analysis — Fixing the measurement problem first
Before measuring trajectories, the author tackles an inconvenient truth: cosine similarity is biased.
Modern sentence embeddings are anisotropic: vectors crowd into a narrow cone, so everything looks at least somewhat similar to everything else. Raw cosine similarity therefore systematically overestimates semantic closeness, making it nearly useless for detecting subtle drift.
The paper’s solution is refreshingly practical:
- Use human-judged semantic similarity (STS benchmark) as ground truth.
- Apply isotonic regression to recalibrate cosine similarity.
This calibrated similarity:
- Eliminates systematic bias (mean bias error → 0)
- Achieves near-perfect calibration (ECE ≈ 0)
- Preserves local stability (~98%) despite being piecewise-constant
In short: the ruler is fixed before measuring the motion, a detail many agent papers conveniently skip.
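A minimal sketch of the recalibration idea, assuming scikit-learn’s IsotonicRegression and an STS-style dataset of (raw cosine similarity, human score) pairs; the toy numbers and variable names below are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Illustrative ground truth: raw cosine similarities paired with human STS scores
# rescaled to [0, 1]. In practice these would come from the STS benchmark.
raw_cosine = np.array([0.62, 0.71, 0.78, 0.83, 0.88, 0.93, 0.97])
human_score = np.array([0.10, 0.25, 0.35, 0.50, 0.65, 0.80, 0.95])

# Fit a monotone, non-decreasing mapping from raw cosine to calibrated similarity.
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw_cosine, human_score)


def calibrated_similarity(cos_sim: np.ndarray) -> np.ndarray:
    """Apply the learned monotone correction to raw cosine similarities."""
    return calibrator.predict(cos_sim)
```

The key property is monotonicity: the correction can stretch or compress the similarity scale to match human judgments, but it never reorders pairs, which is why local stability survives the calibration.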
Findings — Two prompts, two universes
With measurement solved, the paper runs a deceptively simple experiment: two single-prompt agentic loops, 50 iterations each, same model, same starting sentence.
Contractive loop (rewrite, preserve meaning)
Prompt: “Rewrite the sentence to sound slightly more natural while preserving meaning exactly.”
Observed dynamics:
- High local similarity (>0.85)
- Decreasing dispersion
- Clear cluster formation
- Eventual convergence to a single semantic attractor
This loop behaves like a contraction mapping. Iteration smooths, compresses, and stabilizes meaning.
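For intuition, here is a minimal sketch of what a single-prompt agentic loop looks like in code. The prompt text follows the paper, but the scaffolding is illustrative, and `generate` is an assumed callable wrapping whatever LLM API is in use.

```python
REWRITE_PROMPT = (
    "Rewrite the sentence to sound slightly more natural "
    "while preserving meaning exactly.\n\n{text}"
)


def run_loop(seed: str, prompt_template: str, generate, n_iters: int = 50) -> list[str]:
    """Feed the model its own output n_iters times and keep every iterate.

    `generate` is an assumed callable that sends a prompt to the LLM and
    returns its text completion.
    """
    outputs = [seed]
    for _ in range(n_iters):
        outputs.append(generate(prompt_template.format(text=outputs[-1])))
    return outputs
```

Embedding the returned list with the trajectory helper above yields the path whose geometry the paper analyzes.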
Exploratory loop (summarize, then negate)
Prompt: “Summarize the text, then negate its main idea.”
Observed dynamics:
- Large stepwise semantic jumps
- Low similarity between iterations
- No stable clusters
- Unbounded drift
This loop never settles. Meaning is repeatedly destroyed and rebuilt. The trajectory wanders freely through semantic space.
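The two regimes can be told apart from the trajectory alone. Below is a minimal sketch of that kind of diagnostic, reusing the embedding helpers above and scikit-learn’s DBSCAN for clustering; the thresholds and parameters are illustrative, not the paper’s.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def diagnose(traj: np.ndarray, sim_threshold: float = 0.85) -> dict:
    """Summarize a trajectory: local similarity, global drift, and cluster count."""
    local_sim = np.sum(traj[:-1] * traj[1:], axis=1)          # step-to-step cosine similarity
    global_drift = float(np.linalg.norm(traj[-1] - traj[0]))  # end-to-end displacement
    labels = DBSCAN(eps=0.3, min_samples=3, metric="cosine").fit_predict(traj)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # ignore noise label -1
    return {
        "mean_local_similarity": float(local_sim.mean()),
        "global_drift": global_drift,
        "n_clusters": n_clusters,
        "regime": "convergent" if local_sim.mean() > sim_threshold else "divergent",
    }
```

Run on the contractive loop, a diagnostic like this reports high local similarity and a single surviving cluster; run on the exploratory loop, it reports low similarity and no stable clusters at all.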
Side-by-side summary
| Property | Contractive Loop | Exploratory Loop |
|---|---|---|
| Mean local similarity | High (>0.85) | Low (<0.5) |
| Global drift | Bounded | Unbounded |
| Clusters | Multiple → One | None |
| Regime | Convergent | Divergent |
Same model. Same temperature. Different prompt. Radically different physics.
Implications — Prompting is dynamical control
The uncomfortable takeaway is this: prompt design implicitly sets the dynamical regime of an agent.
- Want reliability, refinement, and consistency? You need contractive prompts.
- Want novelty, exploration, or creative destruction? You are inducing divergence, whether you realize it or not.
This reframes several practical concerns:
- Runaway agents are not mysterious—they are poorly constrained dynamical systems.
- Self-refinement works precisely because it creates semantic attractors.
- Creative agents need both expansion and contraction phases, or they either stagnate or collapse.
The paper sketches a natural next step: composite loops that alternate exploration and consolidation. In other words, creativity as controlled oscillation, not chaos.
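As a thought experiment only (the paper sketches the idea but does not publish an implementation), a composite loop could alternate a divergent prompt with a contractive one. The prompt texts, phase lengths, and `generate` callable below are all illustrative assumptions.

```python
EXPLORE_PROMPT = "Summarize the text, then negate its main idea.\n\n{text}"
CONSOLIDATE_PROMPT = (
    "Rewrite the sentence to sound slightly more natural "
    "while preserving meaning exactly.\n\n{text}"
)


def composite_loop(seed: str, generate, n_cycles: int = 5,
                   explore_steps: int = 2, consolidate_steps: int = 3) -> list[str]:
    """Alternate expansion and contraction: explore, then pull back toward an attractor."""
    outputs = [seed]
    for _ in range(n_cycles):
        for _ in range(explore_steps):
            outputs.append(generate(EXPLORE_PROMPT.format(text=outputs[-1])))
        for _ in range(consolidate_steps):
            outputs.append(generate(CONSOLIDATE_PROMPT.format(text=outputs[-1])))
    return outputs
```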
Conclusion — From vibes to vectors
This work does something rare in agent research: it replaces intuition with instrumentation.
By treating agentic loops as trajectories in calibrated semantic space, it shows that stability, drift, and creativity are not mystical properties of LLMs—they are geometric consequences of how we prompt and iterate them.
If you are building agentic systems, the message is clear: stop asking whether an agent is “smart.” Start asking what kind of dynamical system you’ve accidentally created.
Cognaptus: Automate the Present, Incubate the Future.