Opening — Why this matters now

LLM-driven discovery systems have crossed an uncomfortable threshold. They no longer fail because models cannot generate ideas, but because they cannot remember the right things. AlphaEvolve, FunSearch, and their successors proved that iterative code evolution works. What they also revealed is a structural bottleneck: context windows are finite, expensive, and poorly used.

The dominant design pattern—storing full historical programs as evolutionary memory—treats context as an archive. DeltaEvolve argues that this is the wrong abstraction. Evolution does not need a museum of past solutions. It needs direction.

Background — Evolution reinterpreted as EM

DeltaEvolve reframes LLM-driven evolution through an Expectation–Maximization (EM) lens. This move is more than theoretical hygiene; it exposes where progress actually comes from.

  • E-step: the language model samples candidate programs conditioned on the current context.
  • M-step: the system updates that context using evaluator feedback.

When model weights are frozen, context becomes the only mutable state. There is no gradient update, no parameter learning. All “learning” happens through how history is distilled and re-presented. Under this framing, existing systems commit a subtle but costly error: they optimize the E-step while treating the M-step as bookkeeping.
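
To make the framing concrete, here is a minimal sketch of the loop, assuming the E-step, evaluator, and M-step are supplied as callables; the names and signatures are illustrative, not DeltaEvolve's actual API.

```python
from typing import Callable, Tuple

def evolve(
    sample_program: Callable[[str], str],       # E-step: frozen-weight LLM call
    evaluate: Callable[[str], float],           # evaluator feedback
    distill: Callable[[str, str, float], str],  # M-step: context update rule
    context: str,
    n_generations: int,
) -> Tuple[str, float]:
    """Run the E/M loop. With frozen weights, `context` is the only
    state that changes between generations."""
    best_program, best_score = "", float("-inf")
    for _ in range(n_generations):
        program = sample_program(context)   # E-step: sample conditioned on context
        score = evaluate(program)           # in practice: score plus traces and errors
        context = distill(context, program, score)  # M-step: re-present history
        if score > best_score:
            best_program, best_score = program, score
    return best_program, best_score
```

Everything interesting hides in `distill`: the rest of the loop is identical across AlphaEvolve-style systems and DeltaEvolve.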

Analysis — Why full-code memory fails

AlphaEvolve-style systems populate context with top-performing and diverse full programs. This appears reasonable until scale enters the picture. Programs are long, implementation-heavy, and entangle strategy with scaffolding. As a result, the very signal evolution needs—what changed and why it worked—is diluted by irrelevant detail.

DeltaEvolve demonstrates this empirically: removing numerical scores from context barely hurts performance, while removing structured selection collapses it. LLMs do not learn by regressing on scalar rewards; they learn by reusing patterns, and full code hides those patterns.

In optimization terms, existing agents store positions, not updates. They remember where they are, not how they moved.

DeltaEvolve — Semantic delta as momentum

DeltaEvolve replaces static program snapshots with semantic deltas: structured descriptions of the logical modifications between successive programs and their qualitative impact on performance.

A delta answers four questions explicitly:

  • What was the previous strategy?
  • What is the new strategy?
  • Which components changed?
  • Why should this change help?
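
As a data structure, a delta is small and explicit. A minimal sketch, assuming one record per parent-to-child edge; the field names are mine, not the paper's schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SemanticDelta:
    """One parent-to-child edge in the evolution history, stored as
    meaning rather than code."""
    previous_strategy: str         # what the parent program was doing
    new_strategy: str              # what the child program does instead
    changed_components: List[str]  # which logical pieces were modified
    rationale: str                 # why the change should help
    observed_impact: str = ""      # qualitative evaluator feedback, added after evaluation
```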

Accumulated over time, these deltas form a directional signal analogous to momentum in stochastic optimization. Evolution stops wandering and starts compounding.
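
To pin the analogy down: heavy-ball momentum accumulates past update directions rather than past positions. Accumulated deltas play the role of v_t; full-code snapshots amount to storing only θ_t. One common parameterization of the classical update:

```latex
% Heavy-ball momentum: the update direction v_t is a decayed
% sum of past gradients g_t.
v_t = \beta\, v_{t-1} + g_t, \qquad \theta_t = \theta_{t-1} - \eta\, v_t
```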

Implementation — Multi-level memory and progressive disclosure

DeltaEvolve operationalizes semantic delta using a three-level memory hierarchy:

  Level | Content            | Role
  ------|--------------------|-------------------------------------
  L1    | Delta Summary      | High-level strategic shift
  L2    | Delta Plan Details | Concrete logic changes + hypotheses
  L3    | Full Code          | Executable parent only

Older nodes are compressed into summaries. Recent or influential nodes retain detailed plans. Only the current parent is represented as full code. This is paired with a progressive disclosure sampler that dynamically chooses which abstraction level to expose based on relevance and recency.
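
A sketch of what such a sampler might look like; the recency-or-influence rule and the thresholds are assumptions for illustration, not the paper's exact policy:

```python
from dataclasses import dataclass

@dataclass
class HistoryNode:
    summary: str        # L1: one-line strategic shift
    plan: str           # L2: concrete logic changes + hypotheses
    code: str           # L3: full program text
    generation: int
    influence: float    # e.g. score improvement credited to this node

def disclose(node: HistoryNode, current_gen: int, is_parent: bool,
             recency_window: int = 3, influence_cutoff: float = 0.5) -> str:
    """Choose how much of one history node to expose in context."""
    if is_parent:
        return node.code        # L3: only the current parent stays executable
    recent = current_gen - node.generation <= recency_window
    if recent or node.influence >= influence_cutoff:
        return node.plan        # L2: recent or influential nodes keep detail
    return node.summary         # L1: everything older is compressed
```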

The result is not merely fewer tokens, but higher information density per token.

Findings — Quality up, cost down

Across five domains—black-box optimization, geometric packing, symbolic regression, PDE solvers, and efficient convolution—DeltaEvolve consistently outperforms full-code evolutionary baselines.

The pattern is stable:

  • Equal or higher best-achieved scores
  • Roughly 37% lower total token consumption
  • Faster convergence with fewer repeated failures

The improvement is not accidental. By preserving causal structure instead of raw artifacts, DeltaEvolve allows useful ideas to transfer across generations rather than being rediscovered.

Implications — Designing agents that actually learn

DeltaEvolve’s deeper contribution is conceptual. It clarifies that in frozen-weight agentic systems:

  • Context construction is the learning algorithm.
  • Memory should encode causality, not artifacts.
  • Evolution benefits more from momentum than from diversity alone.

For practitioners building autonomous research agents, this has immediate consequences. Token budgets translate directly into cost. Directionless exploration is not innovation; it is inefficiency disguised as scale.

More broadly, DeltaEvolve hints at a future where machine-generated scientific logs—structured, causal, and human-readable—become optimization primitives. That is a rare alignment between interpretability and performance.

Conclusion — Evolution, finally with direction

DeltaEvolve succeeds because it asks a simple question that prior systems ignored: what should an evolving agent remember?

Not everything it tried. Only what moved the needle.

By turning history into momentum, DeltaEvolve transforms context from passive memory into an active optimization force. Once framed this way, the limitations of full-code evolution feel less like engineering constraints and more like conceptual debt.

Cognaptus: Automate the Present, Incubate the Future.