Opening — Why this matters now

Most AI agents today suffer from a strange form of amnesia.

They can reason, plan, call APIs, browse the web, and orchestrate workflows. But once the task is finished, the experience disappears. The next time the same task appears, the agent starts again from scratch — repeating the same mistakes, inefficiencies, and blind guesses.

For enterprise automation, this is not just inconvenient. It is economically absurd. Imagine hiring a human analyst who forgets everything they learned yesterday.

The paper Trajectory‑Informed Memory Generation for Self‑Improving Agent Systems proposes a framework that treats agent executions as a source of institutional knowledge. Instead of storing raw conversation history, it extracts structured lessons from past runs — strategies, recovery patterns, and optimization rules — and injects them back into future reasoning.

In short: agents stop merely acting and begin learning from their own behavior.


Background — The memory problem in LLM agents

Modern LLM agents typically follow a loop similar to the ReAct paradigm:

  1. Reason about the task
  2. Choose an action
  3. Execute via tools or APIs
  4. Observe the result
  5. Repeat until completion

Each run produces a trajectory — a sequence of reasoning steps, actions, and outcomes.
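The loop above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `llm_reason` and `run_tool` are hypothetical stand-ins for a real LLM call and a real tool layer.

```python
# Minimal ReAct-style loop sketch. `llm_reason` and `run_tool` are
# illustrative stand-ins for a real LLM call and a real tool layer.
def llm_reason(task, trajectory):
    # A real agent would prompt an LLM here; this stub finishes
    # after a single search step so the loop terminates.
    return ("finish", None) if trajectory else ("search", task)

def run_tool(action, arg):
    return f"result of {action}({arg})"

def react_loop(task, max_steps=5):
    trajectory = []  # the (action, observation) sequence the paper mines
    for _ in range(max_steps):
        action, arg = llm_reason(task, trajectory)  # 1-2: reason, choose
        if action == "finish":
            break
        observation = run_tool(action, arg)         # 3-4: execute, observe
        trajectory.append((action, observation))    # 5: repeat
    return trajectory

traj = react_loop("find flights")
```

The returned `trajectory` list is exactly the artifact the framework mines: an ordered record of what the agent tried and what happened.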

These trajectories are incredibly valuable. They reveal:

  • which strategies worked
  • which mistakes caused failures
  • how the agent recovered from errors
  • where inefficient operations occurred

Yet most systems discard this information after execution.

Existing approaches attempt to fix this problem but fall short.

Approach | Strength | Limitation
Prompt engineering | Improves behavior for common patterns | Manual and static
Rule systems | Deterministic behavior | Cannot adapt to new cases
Vector memory stores | Store conversational facts | Do not capture execution reasoning
Reinforcement learning | Optimizes policies | Expensive and opaque

The result: agents become more powerful, but not necessarily wiser.

The paper proposes a different idea — treat trajectories themselves as a learning dataset for the agent.


Analysis — From raw trajectories to usable knowledge

The proposed framework converts execution logs into reusable guidance through a three‑phase pipeline.

Phase 1: Trajectory analysis

The system first analyzes the agent’s reasoning process rather than just the actions it took.

Execution traces are decomposed into cognitive patterns:

  • Analytical reasoning — understanding constraints or data
  • Planning — deciding sequences of actions
  • Validation — checking prerequisites
  • Reflection — reassessing decisions after errors

From these patterns the system identifies causal relationships between decisions and outcomes.

For example:

Event | Root cause | Lesson
Checkout failed | Payment method missing | Verify prerequisites before transaction
Checkout succeeded but slow | Item removal loop | Use bulk cart operations
Error then recovery | Agent recognized missing payment | Add payment then retry

The framework then converts these findings into three categories of actionable knowledge:

Tip type | Purpose
Strategy tips | Capture successful execution patterns
Recovery tips | Encode how failures were resolved
Optimization tips | Identify inefficient but successful behaviors

This categorization matters because different situations require different guidance.

A strategy tip prevents mistakes. A recovery tip fixes them when they happen. An optimization tip saves time when the task already works.
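The split can be pictured as a tiny decision rule. The category names follow the paper's taxonomy; the logic itself is an illustrative sketch, not the paper's actual method.

```python
# Illustrative mapping from trajectory outcomes to the three tip
# categories. Category names follow the paper; the decision logic
# is a sketch.
def categorize_tip(succeeded: bool, recovered_from_error: bool,
                   wasted_steps: bool) -> str:
    if recovered_from_error:
        return "recovery"        # encodes how a failure was resolved
    if succeeded and wasted_steps:
        return "optimization"    # task worked but was inefficient
    if succeeded:
        return "strategy"        # captures the successful pattern
    return "none"                # plain failure: nothing reusable yet

label = categorize_tip(succeeded=True, recovered_from_error=False,
                       wasted_steps=True)
```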


Phase 2: Memory curation

Simply accumulating tips would quickly create noise.

The framework therefore performs memory management before storing them:

  1. Generalization — remove task‑specific details
  2. Semantic clustering — group similar tips
  3. LLM consolidation — merge duplicates

The result is a curated memory store rather than an ever‑growing log.
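A toy sketch of the curation step: trivial word-overlap clustering stands in for semantic embeddings, and keeping one representative per cluster stands in for LLM consolidation. All names and thresholds here are illustrative.

```python
# Toy curation: group tips whose word overlap (Jaccard similarity)
# exceeds a threshold, then keep one representative per group. A real
# system would use embeddings for clustering and an LLM for the merge.
def similar(a: str, b: str, threshold: float = 0.5) -> bool:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1) >= threshold

def curate(tips):
    kept = []
    for tip in tips:
        if not any(similar(tip, k) for k in kept):
            kept.append(tip)  # first tip of each cluster survives
    return kept

tips = [
    "verify payment method before checkout",
    "verify payment method before the checkout",  # near-duplicate
    "use bulk cart operations",
]
curated = curate(tips)
```

The two near-duplicate payment tips collapse into one entry, while the unrelated cart tip survives, which is the behavior the curation phase is after.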

Each memory entry contains structured metadata:

Field | Meaning
Tip category | Strategy / recovery / optimization
Trigger condition | When the advice applies
Action steps | Concrete implementation instructions
Priority | Critical → Low
Provenance | Which trajectory generated the tip

This design makes the knowledge traceable and auditable — something most agent systems currently lack.
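The metadata fields above map naturally onto a small record type. A sketch, with illustrative field names (the paper's exact schema may differ):

```python
from dataclasses import dataclass, field

# Sketch of a memory entry mirroring the metadata table above.
# Field names are illustrative, not the paper's exact schema.
@dataclass
class MemoryTip:
    category: str                    # "strategy" | "recovery" | "optimization"
    trigger: str                     # when the advice applies
    action_steps: list = field(default_factory=list)  # concrete instructions
    priority: str = "low"            # "critical" .. "low"
    provenance: str = ""             # id of the trajectory that produced it

tip = MemoryTip(
    category="recovery",
    trigger="checkout fails with missing payment method",
    action_steps=["add payment method", "retry checkout"],
    priority="critical",
    provenance="trajectory-042",
)
```

Because `provenance` is carried on every entry, any tip retrieved at runtime can be traced back to the execution that produced it.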


Phase 3: Runtime retrieval

When the agent receives a new task, relevant tips are retrieved and inserted into the prompt before reasoning begins.

Two retrieval methods are explored.

Retrieval method | Characteristics
Cosine similarity | Fast; vector search only
LLM-guided retrieval | Uses reasoning to select tips

The retrieved guidance acts like a short internal playbook for the agent.

Instead of starting from zero, the agent begins with accumulated operational knowledge.
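The cosine-similarity path can be sketched with bag-of-words vectors standing in for real embeddings. The vocabulary-based vectorizer here is a simplification; a production system would embed tips with a dense encoder.

```python
import math

# Cosine-similarity retrieval sketch. Bag-of-words vectors stand in
# for the dense embeddings a real system would use.
def vectorize(text, vocab):
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(task, tips, k=1):
    # Shared vocabulary over the task and all stored tips.
    vocab = sorted({w for t in [task] + tips for w in t.lower().split()})
    tv = vectorize(task, vocab)
    scored = sorted(tips, key=lambda t: cosine(tv, vectorize(t, vocab)),
                    reverse=True)
    return scored[:k]  # top-k tips to inject into the prompt

tips = ["verify payment method before checkout",
        "use bulk cart operations when removing items"]
best = retrieve("checkout a cart with a payment method", tips)
```

For a checkout task, the payment-verification tip outranks the cart-operations tip, so it is the one injected into the prompt.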

In other words, the system introduces something remarkably human:

experience.


Findings — Performance improvements

The framework was evaluated on the AppWorld benchmark, which tests agents performing real application tasks such as email, calendars, e‑commerce, and file management.

Two metrics are used:

Metric | Meaning
Task Goal Completion (TGC) | Individual task success rate
Scenario Goal Completion (SGC) | All variants of a task scenario succeed
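The relationship between the two metrics is easy to state in code. A sketch, assuming per-variant pass/fail results grouped by scenario (the grouping structure is an assumption for illustration):

```python
# Sketch of TGC/SGC computation. `results` maps a scenario id to a
# list of per-variant booleans (True = that task variant succeeded).
def tgc(results):
    # Fraction of individual task variants that succeed.
    outcomes = [ok for variants in results.values() for ok in variants]
    return sum(outcomes) / len(outcomes)

def sgc(results):
    # Fraction of scenarios where *every* variant succeeds.
    return sum(all(v) for v in results.values()) / len(results)

results = {
    "email-1": [True, True],    # both variants pass -> scenario passes
    "cart-3":  [True, False],   # one variant fails  -> scenario fails
}
```

Because one failed variant sinks a whole scenario, SGC is the stricter metric, which is why it is the better probe of consistency.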

Overall improvement

System | TGC | SGC
Baseline agent | 69.6% | 50.0%
Memory-enhanced agent | 73.2% | 64.3%

The scenario success improvement (+14.3 percentage points) is particularly notable.

SGC measures consistency across related tasks — an area where brittle agents often fail.

Effect on complex tasks

Difficulty level | Baseline SGC | With memory (SGC)
Easy | 79.0% | 89.5%
Medium | 56.2% | 56.2%
Hard | 19.1% | 47.6%

The improvement on difficult tasks is dramatic, even though medium tasks show no change.

Hard scenarios see a 149% relative increase in scenario completion (19.1% → 47.6%).

This result makes intuitive sense: complex workflows benefit the most from accumulated operational experience.


Implications — Why this matters for real AI systems

Trajectory‑based learning shifts how we should think about AI agents.

Most current architectures focus on reasoning quality. This work highlights the importance of experience accumulation.

For enterprises building automation systems, several implications follow.

1. Agents need institutional memory

Companies accumulate operational knowledge over years. Agents currently do not.

Embedding structured learning systems allows agents to develop organizational memory similar to human teams.

2. Execution logs become strategic assets

Agent trajectories are usually treated as debugging artifacts.

In this framework they become a training signal for continuous improvement.

3. Reliability becomes scalable

Memory systems reduce repeated failures and improve consistency across tasks.

This is crucial for deploying AI agents in production environments where reliability matters more than novelty.

4. Governance becomes possible

Because each tip preserves provenance, organizations can audit:

  • why guidance was generated
  • which execution produced it
  • whether it still improves performance

This is a rare example of explainability in agent learning systems.


Conclusion — The beginning of experienced AI

The most interesting insight from this paper is deceptively simple.

Agents do not necessarily need more intelligence.

They need memory of what already happened.

By converting execution trajectories into structured guidance, the proposed system allows agents to learn from both success and failure — accumulating operational wisdom over time.

If this design becomes standard, future AI systems may behave less like brilliant interns and more like seasoned professionals.

And seasoned professionals, as every manager knows, are mostly just people who remember what went wrong last time.

Cognaptus: Automate the Present, Incubate the Future.