## Opening — Why this matters now
Most AI agents today suffer from a strange form of amnesia.
They can reason, plan, call APIs, browse the web, and orchestrate workflows. But once the task is finished, the experience disappears. The next time the same task appears, the agent starts again from scratch — repeating the same mistakes, inefficiencies, and blind guesses.
For enterprise automation, this is not just inconvenient. It is economically absurd. Imagine hiring a human analyst who forgets everything they learned yesterday.
The paper *Trajectory‑Informed Memory Generation for Self‑Improving Agent Systems* proposes a framework that treats agent executions as a source of institutional knowledge. Instead of storing raw conversation history, it extracts structured lessons from past runs — strategies, recovery patterns, and optimization rules — and injects them back into future reasoning.
In short: agents stop merely acting and begin learning from their own behavior.
## Background — The memory problem in LLM agents
Modern LLM agents typically follow a loop similar to the ReAct paradigm:
- Reason about the task
- Choose an action
- Execute via tools or APIs
- Observe the result
- Repeat until completion
Each run produces a trajectory — a sequence of reasoning steps, actions, and outcomes.
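The loop above can be sketched as a minimal driver that records a trajectory as it runs. This is a sketch, not the paper's implementation: `reason` and `act` are hypothetical callables standing in for the LLM and the tool layer.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    thought: str       # the reasoning emitted before acting
    action: str        # the tool call or command chosen
    observation: str   # what came back from the environment

@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)

def run_agent(task, reason, act, max_steps=10):
    """Minimal ReAct-style loop: reason -> act -> observe, until the
    reasoner decides to finish. The full trajectory is returned so it
    can later be mined for lessons."""
    traj = Trajectory(task)
    for _ in range(max_steps):
        thought, action = reason(task, traj.steps)
        if action == "finish":
            break
        observation = act(action)
        traj.steps.append(Step(thought, action, observation))
    return traj
```

The key design point is that the trajectory object survives the run — it is the raw material everything below operates on.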
These trajectories are incredibly valuable. They reveal:
- which strategies worked
- which mistakes caused failures
- how the agent recovered from errors
- where inefficient operations occurred
Yet most systems discard this information after execution.
Existing approaches attempt to fix this problem but fall short.
| Approach | Strength | Limitation |
|---|---|---|
| Prompt engineering | Improves behavior for common patterns | Manual and static |
| Rule systems | Deterministic behavior | Cannot adapt to new cases |
| Vector memory stores | Store conversational facts | Do not capture execution reasoning |
| Reinforcement learning | Optimizes policies | Expensive and opaque |
The result: agents become more powerful, but not necessarily wiser.
The paper proposes a different idea — treat trajectories themselves as a learning dataset for the agent.
## Analysis — From raw trajectories to usable knowledge
The proposed framework converts execution logs into reusable guidance through a three‑phase pipeline.
### Phase 1: Trajectory analysis
The system first analyzes the agent’s reasoning process rather than just the actions it took.
Execution traces are decomposed into cognitive patterns:
- Analytical reasoning — understanding constraints or data
- Planning — deciding sequences of actions
- Validation — checking prerequisites
- Reflection — reassessing decisions after errors
From these patterns the system identifies causal relationships between decisions and outcomes.
For example:
| Event | Root cause | Lesson |
|---|---|---|
| Checkout failed | Payment method missing | Verify prerequisites before transaction |
| Checkout succeeded but slow | Item removal loop | Use bulk cart operations |
| Error then recovery | Agent recognized missing payment | Add payment then retry |
The framework then converts these findings into three categories of actionable knowledge:
| Tip Type | Purpose |
|---|---|
| Strategy tips | Capture successful execution patterns |
| Recovery tips | Encode how failures were resolved |
| Optimization tips | Identify inefficient but successful behaviors |
This categorization matters because different situations require different guidance.
A strategy tip prevents mistakes. A recovery tip fixes them when they happen. An optimization tip saves time when the task already works.
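One way to picture the categorization — as a toy heuristic, not the paper's LLM-based analysis — is to map a trajectory's outcome signature to a tip type:

```python
from enum import Enum

class TipType(Enum):
    STRATEGY = "strategy"          # capture successful execution patterns
    RECOVERY = "recovery"          # encode how failures were resolved
    OPTIMIZATION = "optimization"  # flag successful but inefficient behavior

def classify_tip(succeeded: bool, had_error: bool, inefficient: bool) -> TipType:
    """Toy heuristic: an error followed by success suggests a recovery
    lesson; success with wasted steps suggests an optimization lesson;
    a clean success yields a strategy lesson."""
    if succeeded and had_error:
        return TipType.RECOVERY
    if succeeded and inefficient:
        return TipType.OPTIMIZATION
    return TipType.STRATEGY
```

In the paper the classification is done by an LLM analyzing the full reasoning trace; the branching here only illustrates why the three categories are mutually useful.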
### Phase 2: Memory curation
Simply accumulating tips would quickly create noise.
The framework therefore performs memory management before storing them:
- Generalization — remove task‑specific details
- Semantic clustering — group similar tips
- LLM consolidation — merge duplicates
The result is a curated memory store rather than an ever‑growing log.
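A minimal sketch of the clustering step, assuming a hypothetical `embed` function that maps a tip to a vector. The paper's consolidation is LLM-driven; this greedy first-fit grouping only illustrates how near-duplicate tips get bucketed before merging.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_tips(tips, embed, threshold=0.85):
    """Greedy semantic clustering: each tip joins the first cluster whose
    representative (first member) is similar enough, else starts a new
    cluster. An LLM would then consolidate each cluster into one tip."""
    clusters = []
    for tip in tips:
        vec = embed(tip)
        for cluster in clusters:
            if cosine(vec, embed(cluster[0])) >= threshold:
                cluster.append(tip)
                break
        else:
            clusters.append([tip])
    return clusters
```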
Each memory entry contains structured metadata:
| Field | Meaning |
|---|---|
| Tip category | Strategy / recovery / optimization |
| Trigger condition | When the advice applies |
| Action steps | Concrete implementation instructions |
| Priority | Critical → Low |
| Provenance | Which trajectory generated the tip |
This design makes the knowledge traceable and auditable — something most agent systems currently lack.
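As a sketch, the entry above could be modeled as a small record type; the field names here are illustrative, not the paper's exact schema.

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    tip_type: str    # "strategy" | "recovery" | "optimization"
    trigger: str     # condition under which the advice applies
    actions: list    # concrete implementation steps, in order
    priority: str    # "critical" | "high" | "medium" | "low"
    provenance: str  # id of the trajectory that generated the tip
```

Keeping `provenance` as a first-class field is what makes the audit trail possible: any tip can be traced back to the execution that produced it.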
### Phase 3: Runtime retrieval
When the agent receives a new task, relevant tips are retrieved and inserted into the prompt before reasoning begins.
Two retrieval methods are explored.
| Retrieval Method | Characteristics |
|---|---|
| Cosine similarity | Fast; ranks tips by embedding similarity alone |
| LLM‑guided retrieval | Slower; uses an LLM to reason about which tips apply |
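The cosine-similarity variant can be sketched in a few lines, assuming tips are stored alongside a hypothetical `embed` function:

```python
import math

def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def retrieve_tips(task_vec, memory, embed, k=3):
    """Rank stored tips by cosine similarity to the task embedding and
    keep the top-k for injection into the prompt."""
    ranked = sorted(memory, key=lambda tip: _cosine(task_vec, embed(tip)),
                    reverse=True)
    return ranked[:k]
```

The LLM-guided variant would replace the `sorted` call with a model prompt over candidate tips — more expensive per task, but able to reason about applicability rather than surface similarity.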
The retrieved guidance acts like a short internal playbook for the agent.
Instead of starting from zero, the agent begins with accumulated operational knowledge.
In other words, the system introduces something remarkably human:
experience.
## Findings — Performance improvements
The framework was evaluated on the AppWorld benchmark, which tests agents performing real application tasks such as email, calendars, e‑commerce, and file management.
Two metrics are used:
| Metric | Meaning |
|---|---|
| Task Goal Completion (TGC) | Individual task success rate |
| Scenario Goal Completion (SGC) | Fraction of scenarios in which all task variants succeed |
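Assuming a scenario is simply a set of task-variant outcomes, the two metrics relate as follows — SGC is the stricter of the two because a single failed variant fails the whole scenario:

```python
def tgc(results):
    """Task Goal Completion: fraction of individual tasks that succeed,
    pooled across all scenarios."""
    outcomes = [ok for scenario in results.values() for ok in scenario]
    return sum(outcomes) / len(outcomes)

def sgc(results):
    """Scenario Goal Completion: fraction of scenarios in which every
    task variant succeeds."""
    return sum(all(scenario) for scenario in results.values()) / len(results)
```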
### Overall improvement
| System | TGC | SGC |
|---|---|---|
| Baseline agent | 69.6% | 50.0% |
| Memory‑enhanced agent | 73.2% | 64.3% |
The scenario success improvement (+14.3 percentage points) is particularly notable.
SGC measures consistency across related tasks — an area where brittle agents often fail.
### Effect on complex tasks
| Difficulty Level | Baseline SGC | With Memory |
|---|---|---|
| Easy | 79.0% | 89.5% |
| Medium | 56.2% | 56.2% |
| Hard | 19.1% | 47.6% |
The improvement on difficult tasks is dramatic.
Hard scenarios see a 149% relative increase in scenario completion.
This result makes intuitive sense: complex workflows benefit the most from accumulated operational experience.
## Implications — Why this matters for real AI systems
Trajectory‑based learning shifts how we should think about AI agents.
Most current architectures focus on reasoning quality. This work highlights the importance of experience accumulation.
For enterprises building automation systems, several implications follow.
### 1. Agents need institutional memory
Companies accumulate operational knowledge over years. Agents currently do not.
Embedding structured learning systems allows agents to develop organizational memory similar to human teams.
### 2. Execution logs become strategic assets
Agent trajectories are usually treated as debugging artifacts.
In this framework they become a training signal for continuous improvement.
### 3. Reliability becomes scalable
Memory systems reduce repeated failures and improve consistency across tasks.
This is crucial for deploying AI agents in production environments where reliability matters more than novelty.
### 4. Governance becomes possible
Because each tip preserves provenance, organizations can audit:
- why guidance was generated
- which execution produced it
- whether it still improves performance
This is a rare example of explainability in agent learning systems.
## Conclusion — The beginning of experienced AI
The most interesting insight from this paper is deceptively simple.
Agents do not necessarily need more intelligence.
They need memory of what already happened.
By converting execution trajectories into structured guidance, the proposed system allows agents to learn from both success and failure — accumulating operational wisdom over time.
If this design becomes standard, future AI systems may behave less like brilliant interns and more like seasoned professionals.
And seasoned professionals, as every manager knows, are mostly just people who remember what went wrong last time.
Cognaptus: Automate the Present, Incubate the Future.