Opening — Why this matters now

We are reaching the awkward teenage years of AI agents.

LLMs can already do things: book hotels, navigate apps, coordinate workflows. But once deployed, most agents are frozen in time. Improving them usually means retraining or fine-tuning models—slow, expensive, and deeply incompatible with mobile and edge environments.

The paper “Beyond Training: Enabling Self-Evolution of Agents with MOBIMEM” takes a blunt stance: continual agent improvement should not depend on continual model training. Instead, evolution should happen where operating systems have always handled adaptation best—memory.

This is not another “better prompt” paper. It is an architectural argument: if agents are software, then memory—not weights—should carry experience.

Background — The limits of model-centric agents

Most current agent systems are model-centric by design. Personalization, efficiency, and capability growth are all pushed into:

  • Fine-tuning per user
  • Reinforcement learning loops
  • Larger multimodal models

The trade-off is predictable:

| Goal | Consequence |
| --- | --- |
| Higher accuracy | Larger models, higher latency |
| Personalization | Per-user retraining cost |
| Edge deployment | Becomes unrealistic |

Prior memory systems (RAG, GraphRAG, MemGPT-style stores) help retrieve information, but they don’t let agents evolve their behavior safely and efficiently after deployment.

MOBIMEM reframes the problem: instead of teaching the model more, teach the agent to remember better.

Analysis — What MOBIMEM actually builds

MOBIMEM is a memory-centric agent system inspired less by ML pipelines and more by operating systems. It introduces three distinct memory primitives, each handling a different dimension of agent evolution.

1. Profile Memory — Personalization without latency debt

The first challenge is learning user preferences without turning retrieval into a GraphRAG latency nightmare.

MOBIMEM introduces DisGraph:

  • Semantic meaning lives in nodes, not edges
  • Edges encode only distance, not meaning
  • Retrieval = embedding search + graph traversal
  • Zero LLM calls at query time

Result:

| System | Retrieval Latency | Alignment |
| --- | --- | --- |
| Vanilla RAG | ~20 ms | 66.4% |
| GraphRAG | ~6.6 s | 81.1% |
| DisGraph (MOBIMEM) | 23.8 ms | 83.1% |

The key insight: relational structure does not require semantic edges if distance already implies relevance.
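To make the idea concrete, here is a minimal sketch of a DisGraph-style store, assuming embeddings are supplied externally. All names (`DisGraph`, `add_fact`, `retrieve`, the linking threshold) are illustrative, not the paper's API; the point is only that meaning lives in node payloads, edges carry a bare distance weight, and retrieval is embedding search plus graph traversal with no LLM call.

```python
import math

def cosine(a, b):
    # Plain cosine similarity; zero vectors get similarity 0.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class DisGraph:
    def __init__(self, link_threshold=0.6):
        self.nodes = []   # (text, embedding): semantic meaning lives here
        self.edges = {}   # node_id -> [(neighbor_id, distance)]: distance only
        self.link_threshold = link_threshold

    def add_fact(self, text, embedding):
        nid = len(self.nodes)
        self.nodes.append((text, embedding))
        self.edges[nid] = []
        # Edges store 1 - similarity, never a semantic label.
        for other in range(nid):
            sim = cosine(embedding, self.nodes[other][1])
            if sim >= self.link_threshold:
                dist = 1.0 - sim
                self.edges[nid].append((other, dist))
                self.edges[other].append((nid, dist))
        return nid

    def retrieve(self, query_embedding, k=3, hops=1):
        # Step 1: embedding search picks an entry node (no LLM call).
        seed = max(range(len(self.nodes)),
                   key=lambda i: cosine(query_embedding, self.nodes[i][1]))
        # Step 2: expand along distance edges up to `hops` hops.
        results, frontier = {seed: 0.0}, [seed]
        for _ in range(hops):
            nxt = []
            for nid in frontier:
                for nbr, dist in self.edges[nid]:
                    d = results[nid] + dist
                    if nbr not in results or d < results[nbr]:
                        results[nbr] = d
                        nxt.append(nbr)
            frontier = nxt
        ranked = sorted(results, key=results.get)[:k]
        return [self.nodes[i][0] for i in ranked]

g = DisGraph()
g.add_fact("prefers window seats", [1.0, 0.0])
g.add_fact("books aisle only for red-eyes", [0.9, 0.1])
g.add_fact("allergic to peanuts", [0.0, 1.0])
top = g.retrieve([1.0, 0.05], k=2)
```

With toy 2-D embeddings, the two seating preferences end up linked by a short edge, so a seating query pulls in both at query time using nothing but arithmetic.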

2. Experience Memory — Capability without retraining

Instead of storing raw execution traces, MOBIMEM stores experience templates:

  • Invariant control flow
  • Variable parameters
  • Multi-level abstraction (high-level plans ↔ low-level UI steps)

When a new task arrives:

  1. Retrieve the closest template
  2. Fill parameters
  3. Execute without re-planning from scratch
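The template-then-fill loop can be sketched as follows. Everything here is an assumption for illustration: `ExperienceTemplate`, the `{slot}` syntax, and the keyword-overlap retrieval (a stand-in for embedding similarity) are not the paper's actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class ExperienceTemplate:
    task_signature: str   # what kind of task this template solves
    steps: list           # invariant control flow with {slot} parameters

    def instantiate(self, **params):
        # Fill the variable parameters; the plan itself is not re-derived.
        return [step.format(**params) for step in self.steps]

def closest_template(templates, task):
    # Toy retrieval: keyword overlap stands in for embedding similarity.
    def score(t):
        return len(set(t.task_signature.split()) & set(task.split()))
    return max(templates, key=score)

templates = [
    ExperienceTemplate(
        "book a hotel room",
        ["open {app}", "search city {city}",
         "select dates {dates}", "confirm booking"],
    ),
    ExperienceTemplate(
        "order food delivery",
        ["open {app}", "search restaurant {name}",
         "add {item}", "checkout"],
    ),
]

tmpl = closest_template(templates, "book a hotel in Tokyo")
plan = tmpl.instantiate(app="Booking", city="Tokyo", dates="May 3-5")
```

The design choice worth noticing: the expensive part (deriving the control flow) is amortized across tasks, while the cheap part (parameter filling) is all that happens per request.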

This dramatically improves success rates, especially for smaller or weaker models:

| Model | Improvement with Experience Memory |
| --- | --- |
| UI-TARS-1.5-7B | +50.3% |
| Gemini 2.5 Flash | +21–22% |
| Qwen3-VL-30B | +10.5% |

Crucially, this scales without retraining and with minimal human effort.

3. Action Memory — Efficiency without reasoning

Humans don’t reason through every familiar action. Agents shouldn’t either.

MOBIMEM introduces procedural action memory:

  • ActTree: reuse shared prefixes across tasks
  • ActChain: reuse invariant prefixes and suffixes within templates

Before executing a cached action, the agent verifies that the current UI state is still valid. If the state is stale, it falls back safely to model-driven reasoning.
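A prefix-reuse cache with state validation can be sketched in a few lines. This is a loose interpretation, not the paper's implementation: `ActTree`, `ui_fingerprint` strings, and the replay interface are all hypothetical names standing in for whatever MOBIMEM actually records.

```python
class ActTree:
    """Trie of recorded actions; shared prefixes across tasks are stored once."""

    def __init__(self):
        # nested dict: action -> {"state": recorded ui fingerprint, "next": {...}}
        self.root = {}

    def record(self, trace):
        # trace: list of (ui_fingerprint, action) pairs from a past execution
        node = self.root
        for fp, action in trace:
            entry = node.setdefault(action, {"state": fp, "next": {}})
            node = entry["next"]

    def replay_prefix(self, observe, perform):
        """Replay cached actions while the live UI matches the recorded state.

        observe() returns the current UI fingerprint; perform(action) executes
        one action. Returns how many actions were reused; the caller falls
        back to model reasoning for whatever remains.
        """
        node, reused = self.root, 0
        while node:
            fp = observe()
            match = next((a for a, e in node.items() if e["state"] == fp), None)
            if match is None:
                break  # stale UI state: stop replaying, fall back safely
            perform(match)
            node = node[match]["next"]
            reused += 1
        return reused

tree = ActTree()
tree.record([("home", "tap search"), ("search", "type 'hotel'")])

screens = iter(["home", "search", "results"])
done = []
reused = tree.replay_prefix(observe=lambda: next(screens),
                            perform=done.append)
```

Here both cached actions replay because the live screens match the recorded fingerprints; had the second screen differed, replay would stop after one step and hand control back to the model.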

Results:

  • Up to 77.3% action reuse
  • 9× latency reduction on mobile devices
  • Memory overhead: ~1.5 MB

This effectively shifts the bottleneck from model inference to lightweight UI execution.

System Design — Agents as an operating system problem

MOBIMEM doesn’t stop at memory structures. It wraps them in OS-inspired services:

  • Agent Scheduler: fine-grained parallelism across apps and steps
  • AgentRR: agent-level record & replay (not blind macro replay)
  • Exception Handler: interruptions become learnable events

In multi-app workflows, fine-grained scheduling achieves up to 1.98× speedup versus serial execution.
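The scheduling idea can be illustrated with a toy dependency-driven runner: a step starts as soon as its own prerequisites finish, rather than waiting for every earlier step. The workflow, step names, and dependency graph below are invented for illustration; MOBIMEM's actual scheduler is more sophisticated than this sketch.

```python
import asyncio

async def run_workflow(steps, deps):
    """steps: name -> coroutine factory; deps: name -> set of prerequisites."""
    events = {name: asyncio.Event() for name in steps}
    order = []

    async def run(name):
        # Wait only on this step's true prerequisites, not on global order.
        for d in deps.get(name, set()):
            await events[d].wait()
        await steps[name]()
        order.append(name)
        events[name].set()

    # Launch every step at once; the dependency events serialize
    # only where the workflow genuinely requires it.
    await asyncio.gather(*(run(n) for n in steps))
    return order

async def step(name, secs):
    await asyncio.sleep(secs)  # stand-in for real app/UI work

steps = {
    "open_maps":    lambda: step("open_maps", 0.01),
    "open_booking": lambda: step("open_booking", 0.01),
    "pick_hotel":   lambda: step("pick_hotel", 0.01),
}
deps = {"pick_hotel": {"open_maps", "open_booking"}}
order = asyncio.run(run_workflow(steps, deps))
```

The two app launches overlap while "pick_hotel" always completes last, which is the shape of speedup fine-grained scheduling buys over serial execution.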

The pattern is clear: agents improve not by thinking harder, but by doing less unnecessary thinking.

Implications — Why this matters beyond mobile agents

MOBIMEM quietly challenges several industry assumptions:

  1. Bigger models are not the only path to better agents
  2. Memory architecture is a first-class design decision
  3. Edge AI needs systems thinking, not just better weights

For businesses deploying agents at scale, this has concrete implications:

  • Lower inference costs
  • Faster iteration cycles
  • Safer post-deployment learning
  • Viable on-device agents

This is especially relevant for enterprise automation, mobile assistants, and any environment where retraining is operationally painful.

Conclusion — Agents should remember, not retrain

MOBIMEM is not flashy. It doesn’t introduce a new model or benchmark-chasing trick. Instead, it borrows wisdom from decades of systems design: state, memory, scheduling, and reuse matter.

As agents move from demos to infrastructure, this shift—from model-centric to memory-centric design—may be what separates scalable systems from brittle toys.

Cognaptus: Automate the Present, Incubate the Future.