Memory Systems

EverMemOS: When Memory Stops Being a Junk Drawer

Memory sounds simple until the assistant has to remember two incompatible things at once. A customer loves craft beer. The same customer is temporarily taking antibiotics. A flat memory system retrieves “likes IPA” and recommends a variety pack, because apparently “memory” means grabbing the loudest sticky note from a drawer and pretending it is wisdom. A more useful assistant retrieves the preference, the medical constraint, the timing, and the relation among them. It recommends a mocktail and quietly avoids turning personalization into negligence. ...
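Here is a toy Python sketch that makes the contrast concrete. It assumes each memory carries a keyword set, an optional expiry date, and explicit `constrained_by` links to other memories. Those fields and every function name are hypothetical illustrations, not EverMemOS's actual data model or API. The flat retriever returns only the best keyword match. The relational one also pulls in linked constraints that are still in force.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


# Hypothetical data model for illustration; not EverMemOS's actual schema.
@dataclass
class Memory:
    id: str
    text: str
    keywords: set[str]
    valid_until: date | None = None  # None means the memory never expires
    constrained_by: list[str] = field(default_factory=list)  # ids of qualifying memories


def score(memory: Memory, query: set[str]) -> int:
    """Naive relevance: number of keywords shared with the query."""
    return len(memory.keywords & query)


def flat_retrieve(store: dict[str, Memory], query: set[str]) -> list[Memory]:
    """Return only the single best keyword match, ignoring relations."""
    return [max(store.values(), key=lambda m: score(m, query))]


def relational_retrieve(store: dict[str, Memory], query: set[str], today: date) -> list[Memory]:
    """Return the best match plus any linked constraints still in force today."""
    best = max(store.values(), key=lambda m: score(m, query))
    results = [best]
    for cid in best.constrained_by:
        constraint = store[cid]
        if constraint.valid_until is None or today <= constraint.valid_until:
            results.append(constraint)
    return results


def recommend(retrieved: list[Memory]) -> str:
    """Toy policy: any active 'avoid alcohol' memory overrides the preference."""
    if any("avoid alcohol" in m.text for m in retrieved):
        return "mocktail"
    return "IPA variety pack"


store = {
    "pref": Memory("pref", "Likes IPA", {"beer", "ipa", "drink", "gift"}, constrained_by=["meds"]),
    "meds": Memory("meds", "On antibiotics; avoid alcohol", {"antibiotics", "health"},
                   valid_until=date(2024, 6, 30)),
}
query = {"drink", "gift"}
print(recommend(flat_retrieve(store, query)))                          # IPA variety pack
print(recommend(relational_retrieve(store, query, date(2024, 6, 1))))  # mocktail
```

Once the antibiotics memory expires (for example, when querying with `date(2024, 7, 15)`), the relational path falls back to the beer recommendation. That is the timing element the paragraph mentions.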

Memory Over Models: Letting Agents Grow Up Without Retraining

Repetition is where most automation systems quietly embarrass themselves. Ask an AI agent to book a hotel once, and it may inspect the screen, reason through options, click through menus, and eventually finish the task. Ask it to do something similar tomorrow, and many systems perform the same little theatre again: perceive, reason, click, wait, reason, click, apologize, recover. Very intelligent. Very expensive. Slightly absurd. ...
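One way to picture "memory over models" is a procedure cache. The first run pays for the full reasoning loop and stores the resulting action sequence as a template. Later runs of the same task type replay that template with new parameters. This sketch rests on several assumptions: `ProcedureMemory`, `expensive_plan`, and the action strings are hypothetical, and the hotel planner is a hard-coded stand-in for real perception and reasoning.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical procedure cache; names and action strings are illustrative only.
@dataclass
class ProcedureMemory:
    procedures: dict[str, list[str]] = field(default_factory=dict)
    reasoning_calls: int = 0  # how often the expensive planner ran

    def expensive_plan(self, task_type: str) -> list[str]:
        """Stand-in for the perceive/reason/click loop; hard-coded for one task."""
        self.reasoning_calls += 1
        if task_type == "book_hotel":
            return ["open_site", "search {city}", "pick_cheapest", "confirm {dates}"]
        raise ValueError(f"no planner for {task_type!r}")

    def run(self, task_type: str, **params: str) -> list[str]:
        """Replay a stored template if one exists; otherwise plan once and store it."""
        template = self.procedures.get(task_type)
        if template is None:
            template = self.expensive_plan(task_type)
            self.procedures[task_type] = template
        # str.format ignores unused keyword arguments, so steps without placeholders pass through.
        return [step.format(**params) for step in template]


agent = ProcedureMemory()
print(agent.run("book_hotel", city="Lisbon", dates="May 3-5"))
print(agent.run("book_hotel", city="Oslo", dates="Jun 1-2"))
# ['open_site', 'search Oslo', 'pick_cheapest', 'confirm Jun 1-2']
print(agent.reasoning_calls)  # 1
```

A real agent would still need to confirm that each replayed step matches the current screen before acting, because a cached procedure goes stale when a site changes. This sketch omits that check.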

Memory, Multiplied: Why LLM Agents Need More Than Bigger Brains

Memory is where many AI demos go to die. The demo looks fluent. The agent remembers the last three messages, calls a tool, summarizes a PDF, maybe even smiles politely while destroying your calendar. Then you return tomorrow and ask it to continue a project involving a client, two documents, three images, and a corrected assumption from last week. Suddenly the “agent” becomes a very expensive intern with amnesia. ...

Memory, Bias, and the Mind of Machines: How Agentic LLMs Mislearn

TL;DR for operators: Memory is becoming the fashionable upgrade for AI agents: let the system remember past tasks, extract lessons, and improve without retraining the model. Sensible. Also slightly dangerous, in the same way giving a junior analyst a notebook is useful until they start rewriting the notebook after every meeting. The important result is not that memory sometimes contains bad facts. Everyone who has used software, people, or software made by people already knew that. The sharper point is that useful experience can become faulty during the act of consolidation. When an LLM agent compresses raw trajectories into reusable textual lessons, it may strip away conditions, merge unlike cases, or turn a narrow success into a general rule. The memory then looks cleaner while becoming less true. Very enterprise. ...
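A toy example shows how consolidation can strip away conditions. Each hypothetical `Trajectory` records an action, the context it ran in, and whether it worked. The naive consolidator turns any action that ever succeeded into an unconditional rule. The conditional one keeps the context of each success and applies a lesson only when that context recurs. Both functions are illustrative assumptions, not the consolidation procedure studied in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical trajectory record for illustration.
@dataclass(frozen=True)
class Trajectory:
    action: str
    conditions: frozenset[str]  # context the action ran in
    succeeded: bool


def naive_consolidate(trajs: list[Trajectory]) -> set[str]:
    """Any action that ever succeeded becomes an unconditional rule."""
    return {t.action for t in trajs if t.succeeded}


def conditional_consolidate(trajs: list[Trajectory]) -> dict[str, list[frozenset[str]]]:
    """Keep the context of each success instead of merging them away."""
    lessons: dict[str, list[frozenset[str]]] = {}
    for t in trajs:
        if t.succeeded:
            lessons.setdefault(t.action, []).append(t.conditions)
    return lessons


def applies_naive(rules: set[str], action: str, context: frozenset[str]) -> bool:
    return action in rules  # context is ignored entirely


def applies_conditional(lessons: dict[str, list[frozenset[str]]], action: str,
                        context: frozenset[str]) -> bool:
    # Applies only if some recorded success context is a subset of the new context.
    return any(conds <= context for conds in lessons.get(action, []))


history = [
    Trajectory("retry_with_backoff", frozenset({"error:rate_limit"}), True),
    Trajectory("retry_with_backoff", frozenset({"error:auth"}), False),
]
naive = naive_consolidate(history)
cond = conditional_consolidate(history)
auth_failure = frozenset({"error:auth"})
print(applies_naive(naive, "retry_with_backoff", auth_failure))       # True: wrong lesson reused
print(applies_conditional(cond, "retry_with_backoff", auth_failure))  # False
```

The conditional version is more conservative. It will not transfer a lesson to a new context until a success there has been recorded, which trades some reuse for fewer wrong generalizations.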

The Memory Illusion: Why AI Still Forgets Who It Is

A customer support bot does not need a soul. Pleasantly, most airlines have not yet advertised one. But it does need to remember what role it is playing. If it gives policy advice, that advice must remain anchored to the policy. If it apologises for an error, the correction should bind future answers. If the company has told users the assistant is a support agent, the assistant cannot conveniently become a speculative travel blogger, a therapist, a lawyer, or a magic refund machine, depending on which prompt arrives next. ...

Deep Thinking, Dynamic Acting: How DeepAgent Redefines General Reasoning

Tools are where agent demos go to die. The pitch is usually elegant. Give the model a goal, attach a few APIs, let it reason, and watch the automation glide across systems like a tiny consultant with no calendar conflicts. Then the real world appears: too many tools, unclear documentation, stale context, partial failures, long interaction histories, and the occasional API response that seems to have been designed by someone settling a personal score. ...

Back to School for AGI: Memory, Skills, and Self‑Starter Instincts

TL;DR for operators: The paper is not really about whether a model can answer exam questions. Given the right context, the frontier models do very well. The hard part is whether an agent can notice what must be preserved, store it in a useful form, retrieve it at the right time, and act without being explicitly prodded. That is the difference between an assistant that sounds competent and an assistant that can actually carry operational state across days, weeks, and dependent workflows. ...

Love in the Time of Context: Why LLMs Still Don't Get You

TL;DR for operators: Personalization does not fail because the model forgot your birthday. That would be almost charming. It fails because the system remembers too much in the wrong shape. The Cupid benchmark tests whether LLMs can infer a user’s context-dependent preference from prior multi-turn interactions and apply it to a new request.1 The setup is deliberately business-relevant: users do not announce a clean preference profile; they reveal expectations through feedback, correction, and mild conversational friction. Very realistic. Nobody fills out a YAML file called my_deeply_contextual_preferences.yml, at least not outside certain Slack channels. ...

Layers of Thought: How Hierarchical Memory Supercharges LLM Agent Reasoning

TL;DR for operators: An enterprise agent does not fail only because it forgets. Often, it fails because it remembers like a hoarder with a search bar. The H-MEM paper proposes a hierarchical memory system for LLM agents: Domain, Category, Memory Trace, and Episode layers, connected by positional child indices so retrieval can move from broad meaning to specific memory instead of scanning a flat pile of stored vectors.1 That sounds like software housekeeping. It is actually the main point. ...
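The layered lookup can be sketched as a beam descent. Score the domains and keep the top `k`. Then score only the children those survivors point to, and repeat down to episodes. This is a simplified illustration built on assumptions: the two-dimensional vectors, cosine scoring, `k=1`, and the `Node` structure are hypothetical stand-ins. The paper's actual embedding, scoring, and indexing details are not reproduced here.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


# Hypothetical node: a label, an embedding, and positional indices into the next layer.
@dataclass
class Node:
    label: str
    vec: list[float]
    children: list[int] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hierarchical_retrieve(layers: list[list[Node]], query: list[float], k: int = 1) -> list[Node]:
    """Descend Domain -> Category -> Trace -> Episode, scoring only children of kept parents."""
    candidates = list(range(len(layers[0]))) if layers else []  # start with every domain
    for depth, layer in enumerate(layers):
        ranked = sorted(candidates, key=lambda i: cosine(layer[i].vec, query), reverse=True)
        kept = ranked[:k]
        if depth == len(layers) - 1:
            return [layer[i] for i in kept]
        # Follow positional child indices into the next layer.
        candidates = [c for i in kept for c in layer[i].children]
    return []


layers = [
    [Node("work", [1.0, 0.0], [0, 1]), Node("personal", [0.0, 1.0], [2])],  # Domain
    [Node("clients", [0.9, 0.1], [0]), Node("meetings", [0.7, 0.3], [1]),
     Node("health", [0.1, 0.9], [2])],  # Category
    [Node("Acme prefers PDF reports", [0.95, 0.05], [0]),
     Node("Standup moved to 10am", [0.6, 0.4], [1]),
     Node("On antibiotics until June", [0.0, 1.0], [2])],  # Memory Trace
    [Node("call with Acme", [0.9, 0.1]),
     Node("calendar update", [0.6, 0.4]),
     Node("pharmacy visit", [0.05, 0.95])],  # Episode
]
print([n.label for n in hierarchical_retrieve(layers, [1.0, 0.05])])  # ['call with Acme']
print([n.label for n in hierarchical_retrieve(layers, [0.0, 1.0])])   # ['pharmacy visit']
```

In this toy store, the descent scores more nodes than a flat scan of three episodes would. The benefit appears only when each parent fans out to many children, because whole subtrees get skipped. The trade-off is that a wrong choice high in the tree cannot be recovered lower down unless `k` is raised.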