Belief Is a Graph: Why LLM Agents Need Structured Minds

Memory is the polite word we use when an LLM agent remembers a document, a user preference, or a previous chat message. It sounds reassuring. It also hides the awkward part: most agent memory is just stored text waiting to be retrieved. That is useful, but it is not the same as belief. ...

March 23, 2026 · 18 min · Zelina

Who Sees What, Who Pays the Cost? Teaching Agents to See Through Others’ Eyes

TL;DR for operators: The paper’s useful message is not “symbolic planners can teach LLM agents to reason socially.” That would be tidy, flattering, and mostly wrong. The useful message is narrower and more operational: planner-derived thought-action examples can scaffold some agent behaviour, especially local decision discipline, but they do not automatically create robust perspective-taking. In the tested Director–Matcher environment, agents do well when the task is basically “ignore what the other party cannot see.” They struggle when they must imagine what exists in another agent’s private view, or decide whether it is worth asking, moving, opening, or acting under uncertainty.[1] ...

August 23, 2025 · 20 min · Zelina

Mind the Gap: How AI Papers Misuse Psychology

TL;DR for operators: AI teams love borrowing psychology. It gives messy model behaviour a tidy name: “reasoning,” “empathy,” “Theory of Mind,” “bias,” “motivation,” “attention.” The problem is that a borrowed label is not the same as a valid construct. A new paper, The Incomplete Bridge: How AI Research (Mis)Engages with Psychology, studies this borrowing directly by mapping 1,006 LLM-related papers from major AI venues and the 2,544 psychology papers they cite.[1] ...

July 31, 2025 · 21 min · Zelina

Mind Games for Machines: How Decrypto Reveals the Hidden Gaps in AI Reasoning

TL;DR for operators: Meetings are easy to automate until someone has to understand what everyone else thinks everyone else knows. That is the useful discomfort created by Decrypto, a new benchmark for multi-agent reasoning and theory of mind in language models.[1] The benchmark is built around a simple word game. Alice and Bob share four secret keywords. Alice receives a three-digit code and gives three public hints. Bob must recover the code. Eve sees the same hints but does not know the secret keywords and tries to intercept. Alice’s job is therefore not “give good clues.” It is “give clues calibrated to Bob’s knowledge while limiting Eve’s inference.” Welcome to enterprise communication, but with fewer calendar invites. ...

June 26, 2025 · 16 min · Zelina