LLM-Agents

Tools of Habit: Why LLM Agents Benefit from a Little Inertia

Tools are where many agent demos quietly become invoices. A multi-step LLM agent may look intelligent because it reasons, acts, observes, and repeats. Under the hood, though, it often pays the model to decide every small next move: search here, load that node, look around, check valid actions, fill this argument, try again. Some of those decisions need judgement. Others are basically muscle memory wearing a lab coat. ...

Memory, Bias, and the Mind of Machines: How Agentic LLMs Mislearn

TL;DR for operators: Memory is becoming the fashionable upgrade for AI agents: let the system remember past tasks, extract lessons, and improve without retraining the model. Sensible. Also slightly dangerous, in the same way giving a junior analyst a notebook is useful until they start rewriting the notebook after every meeting. The important result is not that memory sometimes contains bad facts. Everyone who has used software, people, or software made by people already knew that. The sharper point is that useful experience can become faulty during the act of consolidation. When an LLM agent compresses raw trajectories into reusable textual lessons, it may strip away conditions, merge unlike cases, or turn a narrow success into a general rule. The memory then looks cleaner while becoming less true. Very enterprise. ...

Parallel Worlds of Moderation: How LLM Simulations Are Stress-Testing Online Civility

TL;DR for operators: Moderation is usually measured after the mess has already happened. COSMOS changes the sequence: it lets researchers run a synthetic online conversation twice, once without moderation and once with a selected intervention, while keeping the simulated world otherwise constant.1 That is the useful idea. Not “LLMs can pretend to be angry internet users,” though they can, which is an achievement of sorts. The useful idea is controlled comparison. ...

Parallel Worlds of Moderation: Simulating Online Civility with LLMs

Moderation teams live inside an annoying counterfactual. A user posts something toxic. The platform sends a warning, hides the post, suspends the account, or does nothing. A week later, the team can measure what happened. What it cannot observe is the parallel platform where the same user, same thread, same sequence of replies, and same ambient mood unfolded without that intervention. ...

Dirty Data, Clean Machines: How LLM Agents Rewire Predictive Maintenance

Workshop logs are not glamorous. They are where predictive-maintenance dreams go to meet misspelled component names, missing codes, wrong vehicle identifiers, and dates that imply a truck was both under repair and happily accumulating kilometres. Industrial AI, as ever, is less a matter of elegant algorithms than of persuading messy operational records to stop lying. ...

Thinking Fast and Flowing Slow: Real-Time Reasoning for Autonomous Agents

Delay is not a footnote in automation. It is the product. A customer support agent that takes thirty seconds to decide whether to escalate has already shaped the customer’s mood. A warehouse robot that produces the correct plan after the pallet has moved has produced something closer to poetry than control. A trading assistant that generates a gorgeous hedge after the market has repriced is not sophisticated. It is late, which is the expensive version of wrong. ...

The Rational Illusion: How LLMs Outplayed Humans at Cooperation

A negotiation bot walks into a pricing dispute. That is not the start of a joke. It is the start of a procurement problem, a marketplace design problem, a customer-service escalation problem, and, sooner than executives would like to admit, a governance problem. Once AI systems begin making choices on behalf of organisations, their behaviour in social settings matters. Not just whether they answer correctly. Not just whether they sound polite. Whether they cooperate, defect, compromise, optimise, over-trust, or quietly behave like a very caffeinated economist. ...

Recursive Minds: How ReCAP Turns LLMs into Self-Correcting Planners

A stuck workflow rarely looks intelligent. It looks like a support agent asking for the same invoice twice, a coding agent editing the wrong file for the third time, or an operations bot patiently repeating an invalid action because, apparently, persistence is cheaper than understanding. This is the unglamorous failure mode of many LLM agents. They do not collapse because they cannot produce a plan. They collapse because the plan becomes stale, buried, or locally contradicted by new observations. The agent remembers the latest step and forgets the job. ...

When Agents Learn to Test Themselves: TDFlow and the Future of Software Engineering

A bug report is not a specification. A bug report says something is wrong. A test says exactly how wrong must fail. That difference is the centre of TDFlow, a test-driven agentic workflow for repository-scale software repair.1 The paper’s central move is not to make the coding agent more charismatic, more autonomous, or more burdened with inspirational tool access. Mercifully. It does almost the opposite: it narrows the agent’s world until the task becomes executable. ...

Beyond Utility: When LLM Agents Start Dreaming Their Own Tasks

A task list is usually where enterprise automation becomes reassuringly boring. Someone defines the work. The system executes it. A dashboard turns green, or, in more honest organisations, amber with an explanation. The point is not mystery. The point is control. The paper behind this article, LLM Agents Beyond Utility: An Open-Ended Perspective, asks what happens when that tidy arrangement is disturbed: what if the agent does not merely complete tasks, but proposes them? What if it can remember what it has done, inspect its environment, write notes to itself, and continue across runs?1 ...