Multi-Agent Systems

It Takes a Village (of Models): Why Multi-Agent Intelligence Won't Emerge by Accident

Agents are easy to multiply. That is the attractive part. Give one model a browser. Give another a code editor. Add a planner, a critic, a memory layer, a few tools, a dashboard, and suddenly the product demo looks like a small digital office. Everyone has a job title. Everyone talks. Nobody asks whether the “team” actually knows how to be a team. ...

Shift Happens: Detecting Behavioral Drift in Multi‑Agent Systems

Updates are boring until they are not. A retrieval index changes. A tool permission is adjusted. A base model is silently upgraded. A memory module starts carrying yesterday’s weird interaction into today’s customer support workflow. Nobody sees smoke. The dashboard still says “healthy.” The agent still answers. Then, three weeks later, someone notices that one group of agents has become strangely aggressive, risk-averse, evasive, or just less aligned with the behavior the product team thought it had shipped. ...

Heuristics, Meet Your Agents: How Role-Based LLMs Rewire Optimization

Trucks do not care whether your routing algorithm is elegant. They care whether the vehicle arrives, whether the route violates capacity, whether the dispatch plan survives a late order, and whether the whole thing can be recomputed before someone in operations starts calling the system “that AI toy.” Optimization has always lived in this unglamorous place: close enough to mathematics to look pure, close enough to reality to be messy. ...

When Agents Treat Agents as Tools: What Tool-RoCo Tells Us About LLM Autonomy

Dispatch is where autonomy usually goes to die. A warehouse manager may have ten workers, three forklifts, two packing stations, and one increasingly dramatic dashboard. The hard part is not merely deciding what each person should do. The hard part is knowing when to call someone in, when to release them, and when extra “help” is just a polite name for congestion. ...

Debate Club for Robots: How Multi-Agent Arguing Makes Embodied AI Safer

The robot should not need a philosophy seminar before using a microwave. Microwaves are excellent devices for exposing weak safety logic. A normal household assistant can be asked to warm food, boil water, clean a counter, water a plant, or move objects around a kitchen. Most of these tasks are harmless. Some are not. “Put a book into the microwave and turn it on” is not a creative lifestyle experiment. It is a fire hazard with better lighting. ...

Agents Behaving Badly: Why 'Agentic AI' Needs Adult Supervision

A travel agent that books a bad flight is annoying. A travel agent that books the wrong flight, triggers a hotel agent to change the reservation, alerts a finance agent to approve reimbursement, and then lets a calendar agent reschedule meetings around the mistake is no longer annoying. It is an organizational incident with a charming user interface. ...

Hierarchy, Not Hype: Why Domain Logic Beats Agent Chaos

Workflow is where agent demos go to die. A user asks for something that sounds simple: “Assess flood damage in this coastal district after the typhoon.” The agent smiles, metaphorically, and begins its little ritual. It searches, summarizes, calls a tool, thinks again, calls another tool, corrects itself, forgets one preprocessing step, invents a plausible shortcut, then produces a confident final answer that looks fine until someone who actually understands geospatial analysis asks an inconvenient question: where did the corrected satellite imagery come from? ...

Hex Marks the Spot: Terra Nova and the New Frontier of Agent Intelligence

A strategy game is a cruelly efficient way to embarrass an intelligent system. Not because games are magic. Not because hexagonal maps secretly contain the meaning of cognition. They do not, despite what several overexcited benchmark papers might imply after a strong coffee. Games are useful because they compress decision pressure. They make planning visible. They force trade-offs. They punish agents that confuse local competence with strategic understanding. ...

Reasoning on Mars: How Pipeline-Parallel RL Rewires Multi‑Agent Intelligence

Review is cheap until it has to be correct. That is the uncomfortable lesson behind many agentic AI demos. A system writes an answer. A second model checks it. A third model fixes it. The workflow looks reassuringly managerial, like a tiny consulting firm trapped inside a GPU cluster. But the appearance of oversight is not the same thing as oversight. A weak reviewer can punish a good answer. A weak fixer can damage a nearly correct answer. And if the whole chain receives one final reward, reinforcement learning may end up congratulating the wrong participant. Very corporate, really. ...

Talk Less, Coordinate More: MARL Meets the Real World

A warehouse robot fleet does not fail because one robot forgot how to move. It fails because three robots each saw a slightly different world, one message arrived late, another was dropped, and the coordination policy confidently optimised against yesterday’s reality. Very modern. Very autonomous. Very expensive. That is the uncomfortable premise behind Robust and Efficient Communication in Multi-Agent Reinforcement Learning, a survey of how multi-agent reinforcement learning, or MARL, behaves when the communication layer is no longer treated as magic plumbing. The paper is not presenting a new benchmark champion. Its value is quieter and more useful: it organises a scattered body of work around the communication failures that actually matter in deployed multi-agent systems. ...