
Peer Review, But Make It Multi‑Agent: Inside aiXiv’s Bid to Publish AI Scientists

If 2024 was the year AI started writing science, 2025 is making it figure out how to publish it. Today’s paper introduces aiXiv, an open‑access platform where AI agents (and humans) submit proposals, review each other’s work, and iterate until a paper meets acceptance criteria. Rather than bolt AI onto the old gears of journals and preprint servers, aiXiv rebuilds the conveyor belt end‑to‑end.

Why this matters (and to whom)
Research leaders get a way to pressure‑test automated discovery without waiting months for traditional peer review. AI vendors can plug agents into a standardized workflow (through APIs/MCP), capturing telemetry to prove reliability. Publishers face an existential question: if quality control is measurable and agentic, do we still need the old queue?

The core idea in one sentence
A closed‑loop, multi‑agent review system combines retrieval‑augmented evaluation, structured critique, and re‑submission cycles to raise the floor of AI‑generated proposals/papers and create an auditable trail of improvements. ...
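The revise-until-accept cycle described above can be sketched in miniature. Everything here is an illustrative stand-in, not aiXiv's actual machinery: the length-based reviewer, the threshold, and the revision strategy are placeholders; the point is the loop shape and the audit trail it accumulates.

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    text: str
    revisions: list = field(default_factory=list)  # auditable trail of (critique, revised text)

def review(text: str) -> tuple[float, str]:
    # Stand-in reviewer: scores by length and emits a canned critique.
    # A real system would run retrieval-augmented, multi-agent evaluation.
    score = min(1.0, len(text) / 100)
    critique = "Expand the methods section." if score < 1.0 else "Meets criteria."
    return score, critique

def closed_loop(sub: Submission, threshold: float = 1.0, max_rounds: int = 5) -> bool:
    """Review, revise, and resubmit until the paper clears the bar or rounds run out."""
    for _ in range(max_rounds):
        score, critique = review(sub.text)
        if score >= threshold:
            return True  # accepted
        revised = sub.text + " [revised per critique] " + critique
        sub.revisions.append((critique, revised))  # every improvement is logged
        sub.text = revised
    return False
```

The `revisions` list is the key design point: each round's critique and response are retained, so acceptance comes with a traceable history rather than a bare verdict.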

August 24, 2025 · 5 min · Zelina

Agents on the Wire: Protocols, Memory, and Guardrails for Real-World Agentic AI

TL;DR Agentic AI is moving from toy demos to systems that must coordinate, persist memory, and interoperate across teams and services. A new survey maps the landscape—frameworks (LangGraph, CrewAI, AutoGen, Semantic Kernel, Agno, Google ADK, MetaGPT), communication protocols (MCP, ACP, A2A, ANP, Agora), and the fault lines that still block production scale. This article distills what’s ready now, what breaks in production, and how to architect for the protocols coming next. ...

August 18, 2025 · 6 min · Zelina

Therapy, Explained: How Multi‑Agent LLMs Turn DSM‑5 Screens into Auditable Logic

TL;DR DSM5AgentFlow uses three cooperating LLM agents—Therapist, Client, and Diagnostician—to simulate DSM‑5 Level‑1 screenings and then generate step‑by‑step diagnoses tied to specific DSM criteria. Experiments across four LLMs show a familiar trade‑off: dialogue‑oriented models sounded more natural, while a reasoning‑oriented model scored higher on diagnostic accuracy. For founders and PMs in digital mental health, the win is auditability: every symptom claim can be traced to a quoted utterance and an explicit DSM clause. The catch: results are built on synthetic dialogues, so ecological validity and real‑world safety remain open. ...

August 18, 2025 · 5 min · Zelina

RAGulating Compliance: When Triplets Trump Chunks

TL;DR A new multi‑agent pipeline builds an ontology‑light knowledge graph from regulatory text, embeds subject–predicate–object triplets alongside their source snippets in one vector store, and uses triplet‑level retrieval to ground LLM answers. The result: better section retrieval at stricter similarity thresholds, slightly higher answer accuracy, and far stronger navigability across related rules. For compliance teams, the payoff is auditability and explainability baked into the data layer, not just the prompt. ...
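The data-layer idea above — storing each subject–predicate–object triplet next to its source snippet in one vector store, then retrieving at triplet granularity — can be sketched as follows. The bag-of-words embedding and the sample entries are toy assumptions standing in for a real sentence encoder and real regulatory text.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would use a sentence encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# One store entry per triplet: the (s, p, o) string is what gets embedded,
# and the source snippet rides along so answers stay grounded and auditable.
store = [
    {"triplet": "broker must report suspicious transactions",
     "source": "Sec. 12(a): Brokers shall report suspicious transactions within 24h."},
    {"triplet": "adviser must disclose conflicts of interest",
     "source": "Sec. 7(c): Advisers shall disclose all conflicts of interest."},
]
for entry in store:
    entry["vec"] = embed(entry["triplet"])

def retrieve(query: str, threshold: float = 0.3):
    # Triplet-level retrieval with a strict similarity cutoff.
    qv = embed(query)
    scored = [(cosine(qv, e["vec"]), e) for e in store]
    return [e for s, e in sorted(scored, key=lambda x: x[0], reverse=True) if s >= threshold]

hits = retrieve("When must a broker report suspicious transactions?")
```

Because the triplet, not a raw chunk, is the retrieval unit, each hit carries its exact source clause — which is where the auditability claim in the paper lives.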

August 16, 2025 · 5 min · Zelina

Lights, Camera, Agents: How MAViS Reinvents Long-Sequence Video Storytelling

The dream of generating a fully realized, minute-long video from a short text prompt has always run aground on three reefs: disjointed narratives, visual glitches, and characters that morph inexplicably between shots. MAViS (Multi-Agent framework for long-sequence Video Storytelling) takes aim at all three by treating video creation not as a single monolithic AI task, but as a disciplined production pipeline staffed by specialized AI “crew members.”

The Problem with One-Shot Generators
Single-pass text-to-video systems shine in short clips but crumble under the demands of long-form storytelling. They repeat motions, lose scene continuity, and often rely on users to do the heavy lifting—writing scripts, designing shots, and manually training models for character consistency. This is not just a technical shortcoming; it’s a workflow bottleneck that makes creative scaling impossible. ...

August 13, 2025 · 3 min · Zelina

From Chaos to Choreography: The Future of Agent Workflows

In the world of Large Language Model (LLM)-powered automation, agents are no longer experimental curiosities — they’re becoming the operational backbone for scalable, autonomous AI systems. But as the number and complexity of these agents grow, the missing piece is no longer raw capability; it’s choreography. This is where agent workflows come in: structured orchestration frameworks that govern how agents plan, collaborate, and interact with tools, data, and each other. A recent survey of 24 representative systems — from industry platforms like LangChain, AutoGen, and MetaGPT to research frameworks like ReAct and ReWOO — reveals not just technical diversity, but a strategic gap in interoperability. ...

August 9, 2025 · 3 min · Zelina

Meta-Game Theory: What a Pokémon League Taught Us About LLM Strategy

When language models battle, their strategies talk back. In a controlled Pokémon tournament, eight LLMs drafted teams, chose moves, and logged natural‑language rationales every turn. Beyond win–loss records, those explanations exposed how models reason about uncertainty, risk, and resource management—exactly the traits we want in enterprise decision agents.

Why Pokémon is a serious benchmark (yes, really)
Pokémon delivers the trifecta we rarely get in classic AI games:
- Structured complexity: 18 interacting types, clear multipliers, and crisp rules.
- Uncertainty that matters: imperfect information, status effects, and accuracy trade‑offs.
- Resource management: limited switches, finite HP, role specialization.
Crucially, the action space is compact enough for language-first agents to reason step‑by‑step without search trees—so we can see the strategy, not just the score. ...

August 9, 2025 · 4 min · Zelina

When AI Plays Lawmaker: Lessons from NomicLaw’s Multi-Agent Debates

Large Language Models are increasingly touted as decision-making aides in policy and governance. But what happens when we let them loose together in a legislative sandbox? NomicLaw — an open-source multi-agent simulation inspired by the self-amending game Nomic — offers a glimpse into how AI agents argue, form alliances, and shape collective rules without human scripts.

The Experiment
NomicLaw pits LLM agents against legally charged vignettes — from self-driving car collisions to algorithmic discrimination — in a propose → justify → vote loop. Each agent crafts a legal rule, defends it, and votes on a peer’s proposal. Scoring is simple: 10 points for a win, 5 for a tie. Two configurations were tested: ...
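The propose → justify → vote loop with its 10-for-a-win, 5-for-a-tie scoring can be sketched like this. The `propose`, `justify`, and `vote` callables are hypothetical stand-ins for LLM calls; NomicLaw's actual prompts and vignette handling are not reproduced here.

```python
from collections import Counter

def run_round(agents, propose, justify, vote):
    # propose(agent) -> rule text
    # justify(agent, rule) -> the agent's defense of its rule (logged, not scored)
    # vote(agent, proposals) -> name of the peer whose proposal the agent backs
    proposals = {a: propose(a) for a in agents}
    defenses = {a: justify(a, proposals[a]) for a in agents}
    ballots = Counter(vote(a, proposals) for a in agents)
    top_count = max(ballots.values())
    winners = [name for name, n in ballots.items() if n == top_count]
    scores = Counter()
    for w in winners:
        scores[w] += 10 if len(winners) == 1 else 5  # 10 points for a win, 5 for a tie
    return proposals, defenses, scores
```

Running repeated rounds and tracking who votes for whom is what surfaces the alliance-formation behavior the article describes.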

August 8, 2025 · 3 min · Zelina

From Autocomplete to Autonomy: How LLM Code Agents are Rewriting the SDLC

The idea of software that writes software has long hovered at the edge of science fiction. But with the rise of LLM-based code agents, it’s no longer fiction, and it’s certainly not just autocomplete. A recent survey by Dong et al. provides the most thorough map yet of this new terrain, tracing how code generation agents are shifting from narrow helpers to autonomous systems capable of driving the entire software development lifecycle (SDLC). ...

August 4, 2025 · 4 min · Zelina

Mind's Eye for Machines: How SimuRA Teaches AI to Think Before Acting

What if AI agents could imagine their future before taking a step—just like we do? That’s the vision behind SimuRA, a new architecture that pushes LLM-based agents beyond reactive decision-making and into the realm of internal deliberation. Introduced in the paper “SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model”, SimuRA’s key innovation lies in separating what might happen from what should be done. Instead of acting step-by-step based solely on observations, SimuRA-based agents simulate multiple futures using a learned world model and then reason over those hypothetical outcomes to pick the best action. This simple-sounding shift is surprisingly powerful—and may be a missing link in developing truly general AI. ...
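The separation of "what might happen" from "what should be done" can be sketched as a tiny planner: roll each candidate action forward through a world model, score the imagined futures, and only then commit. The number-line world model, value function, and greedy rollout below are illustrative assumptions, not SimuRA's learned components.

```python
def plan(state, actions, world_model, value, depth=3):
    """Pick the action whose simulated rollout scores best.

    world_model(state, action) -> predicted next state (imagined, never executed)
    value(state) -> scalar estimate of how desirable a state is
    """
    def rollout(s, a, d):
        s = world_model(s, a)          # imagine, don't act
        if d == 1:
            return value(s)
        # Greedy inner policy over imagined futures
        return max(rollout(s, a2, d - 1) for a2 in actions)

    return max(actions, key=lambda a: rollout(state, a, depth))

# Toy usage: walk along a number line toward a goal at 5.
best = plan(state=0, actions=[1, -1],
            world_model=lambda s, a: s + a,
            value=lambda s: -abs(s - 5))
```

Even in this toy, the agent commits to `+1` only after comparing whole simulated trajectories — the "think before acting" shift the article highlights, as opposed to reacting to the current observation alone.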

August 2, 2025 · 3 min · Zelina