Multi-Agent Systems

Playing with Strangers: A New Benchmark for Ad-Hoc Human-AI Teamwork

TL;DR for operators: Teamwork is the awkward part of agentic AI. It is easy to show a model completing a task when the environment is clean, the instructions are explicit, and the other “teammates” behave exactly as expected. Real deployments are less polite. Humans omit context, follow local conventions, adapt unevenly, and occasionally do something that looks wrong only because the system has misunderstood the room. ...

The Joy of Many Minds: How JoyAgents-R1 Unleashes the Power of Multi-LLM Reinforcement Learning

TL;DR for operators: A naming note before the machinery starts: the existing Cognaptus title says JoyAgents-R1, but the arXiv paper itself names the benchmark HiMA-Ecom and the training method HiMA-R1. This revision uses the paper’s terminology, because accuracy is not decorative trim. The paper is useful for operators because it does not simply say “use more agents.” That slogan is old, cheap, and usually followed by a demo in which three chatbots politely agree with one another until the invoice arrives. The real contribution is more specific: the authors build a hierarchical e-commerce assistant benchmark, then train the master agent and specialised sub-agents jointly with reinforcement learning instead of optimising them as isolated prompt puppets.[1] ...

Innovation, Agentified: How TRIZ Got Its AI Makeover

TL;DR for operators: A crane is a useful place to test agentic innovation because the problem is painfully concrete: move heavy loads faster, avoid dangerous swinging, prevent overheating, and do not accidentally turn productivity into an incident report. The paper behind TRIZ Agents uses exactly this kind of gantry-crane improvement problem to test whether a multi-agent LLM system can follow the TRIZ method and produce plausible engineering ideas.[1] ...

Divide and Model: How Multi-Agent LLMs Are Rethinking Real-World Problem Solving

TL;DR for operators: Real business problems do not arrive as tidy exam questions. They arrive as “Can we optimise this logistics network?”, “Which markets should we prioritise?”, “How many clinics do we need?”, or “What happens if the subsidy disappears?” The annoying part is not the equation. The annoying part is deciding what the equation should even represent. ...

From Cog to Colony: Why the AI Taxonomy Matters

TL;DR for operators: Most organisations do not need “Agentic AI” because it sounds more advanced. They need the smallest reliable architecture that can complete the job without creating a private zoo of semi-autonomous software creatures. The paper behind this article, AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges, argues that AI Agents and Agentic AI are not interchangeable labels.[1] An AI Agent is usually a bounded system: it interprets a task, calls tools, uses context, and produces an action or output. Agentic AI is a broader system pattern: multiple specialised agents coordinate, share memory, decompose goals, recover from failures, and work toward higher-level objectives. ...

Body of Proof: Why Embodied AI Needs More Than One Mind

TL;DR for operators: A robot that works alone is already expensive, brittle, and rude to your maintenance budget. A group of robots that must work together adds a different class of difficulty: timing, communication, role allocation, shared perception, physical interference, changing team composition, and the occasional human wandering into the scene with a clipboard. ...

Logos, Metron, and Kratos: Forging the Future of Conversational Agents

TL;DR for operators: Conversational agents are moving from polite text boxes into operational systems: booking, triaging, recommending, retrieving, judging, escalating, and occasionally making a confident mess with impressive formatting. The useful lesson from these two papers is simple: enterprise agents cannot be trusted just because they can reason, remember, or call tools. Those are necessary capabilities, not sufficient safeguards. A serious agent needs a fourth layer: a way to evaluate whether its own decisions and judgments deserve to be used. ...

Two Heads Are Better Than One: How Dual-Engine AI Reshapes Analytical Thinking

TL;DR for operators: DEoT is not “a smarter chatbot”. It is a structured analysis workflow for questions where there is no single correct answer: policy impact, market entry, geopolitical risk, crisis response, investment implications, technology disruption, and the usual executive swamp where every answer arrives with a footnote and a headache. The paper’s useful idea is simple: open-ended analysis needs two motions. First, go wide enough not to miss important dimensions. Then go deep enough not to produce a shallow consultant-flavoured smoothie. DEoT formalises this through a Breadth Engine, a Depth Engine, and an Engine Controller that decides when to branch, when to drill, and when to stop. ...

Memory in the Machine: How SHIMI Makes Decentralized AI Smarter

TL;DR for operators: Memory is becoming an operations problem, not just a model feature. Once multiple AI agents maintain local context, update independently, and need to coordinate without a central brain, the usual “throw it into a vector database and pray politely” approach starts to creak. SHIMI, short for Semantic Hierarchical Memory Index, proposes a different memory layer for decentralized agent systems.[1] Instead of storing knowledge as a flat set of embedding vectors, it organizes memory as a hierarchy of semantic concepts. Retrieval works by descending from broad concepts to specific entities. Synchronization works by exchanging only the parts of local memory trees that have diverged, using Merkle-DAG summaries, Bloom filters, and CRDT-style merging. ...
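
For readers who want the retrieval idea in concrete form, here is a minimal sketch of "descend from broad concepts to specific entities". It is not SHIMI's implementation. The names ConceptNode, overlap, retrieve, and build_demo_tree are hypothetical, the demo tree is invented for illustration, and plain keyword overlap stands in for whatever semantic matching the paper actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """One node in a toy concept hierarchy (hypothetical structure)."""
    concept: str
    keywords: frozenset
    children: list = field(default_factory=list)
    entities: list = field(default_factory=list)


def overlap(terms, node):
    """Toy relevance score: number of query terms shared with the node.

    Assumption: a real system would use a semantic similarity measure here.
    """
    return len(terms & node.keywords)


def retrieve(root, query):
    """Walk from the root toward the best-matching child at each level.

    Returns the path of concept names visited and the entities stored at
    the node where the descent stopped.
    """
    terms = frozenset(query.lower().split())
    node = root
    path = [root.concept]
    while node.children:
        best = max(node.children, key=lambda child: overlap(terms, child))
        if overlap(terms, best) == 0:
            break  # no child is relevant, so stop at the current level
        node = best
        path.append(node.concept)
    return path, node.entities


def build_demo_tree():
    """Build a small invented hierarchy for demonstration only."""
    routing = ConceptNode("routing", frozenset({"route", "delivery"}),
                          entities=["route-planner-v2"])
    storage = ConceptNode("storage", frozenset({"warehouse", "inventory"}),
                          entities=["warehouse-map"])
    logistics = ConceptNode("logistics",
                            frozenset({"shipping", "route", "warehouse"}),
                            children=[routing, storage])
    finance = ConceptNode("finance", frozenset({"invoice", "payment", "tax"}))
    return ConceptNode("knowledge", frozenset(), children=[logistics, finance])


if __name__ == "__main__":
    print(retrieve(build_demo_tree(), "optimise delivery route"))
```

The point of the sketch is the shape of the lookup: each step compares the query against a handful of children rather than against every stored item, which is the property the hierarchy is meant to buy. Synchronization (Merkle-DAG summaries, Bloom filters, CRDT-style merging) is deliberately left out.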