Coaching the Swarm: Why Multi‑Agent RL Finally Scales
Opening — Why this matters now Multi‑agent systems are having a moment. Everywhere you look—AutoGen‑style workflows, agentic data pipelines, research copilots—LLMs are being wired together and told to collaborate. Yet most of these systems share an uncomfortable secret: they don’t actually learn together. They coordinate at inference time, but their weights remain frozen, their mistakes repeatedly rediscovered. ...