Diffusing to Coordinate: When Multi-Agent RL Learns to Breathe
Opening — Why This Matters Now Multi-agent systems are quietly becoming infrastructure. Autonomous fleets. Robotic warehouses. Algorithmic trading desks. Distributed energy grids. Each of these is no longer a single model making a clever decision. It is a collection of policies that must coordinate under uncertainty, partial information, and non-stationarity. Yet most online multi-agent reinforcement learning (MARL) still relies on unimodal Gaussian policies. In other words, we ask a complex team to act like a committee that only ever votes for the mean. ...