Don’t Self-Sabotage Me Now: Rational Policy Gradients for Sane Multi-Agent Learning
Opening — Why this matters now Multi-agent systems are quietly becoming the backbone of modern automation: warehouse fleets, financial trading bots, supply-chain optimizers, and—if you believe the more excitable research labs—proto-agentic AI organizations. Yet there’s a peculiar, recurring problem: when you ask agents to improve by playing against each other, they sometimes discover that the fastest route to “winning” is to make sure nobody wins. ...