When Bandits Get Priority: Learning Under Scarce, Tiered Capacity
Capacity looks simple until someone pays to jump the queue. That is the quiet problem behind a large amount of modern AI infrastructure. A platform may have many model instances, edge servers, or compute nodes. Tasks arrive with different business value. Enterprise traffic is more important than free-tier traffic. Some jobs have tighter latency targets. Some users, by contract or politics, are simply not equal. Lovely democratic fiction ends at the load balancer. ...