No Prompt Left Behind: How Shopee’s CompassMax Reinvents RL for Giant MoE Models
Shopee’s CompassMax-V3-Thinking paper shows that scaling RL for giant MoE models is less about buying more rollouts and more about making every rollout yield a usable learning signal.