Policy Gradients Grow Up: Teaching RL to Think in Domains
Opening — Why this matters now

Reinforcement learning keeps winning benchmarks, but it keeps losing the same argument: it doesn't generalize. Train it here, deploy it there, and watch confidence evaporate. Meanwhile, classical planning, decidedly uncool but stubbornly correct, has been quietly producing policies that provably work across arbitrarily large problem instances. This paper asks the uncomfortable question the RL community often dodges: can modern policy-gradient methods actually learn general policies, not just big ones? ...