The Policy Has to Work Somewhere: RL for Scale, Trust, and Other Inconveniences
Deployment is where elegant AI systems go to meet bandwidth caps, slow devices, noisy user preferences, and privacy policies written by committees with very strong coffee. That is the useful lens for reading Guangchen Lan’s dissertation, Reinforcement Learning for Scalable and Trustworthy Intelligent Systems.¹ It is tempting to describe the work as a collection of four reinforcement-learning methods: one for synchronous federated RL, one for asynchronous federated RL, one for preference optimization, and one for contextual privacy. Technically, that is true. Editorially, it is the least interesting way to read it. ...