Actor-Critic

A fleet looks unified on a dashboard. It is rarely unified in the world. The warehouse robots share a navigation objective, but one floor has glossy tiles, another has uneven concrete, and a third has humans who treat marked lanes as casual decoration. The delivery drones may use the same controller family, but wind, payload, battery ageing, and local regulation quietly rewrite the operating problem. Industrial arms may repeat the same task, until a supplier swaps a component and the “same” movement is no longer quite the same. ...

Share the Trunk, Spare the Averaging: Federated Actor-Critic Gets Personal