Intrinsic Reward

Doors are excellent teachers. A locked door in a maze looks trivial to a human observer. One agent opens it. Another agent walks through it. Everyone goes home, preferably before the training budget quietly evaporates. But for reinforcement-learning agents, especially in sparse-reward environments, that door is not a door. It is a credit-assignment trap wearing blue paint. ...