Exploration

When Curiosity Becomes Contagious: Mutual Intrinsic Rewards in Multi-Agent RL

Doors are excellent teachers. A locked door in a maze looks trivial to a human observer. One agent opens it. Another agent walks through it. Everyone goes home, preferably before the training budget quietly evaporates. But for reinforcement-learning agents, especially in sparse-reward environments, that door is not a door. It is a credit-assignment trap wearing blue paint. ...

When Smart AI Gets It Wrong: Diagnosing the Knowing-Doing Gap in Language Model Agents

TL;DR for operators: A smart agent can still be a bad decision-maker. That is the useful, slightly annoying lesson from "LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities."1 The paper studies Gemma2 models acting in simple decision environments and finds that they often fail not because they cannot describe the right strategy, but because they do not reliably execute it. ...