When Agents Learn Without Learning: Test-Time Reinforcement Comes of Age
A team meeting usually ends with someone saying, “Let’s remember this for next time.” Human teams sometimes do. Agent teams usually do not. A group of LLM agents can debate, critique, revise, and produce a final answer. Then the whole episode often disappears into the landfill of inference logs: useful comments, bad guesses, decisive objections, elegant checks, all flattened into “the model answered correctly” or “the model failed.” Very modern. Very wasteful. ...