Search-R2: When Retrieval Learns to Admit It Was Wrong
Opening — Why this matters now

Search-integrated LLMs were supposed to be the antidote to hallucination. Give the model tools, give it the web, let it reason step by step—problem solved. Except it wasn’t. What we actually built were agents that search confidently, reason eloquently, and fail quietly. One bad query early on, one misleading paragraph retrieved at the wrong moment, and the whole reasoning chain collapses—yet reinforcement learning still rewards the run if the final answer happens to be right. ...
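To make the failure mode concrete, here is a minimal, hypothetical sketch of outcome-only reward: because the reward function inspects nothing but the final answer, a trajectory with a broken retrieval step scores exactly the same as a clean one. The trajectories and the `outcome_reward` function are illustrative assumptions, not part of any real training pipeline.

```python
def outcome_reward(trajectory, gold_answer):
    # Outcome-only reward: looks solely at the last step of the
    # trajectory, ignoring how the intermediate searches went.
    return 1.0 if trajectory[-1] == gold_answer else 0.0

# A clean run: sensible query, relevant retrieval, correct answer.
clean = [
    "query: capital of France",
    "retrieved: Paris is the capital of France",
    "Paris",
]

# A flawed run: garbled query, irrelevant retrieval, but the model
# happens to land on the right answer anyway.
flawed = [
    "query: capitol of Frannce",
    "retrieved: unrelated paragraph about cheese exports",
    "Paris",
]

print(outcome_reward(clean, "Paris"))   # 1.0
print(outcome_reward(flawed, "Paris"))  # 1.0 — the broken chain is rewarded equally
```

Under this signal, the policy gets no gradient pushing it to fix the bad query or discard the misleading paragraph; it is never told that anything along the way went wrong.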