Reasoning Under Pressure: When Smart Models Second-Guess Themselves
A customer challenges the answer. Not with new evidence. Not with a better calculation. Just with one of those tiny conversational needles: Are you sure? Or worse: Most people disagree with this. Or the classic office-friendly version: As an expert, I’m confident you are wrong. A human analyst might pause, check the source, and decide whether the objection contains actual information. A large reasoning model may also pause. It may even produce several polished paragraphs of careful reconsideration. Then, occasionally, it abandons the correct answer. ...