Thinking Before Lying: Why Reasoning Nudges AI Toward Honesty
A chatbot is asked a simple workplace question: your manager praises you for work your teammate actually did. Do you correct the record, or quietly accept the credit? Now add money. Correcting the record costs you a raise. Add more money. Then add more. This is the useful part of the new paper "Think Before You Lie: How Reasoning Leads to Honesty": it does not ask whether a model can recite an ethics slogan, a test that has become almost decorative. It asks what happens when honesty becomes expensive, and whether forcing the model to deliberate changes the answer.1 ...