Wait, Let Me Check: Why Long-CoT AI Can Still Verify the Wrong Thing
Checking is supposed to calm people down. In business, a second review makes a financial model feel safer. A compliance checklist makes a release feel governed. A senior analyst saying “let me double-check that” gives the room a small dopamine hit of procedural seriousness. Long Chain-of-Thought models have learned the same theatre. They pause. They reconsider. They say “wait.” They verify arithmetic. They sometimes generate reasoning traces so long that one begins to feel the model must be thinking deeply, if only because wasting that many tokens while being shallow seems rude. ...