
Small Gains, Long Games: Why Tiny Accuracy Bumps Explode into Big Execution Wins
The quick take

Most debates about “diminishing returns” fixate on single‑step metrics. This paper flips the lens: if your product’s value depends on how long a model can execute without slipping, then even small per‑step gains can produce super‑linear increases in the task length a model can finish. The authors isolate execution (not planning, not knowledge) and uncover a failure mode, self‑conditioning, where models become more likely to err after seeing their own past errors. Reinforcement‑learned “thinking” models largely bypass this and stretch single‑turn execution dramatically. ...
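To make the compounding concrete, here is a minimal back-of-the-envelope sketch (not the paper's code): assume each step succeeds independently with probability p, so an H-step task completes with probability p^H, and the longest task completable at a success threshold s is H = ln(s) / ln(p). The `horizon_length` helper and the 50% threshold below are illustrative assumptions, not values taken from the paper.

```python
import math

def horizon_length(step_accuracy: float, success_threshold: float = 0.5) -> float:
    """Longest task (in steps) completable at the given success rate,
    assuming each step succeeds independently with probability step_accuracy."""
    return math.log(success_threshold) / math.log(step_accuracy)

# How the 50%-success horizon grows as per-step accuracy creeps up
for p in (0.90, 0.99, 0.999, 0.9999):
    print(f"per-step accuracy {p:.2%} -> ~{horizon_length(p):,.0f} steps")
```

Under these assumptions, moving from 99% to 99.9% per-step accuracy stretches the 50%-success horizon from roughly 69 steps to roughly 693, about a tenfold jump from a sub-one-point gain, which is the super-linear effect the paragraph above describes.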