Adaptive Compute

TL;DR for operators The current enterprise mistake is treating “reasoning” as a personality trait of a model. It is not. It is a process: decompose the task, inspect the evidence, decide what matters, test counterarguments, synthesize a position, and stop before the machine starts producing beautifully cited nonsense. Two recent papers expose that process from opposite ends. Hedge-Bench defines a realistic demand signal: open-ended financial reasoning tasks derived from hedge fund analyst work, graded against expert analytical moves and source-grounded claims.1 It finds that frontier agents remain weak on this kind of work, with the best model achieving only a limited perfect-score rate and with stronger exploration often bringing more hallucination along for the ride. Delightful. The junior analyst has read the filings, opened the spreadsheet, and still occasionally invents the economy. ...

Adaptive Compute

Think Twice, Halt Once

Confidence Is Not Truth, But It Can Steer: When LLMs Learn When to Stop