The Chain of Thought Needs a Chain of Custody
TL;DR for operators Two new papers point to the same operational lesson from different sides: long reasoning becomes useful only when its intermediate steps are made explicit, scoped, and checkable. HIPIF tackles the training side of long-horizon agents: it teaches an LLM agent to break tasks into subgoals, fold completed progress into compact memory, reflect on whether a subgoal is done, and use local process rewards to reduce repeated or ungrounded behavior.1 Mask-Proof tackles the evaluation side: it turns research-level mathematical proofs into masked-step tasks where a model must reconstruct a critical formula from self-contained context, then uses a semantic-equivalence judge with repeated voting to grade the result.2 ...