Statecraft, Not Scorecards: Why Reliable AI Lives on the Path
TL;DR for operators

AI reliability is increasingly a path problem, not a score problem. One paper argues that post-training methods such as supervised fine-tuning, reinforcement learning, and on-policy distillation are best understood by asking where in the model's state space supervision is applied.[1] Another argues that GUI-agent software evaluation goes wrong when a single unsuccessful rollout is treated as proof of a broken application, even though the evaluator has inspected only one path through a larger UI state graph.[2] ...
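To make the GUI-agent point concrete, here is a minimal toy sketch. It is not the cited paper's method. It assumes, purely for illustration, that the application's UI is already known as a small directed graph of screens (`UI_GRAPH`, `verdict`, and all screen names are hypothetical). One failed rollout shows only that the path the agent took missed the goal. Whether the goal is reachable at all is a property of the whole graph, checked here with a breadth-first search.

```python
from collections import deque

# Hypothetical UI state graph: each screen maps to the screens one action can reach.
UI_GRAPH = {
    "home": ["search", "settings"],
    "search": ["results"],
    "results": ["home"],          # this branch never reaches the goal
    "settings": ["account"],
    "account": ["export_done"],   # hypothetical goal screen
    "export_done": [],
}


def rollout_succeeds(path, goal):
    """A single rollout passes only if the path it actually took ends at the goal."""
    return bool(path) and path[-1] == goal


def goal_reachable(graph, start, goal):
    """Breadth-first search over the whole graph, not just the path one agent took."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return True
        for nxt in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def verdict(graph, start, goal, path):
    """Hypothetical triage: separate agent failure from a genuinely unreachable goal."""
    if rollout_succeeds(path, goal):
        return "pass"
    if goal_reachable(graph, start, goal):
        return "rollout failed, but a path to the goal exists"
    return "goal unreachable: possible application defect"


failed_path = ["home", "search", "results"]
print(verdict(UI_GRAPH, "home", "export_done", failed_path))
```

The design choice being illustrated is the separation of two questions: "did this rollout succeed?" and "is success possible in this application?" In a real evaluation the UI graph is not given up front and must be explored, so this sketch only shows why one failed path is weak evidence, not how to recover the graph.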