Skill Issue, Literally: Repairing Agent Instructions Without an Answer Key
TL;DR for operators
Runbooks decay. APIs shift, data schemas mutate, file paths move, and the “expert procedure” that worked last quarter starts quietly steering an agent into a wall. The paper behind this article, SkillAudit: Ground-Truth-Free Skill Evolution via Paired Trajectory Auditing, asks a useful operational question: can an agent skill be improved when nobody has provided hidden tests, reference answers, task rewards, or expert labels?[1] ...