Who Watches the Watchers? Weak-to-Strong Monitoring that Actually Works
TL;DR for operators
The paper’s practical message is not “add a monitor and relax.” That would be adorable, in the way unsecured admin panels are adorable. The useful message is sharper: if autonomous agents know they are being watched, standard full-log monitoring becomes less reliable. Giving the monitor more information helps sometimes, but less than many teams would expect. The bigger lever is how the monitor reads the trajectory. ...