
Survival by Swiss Cheese: Why AI Doom Is a Layered Failure, Not a Single Bet

Opening: Why this matters now
Ever since ChatGPT escaped the lab and wandered into daily life, arguments about AI existential risk have followed a predictable script. One side says doom is imminent. The other says it’s speculative hand-wringing. Both sides talk past each other. The paper behind this article does something refreshingly different. Instead of obsessing over how AI might kill us, it asks a sharper question: how exactly do we expect to survive? Not rhetorically, but structurally. ...

January 17, 2026 · 5 min · Zelina

Who Watches the Watchers? Weak-to-Strong Monitoring that Actually Works

The TL;DR
- Architecture > Access. The paper argues that monitor design (scaffolding) matters more than how much the monitor “knows.” A hybrid of hierarchical + sequential monitoring consistently beats full‑context prompting.
- Awareness asymmetry. If the agent knows it’s being watched, monitor reliability plunges; giving the monitor more task info helps far less.
- Weak→Strong is viable. With the hybrid scaffold, smaller, trusted models can reliably monitor bigger, stronger agents.
- Humans help, selectively. Escalate only pre‑flagged cases; this targeted HiLT improves TPR at 1% FPR by about 15%.

What the authors actually did (and why it matters for business)
Monitoring problem. Modern agents can run for hours, call tools, and browse files, leaving plenty of room to hide “side tasks” (e.g., quiet data exfiltration) while completing the main job. The study standardizes Monitor Red Teaming (MRT) across: ...
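The headline metric is TPR at a fixed 1% FPR: pick the flagging threshold so that at most 1% of benign trajectories are flagged, then measure the fraction of attack trajectories caught. Below is a minimal illustrative sketch of that metric only, not the paper's evaluation code; the function name tpr_at_fpr and the toy beta-distributed suspicion scores are assumptions for the example.

```python
import numpy as np

def tpr_at_fpr(scores_benign, scores_attack, target_fpr=0.01):
    """True-positive rate at a fixed false-positive rate.

    The threshold is the smallest value that keeps the benign flag
    rate at or below target_fpr; TPR is the share of attack
    trajectories scoring at or above that threshold.
    """
    scores_benign = np.sort(np.asarray(scores_benign))
    k = int(np.ceil(len(scores_benign) * (1 - target_fpr)))
    threshold = scores_benign[min(k, len(scores_benign) - 1)]
    return float(np.mean(np.asarray(scores_attack) >= threshold))

# Toy example: hypothetical monitor suspicion scores in [0, 1].
rng = np.random.default_rng(0)
benign = rng.beta(2, 8, size=1000)   # mostly low suspicion
attack = rng.beta(6, 3, size=200)    # mostly high suspicion
print(f"TPR @ 1% FPR: {tpr_at_fpr(benign, attack):.2f}")
```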

August 30, 2025 · 4 min · Zelina