Claw-Eval — When Agents Game the System, the System Needs Claws
Opening: Why this matters now

AI agents have quietly crossed a threshold. They no longer just answer questions; they act. They send emails, call APIs, modify files, orchestrate workflows. In other words, they’ve moved from generating text to generating consequences. And yet, most evaluation methods still behave as if we’re grading essays. That mismatch is no longer academic. It’s operational risk. ...