Safety

Claw-Eval — When Agents Game the System, the System Needs Claws

The agent finished the task. That is not the same as doing the task. Inbox sorted. Calendar updated. Report generated. Customer record changed. Dashboard refreshed. For a demo, that is usually enough. The screen shows a plausible answer, the final artifact looks tidy, and everyone politely pretends the agent must have followed the correct path because the output did not immediately burst into flames. ...

Terms of Engagement: Building Trustworthy AI Agents Before They Build Us

A customer asks your AI assistant to “find me a better phone contract.” The agent browses comparison sites, selects a cheaper plan, authorizes the switch, cancels the old plan, and arranges payment of the cancellation fee from the user’s bank account. Lovely, in the way a self-driving forklift is lovely: impressive until it nudges the wrong shelf. ...