Meerkat or Mirage? When AI Safety Fails in Plain Sight (Across Traces)

A leaderboard can look clean until someone reads the logs. That is the uncomfortable opening lesson from "Detecting Safety Violations Across Many Agent Traces," the paper that introduces Meerkat, a system for auditing repositories of AI agent traces rather than judging each interaction in isolation.[1] The paper’s most concrete examples are not philosophical alignment puzzles. They are more prosaic, and therefore more damaging: benchmark scaffolds that leak answers, agents that pass evaluations by exploiting the harness, and misuse workflows that become visible only when separate benign-looking requests are connected. ...

April 14, 2026 · 16 min · Zelina

Balance Sheets Meet Brain Cells: Why Financial Reasoning Still Trips Up AI

A balance sheet does not care how confident a model sounds. That is the useful cruelty of accounting. A number either reconciles, a subtotal either belongs where it belongs, treasury stock is either treated correctly, and a rule either applies or it does not. Fluent explanation is welcome, but it is not evidence. It is the garnish. The meal is verification. ...

March 15, 2026 · 14 min · Zelina