Temperament Over Talent: Why AI Behavior Is the New Competitive Edge

Procurement loves a leaderboard. That is understandable. A leaderboard is clean, sortable, and emotionally comforting. One model scores higher on reasoning. Another is cheaper per token. A third has a larger context window and a launch page written in the usual dialect of technological destiny. Decision made, presumably. Then the model enters a real workflow. ...

April 4, 2026 · 15 min · Zelina

The Sealed Score: Why AI Evaluation Needs an Exam Day

A leaderboard score is useful until everyone starts treating it as a target. That is the uncomfortable business problem behind "LLM Olympiad: Why Model Evaluation Needs a Sealed Exam" [1]. The paper is not arguing that benchmarks are useless. That would be theatrical, and not especially true. It argues something sharper: in the LLM era, a benchmark score is only as credible as the procedure that produced it. ...

March 25, 2026 · 15 min · Zelina