Cheap Signals, Expensive Insights: Rethinking AI Evaluation with Tensor Factorization

Budget is where evaluation systems usually lose their innocence. A team wants to compare several models across hundreds or thousands of prompts. The obvious answer is human evaluation. The less obvious invoice arrives later: annotator time, reviewer fatigue, prompt coverage gaps, inconsistent judgments, and the slow realization that “we evaluated the model” often means “we averaged away the only differences that mattered.” ...

March 3, 2026 · 16 min · Zelina