Cheap Signals, Expensive Insights: Rethinking AI Evaluation with Tensor Factorization

Opening: Why This Matters Now

AI models are improving faster than our ability to measure them. Leaderboards still compress performance into a single scalar. One number. Clean. Marketable. Comforting. And increasingly misleading.

Modern generative models do not “perform” uniformly. They excel at certain prompts, fail quietly on others, and sometimes trade strengths across subdomains. Aggregate metrics flatten this landscape into a polite fiction. ...