
Competency Gaps: When Benchmarks Lie by Omission

Opening — Why this matters now

Large Language Models are scoring higher than ever, yet complaints from real users keep piling up: over-politeness, brittle refusals, confused time reasoning, shaky boundaries. This disconnect is not accidental—it is statistical. The paper Uncovering Competency Gaps in Large Language Models and Their Benchmarks argues that our dominant evaluation regime is structurally incapable of seeing certain failures. Aggregate benchmark scores smooth away exactly the competencies that matter in production systems: refusal behavior, meta-cognition, boundary-setting, and nuanced reasoning. The result is a comforting number—and a misleading one. ...

December 27, 2025 · 4 min · Zelina

Graphing the Invisible: How Community Detection Makes AI Explanations Human-Scale

Opening — Why this matters now

Explainable AI (XAI) is growing up. After years of producing colorful heatmaps and confusing bar charts, the field is finally realizing that knowing which features matter isn't the same as knowing how they work together. The recent paper Community Detection on Model Explanation Graphs for Explainable AI argues that the next frontier of interpretability lies not in ranking variables but in mapping their alliances. Because when models misbehave, the problem isn't a single feature — it's a clique. ...
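The idea of mapping feature "alliances" can be sketched in a few lines: build a graph whose nodes are features and whose edge weights encode pairwise interaction strength, then run a standard community-detection algorithm on it. This is a minimal toy illustration, not the paper's method — the feature names and interaction weights below are invented, and greedy modularity is just one of several community-detection choices.

```python
# Toy sketch: community detection on a feature-interaction graph.
# Edge weights stand in for pairwise interaction strengths (e.g., derived
# from SHAP interaction values); all names and numbers here are made up.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

edges = [
    ("age", "income", 0.9),
    ("income", "credit_score", 0.8),
    ("age", "credit_score", 0.7),
    ("zip_code", "race_proxy", 0.85),
    ("zip_code", "income", 0.1),  # weak cross-group interaction
]

G = nx.Graph()
G.add_weighted_edges_from(edges)

# Greedy modularity maximization groups tightly interacting features,
# surfacing "cliques" of features that act together inside the model.
communities = greedy_modularity_communities(G, weight="weight")
for i, community in enumerate(communities):
    print(f"community {i}: {sorted(community)}")
```

With these weights, the strongly interacting triangle (age, income, credit_score) and the zip_code/race_proxy pair fall into separate communities — the kind of grouping that turns a flat feature ranking into a map of how features collude.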

November 5, 2025 · 4 min · Zelina