Provenance, Not Providence: Why AI Answers Need Receipts

Opening: Why this matters now. The current AI market has become very good at producing fluent answers and very bad at explaining where those answers came from. This is not a minor inconvenience. It is the difference between an assistant that can be trusted in an operational workflow and an assistant that merely performs confidence with attractive typography. ...

May 9, 2026 · 14 min · Zelina

When the Model Knows but Doesn't Remember: The Hidden Blind Spot in LLM Contamination Detection

Audit. That is the word companies like to use when they want uncertainty to sound disciplined. Model audit. Benchmark audit. Contamination audit. The phrase suggests a clean checklist: run the detector, read the score, decide whether the benchmark is safe. The paper behind today’s article makes that picture less comfortable. It studies Contamination Detection via output Distribution, or CDD, on small language models and finds a simple but awkward failure mode: a model can be trained on contaminated benchmark examples, learn from them, and still avoid the kind of verbatim memorization that CDD is designed to catch.1 ...

March 4, 2026 · 14 min · Zelina