Prompt and Circumstance: Why One Accuracy Number Is Not a Reliability Audit
A practical reading of a new multi-variant audit showing why AI model reliability depends on prompts, evaluators, calibration definitions, and parseability, not just benchmark accuracy.