Benchmarking the Benchmarks: When AI Safety Metrics Stop Meaning Anything
Safety used to sound like a simple procurement question. A vendor says its model is safe. The slide deck has benchmark scores. The scores have respectable names: accuracy, F1, safety score, refusal rate, attack success rate. Everyone nods, because familiar metric names create the soothing illusion that someone has already done the hard work. ...