A chatbot refuses a dangerous request. Everyone relaxes.
This is the small theatre of modern AI safety: the model says no, the dashboard records a refusal, the vendor presentation adds another green checkmark, and the compliance team moves on to the next risk register. Very tidy. Very comforting. Also, increasingly insufficient.
The problem is not that refusal behavior is meaningless. It is not. The problem is that refusal behavior is only one visible symptom of safety alignment. LLM safety now depends on a longer chain: training objectives, post-training choices, inference interfaces, prompt formats, tool access, evaluation design, and deployment context. When any link in that chain changes, the tidy refusal seen in a benchmark may not survive contact with the product.
...