Explaining the Explainers: Why Faithful XAI for LLMs Finally Needs a Benchmark
Consider hiring. A candidate writes a personal statement. A screening model gives a score. A manager asks the AI system why. The explanation says work experience mattered most, education came next, and demographic variables barely moved the decision. Everyone relaxes, because the explanation sounds reasonable.

That is the dangerous part. A reasonable explanation is not necessarily a faithful explanation. A counterfactual edit that looks plausible is not necessarily a causal counterfactual. And a model that appears insensitive to demographic concepts may not be “fair”; it may simply have learned, or been aligned, to suppress visible sensitivity in the narrow setting being tested. ...