Training Models to Explain Themselves: Counterfactuals as a First-Class Objective
Rejected. That is where counterfactual explanations usually enter the story. A loan applicant is declined by an automated system. A hiring candidate is filtered out. An insurance customer is priced into an unfavorable category. The counterfactual explanation is supposed to answer a practical question: what would need to change for the model to give me the desired outcome? ...