Cover image

Training Models to Explain Themselves: Counterfactuals as a First-Class Objective

Rejected. That is where counterfactual explanations usually enter the story. A loan applicant is declined by an automated system. A hiring candidate is filtered out. An insurance customer is priced into an unfavorable category. The counterfactual explanation is supposed to answer a practical question: what would need to change for the model to give me the desired outcome? ...

January 24, 2026 · 16 min · Zelina