Counterfactuals Unchained: How Causality Escapes Its Own Models

A loan is rejected. Now explain why.

A borrower is rejected by an automated lending system. The compliance team asks a simple question: What caused the rejection?

A naïve answer points to a variable: low income, high debt ratio, thin credit history, missing documentation, or some equally respectable-looking field in the model. A better answer asks what would have happened if that variable had changed. A still better answer asks which surrounding facts must be held fixed while we imagine that change.

That last phrase is where the trouble starts.

If we change gender but hold salary fixed, are we testing discrimination through perceived reliability? If we change salary but hold job history fixed, are we asking a realistic counterfactual or just performing spreadsheet theatre? If we change a user’s belief, an AI agent’s memory, or a security protocol’s nested expectation about another system’s behavior, are we still inside the comfortable world of structural causal models?

Halpern and Pass’s paper, Causality Without Causal Models, answers with a small but rather dangerous idea: actual causality should not be chained to one modeling format. It can be abstracted to any framework where counterfactuals are defined.¹

That sounds philosophical. It is not only philosophical. It changes what kinds of explanations a system can legally, operationally, and logically express.

The paper’s core move is simple enough to state before the notation attacks:

$A$ is a cause of $B$ if, conditional on some appropriate true condition $C$, $A$ becomes a but-for cause of $B$.

The important word is not “cause.” It is “appropriate.” As usual, the most expensive part of the system is hiding in the adjective.

The old trick: make causality work by holding the right facts fixed

The paper begins from the Halpern-Pearl account of actual causality, one of the most influential formal treatments of causal explanation. In its standard setting, causality is defined using structural causal models: variables, equations, contexts, and interventions.

This framework is powerful because it makes causal reasoning explicit. It is also narrow because the language of causes is restricted. In the standard HP definition, the candidate cause has to look like a conjunction of primitive variable assignments, such as $X=x$ and $Y=y$. The effect can be a Boolean combination of primitive events. That is already useful, but it excludes many statements we naturally want explanation systems to handle: disjunctive causes, beliefs, nested counterfactuals, and more complex modal claims.

The paper does not reject structural causal models. It extracts the logic that makes the HP definition work.

The familiar rock-throwing example does the teaching. Suzy and Billy both throw rocks at a bottle. Suzy’s rock arrives first and shatters it. Billy’s rock would have shattered it if Suzy had not thrown. We still want to say Suzy’s throw caused the shattering. But Suzy’s throw is not a plain but-for cause: without Suzy, Billy still breaks the bottle.

The HP move is to hold part of the actual situation fixed. In the model, Billy’s rock did not hit the intact bottle because Suzy’s rock got there first. Hold that fact fixed, then ask whether Suzy’s not throwing would have prevented the shattering. Under that condition, yes. Suzy’s throw is a cause.
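The hold-fixed move can be sketched in a few lines of Python. This is a toy encoding of the story, not the paper's formalism: the function name `bottle_model`, the variable names, and the `fixed` override mechanism are all assumptions made for illustration.

```python
def bottle_model(suzy_throws, billy_throws, fixed=None):
    """Toy structural equations for the rock-throwing story.

    `fixed` maps a variable name to a value that overrides its equation,
    which is how a fact is "held fixed" in this sketch.
    """
    fixed = fixed or {}
    suzy_hits = fixed.get("suzy_hits", suzy_throws)
    # Billy's rock hits only if the bottle is still intact when it arrives.
    billy_hits = fixed.get("billy_hits", billy_throws and not suzy_hits)
    shatters = suzy_hits or billy_hits
    return {"suzy_hits": suzy_hits, "billy_hits": billy_hits, "shatters": shatters}


actual = bottle_model(True, True)

# Plain but-for test: remove Suzy's throw and change nothing else.
plain_but_for = not bottle_model(False, True)["shatters"]

# Conditional but-for test: also hold "Billy's rock did not hit" at its actual value.
held = {"billy_hits": actual["billy_hits"]}
conditional_but_for = not bottle_model(False, True, fixed=held)["shatters"]

print(plain_but_for, conditional_but_for)  # False True
```

The plain test fails because Billy's rock takes over; the conditional test succeeds because the held fact blocks that takeover.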

This is the mechanism the new paper abstracts:

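$$ A \text{ causes } B \quad \text{if there exists a true condition } C \text{ such that } (\neg A \wedge C) \rightarrow \neg B. $$

Here the arrow is read counterfactually: if $A$ were false while $C$ still held, $B$ would be false. It is not material implication, which would be trivially satisfied whenever $A$ is actually true.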

The authors formalize this as an abstract definition over what they call causal-counterfactual families, or ccfs. A ccf is not a specific causal graph. It is a family of models with states, formulas, truth conditions, and counterfactual semantics. Once a framework can say whether a counterfactual is true, the paper can import a version of actual causality into it.

That is the unchaining. Causality no longer has to live only inside structural equations.

The new definition is not “anything goes”; it is “choose your witness language carefully”

The definition has three conditions. First, the proposed cause and effect must both actually hold. Second, there must be a true condition $\tau$ from an allowed witness language $C_X$ such that if the cause were false while $\tau$ remained true, the effect would be false. Third, the cause must be minimal.

In plainer language:

| Formal ingredient | Operational meaning | Why it matters |
| --- | --- | --- |
| $A$ and $B$ are true | The alleged cause and outcome actually occurred | Prevents explanations for events that did not happen |
| A true witness condition $\tau$ exists | Some part of the actual situation is held fixed | Handles preemption, path-specific reasoning, and context-sensitive explanation |
| $A$ is minimal | The explanation is not bloated | Prevents “the entire universe caused the rejection,” a technically safe but professionally useless answer |
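The three conditions can be turned into a brute-force check over a toy model. The sketch below is an illustration in a small structural setting, not the paper's abstract definition: the rock-throwing equations, the variable names (`ST`, `BT`, `SH`, `BH`, `BS`), and the choice of witness language (conjunctions of actually-true primitive facts) are assumptions.

```python
from itertools import combinations, product

VARIABLES = ["ST", "BT", "SH", "BH", "BS"]  # throws, hits, bottle shatters


def evaluate(fixed):
    """Solve the toy equations; entries in `fixed` override a variable."""
    w = {}
    w["ST"] = fixed.get("ST", True)  # context: Suzy throws
    w["BT"] = fixed.get("BT", True)  # context: Billy throws
    w["SH"] = fixed.get("SH", w["ST"])
    w["BH"] = fixed.get("BH", w["BT"] and not w["SH"])
    w["BS"] = fixed.get("BS", w["SH"] or w["BH"])
    return w


ACTUAL = evaluate({})


def witnesses(excluded):
    """Assumed witness language: conjunctions of actually-true primitive facts."""
    free = [v for v in VARIABLES if v not in excluded]
    for r in range(len(free) + 1):
        for subset in combinations(free, r):
            yield {v: ACTUAL[v] for v in subset}


def has_witness(cause, effect_var):
    """Condition 2: some true witness makes the cause a but-for cause."""
    names = list(cause)
    for tau in witnesses(excluded=set(names) | {effect_var}):
        for values in product([True, False], repeat=len(names)):
            alternative = dict(zip(names, values))
            if alternative == cause:
                continue  # the cause must be false in the counterfactual
            world = evaluate({**tau, **alternative})
            if world[effect_var] != ACTUAL[effect_var]:
                return True
    return False


def is_cause(cause, effect_var):
    # Condition 1: the cause actually holds (the effect is its actual value).
    if any(ACTUAL[v] != val for v, val in cause.items()):
        return False
    # Condition 2: a witness exists.
    if not has_witness(cause, effect_var):
        return False
    # Condition 3: no strict, non-empty sub-conjunction already works.
    names = list(cause)
    for r in range(1, len(names)):
        for subset in combinations(names, r):
            if has_witness({v: cause[v] for v in subset}, effect_var):
                return False
    return True


print(is_cause({"ST": True}, "BS"))              # True
print(is_cause({"BT": True}, "BS"))              # False
print(is_cause({"ST": True, "BT": True}, "BS"))  # False
```

The last call fails only on minimality: the conjunction passes the witness test, but Suzy's throw alone already does.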

The key parameter is $C_X$, the set of formulas allowed as witnesses. This is where the paper quietly becomes relevant to business systems.

A witness language determines what kinds of conditions an explanation system is allowed to hold fixed. If the language is too weak, the system cannot express important distinctions. If it is too strong, causality becomes trivial. The paper gives a sharp warning: if arbitrary disjunctions are allowed, especially disjunctions containing the negation of the outcome, then any formula can become a cause of the outcome. Congratulations, your explanation system now proves everything. Auditors love that, obviously.
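One way to see why, as a sketch rather than the paper's own proof: read the arrow counterfactually, as in the definition above, and assume at least one world satisfies $\neg A \wedge \neg B$. Suppose $A$ and $B$ are both true and the witness language admits the disjunction $\tau = A \vee \neg B$. Then:

$$
\begin{aligned}
&\tau = A \vee \neg B \quad \text{is true, because } A \text{ is true;} \\
&\neg A \wedge \tau \;\equiv\; \neg A \wedge (A \vee \neg B) \;\equiv\; \neg A \wedge \neg B; \\
&\text{every world satisfying } \neg A \wedge \neg B \text{ satisfies } \neg B, \text{ so } (\neg A \wedge \tau) \rightarrow \neg B \text{ holds.}
\end{aligned}
$$

Nothing in these steps uses any connection between $A$ and $B$. The witness does all the work, which is why the witness language, and not just the candidate cause, needs policing.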

So the paper’s practical lesson is not “use richer counterfactuals.” It is more disciplined:

More expressive explanation languages are useful only when the allowed witness conditions are governed.

That distinction matters. Many AI governance discussions treat explanation as a matter of making models more transparent. This paper points to a deeper layer: explanation also depends on the counterfactual grammar. What can the system ask? What can it hold fixed? Which hypothetical worlds count as close enough? Which conditions are admissible witnesses?

A transparent system with the wrong counterfactual language can still explain the wrong thing very clearly.

Why the theorem matters: HP causality survives the abstraction

A formal generalization is not useful if it breaks the thing it claims to generalize. The paper therefore proves that the abstract definition agrees with the HP definition under appropriate restrictions.

The first major result, Theorem 4.2, says that when the allowed witness formulas are conjunctions of non-negated primitive events, the abstract definition gives the same answer as the HP definition in causal models. This is the “backward compatibility” result. It tells us that the abstraction is not merely a new philosophical toy placed next to HP causality; under the right language restriction, it collapses back to the standard account.

The second major result, Theorem 4.5, extends the bridge to recursive counterfactual structures that strongly correspond to recursive causal models. But now the witness language must allow a restricted form of disjunction involving the candidate cause: roughly, formulas of the form “the cause takes its actual value or this alternative value,” alongside conjunctions of primitive events.

That detail is not decorative. It shows that the witness language required for equivalence depends on the semantic framework. Structural causal models and counterfactual structures are not interchangeable containers. They preserve the same causal judgments only under carefully matched restrictions.

A useful way to read the formal results is this:

| Result | Likely purpose | What it supports | What it does not prove |
| --- | --- | --- | --- |
| Definition 4.1 | Main theoretical mechanism | Actual causality can be defined over arbitrary causal-counterfactual families | That all counterfactual semantics are equally useful |
| Theorem 4.2 | Equivalence with prior work | The abstract definition agrees with HP causality in causal models under restricted witnesses | That richer witness languages preserve HP judgments |
| Theorem 4.5 | Generalization to recursive counterfactual structures | HP-style causality can be recovered in corresponding counterfactual structures with a slightly richer witness language | That arbitrary disjunctions are safe |
| Backtracking section | Expressive extension | Backtracking can be enabled or disabled by witness choice | That backtracking is always desirable |
| Explanation section | Extension from cause to explanation | The same abstraction can carry over to causal explanation relative to an agent’s knowledge | That the paper provides an implemented explanation engine |

The appendix proofs are not robustness checks in the empirical-machine-learning sense. They are the main evidence. This is a formal paper. There are no experiments, no benchmark tables, no ablation curves, and no leaderboard victory lap. Strange, I know. Some papers still try to prove things instead of asking a GPU cluster to applaud.

The loan example is the business version of the rock bottle

The paper’s loan example is brief, but it is the bridge to business practice.

Suppose gender affects a loan decision through two paths. One path runs through salary. Another runs through perceived reliability. A lender might decide that salary is a legitimate input, while gender-based perceived reliability is not. To isolate the acceptable path, we hold the reliability perception fixed and examine the effect along the salary path.

This is path-specific causality, and it is exactly the kind of question organizations face when they try to explain automated decisions.

A normal feature-importance explanation may say gender had little or no direct effect. A counterfactual explanation may say the decision would have changed under another profile. But a governance-grade explanation often needs to ask a narrower question:

Did the outcome depend on a protected attribute through an inadmissible pathway?

The paper’s mechanism gives a formal language for this style of reasoning: causality conditional on selected facts remaining fixed.
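A minimal sketch of the two-path loan question follows. Everything in it is hypothetical: the function `loan_model`, the salary and reliability numbers, and the approval threshold are invented to make the two paths visible, and none of them come from the paper.

```python
def loan_model(gender, fixed=None):
    """Hypothetical structural equations; all numbers are invented for illustration."""
    fixed = fixed or {}
    # Path 1: gender -> salary (in thousands).
    salary = fixed.get("salary", 60 if gender == "M" else 52)
    # Path 2: gender -> perceived reliability (points assigned by a reviewer).
    reliability = fixed.get("reliability", 80 if gender == "M" else 60)
    approved = salary + reliability >= 130
    return {"salary": salary, "reliability": reliability, "approved": approved}


actual = loan_model("F")   # rejected: 52 + 60 < 130
flipped = loan_model("M")  # approved: 60 + 80 >= 130

# Salary path only: change gender, hold perceived reliability at its actual value.
salary_path = loan_model("M", fixed={"reliability": actual["reliability"]})

# Reliability path only: change gender, hold salary at its actual value.
reliability_path = loan_model("M", fixed={"salary": actual["salary"]})

print(actual["approved"], flipped["approved"])                # False True
print(salary_path["approved"], reliability_path["approved"])  # False True
```

In this toy, the outcome flips only when the reliability path is left open, so the dependence on gender runs through the pathway the lender has declared inadmissible. With different invented numbers the answer could go the other way, which is the point: the held-fixed condition decides which question is being asked.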

That does not mean the paper gives a ready-made compliance product. It does not. It gives a formal abstraction that helps specify what such a product would need to make explicit. A deployed system would still need domain rules, data definitions, model access, legal interpretation, and a defensible choice of counterfactual semantics. The paper supplies the logic; business still has to supply the policy judgment. A tragic division of labor, but apparently we are not yet free from governance meetings.

Richer causes are not philosophical decoration

The likely misconception around this paper is that richer counterfactual languages are mostly a philosophical luxury. The paper argues the opposite. Richer languages change what can be expressed as a cause.

The standard HP setup struggles with causes or effects involving disjunctions, negations, beliefs, and nested counterfactuals. The paper points to cases where these are not exotic.

In security, nested counterfactuals can express properties such as what would happen if an attacker believed one thing while another protocol condition changed. In agent systems, belief interventions matter: if Alice believed a vaccine was effective, would she take it? In business automation, similar questions appear when an AI agent’s internal state affects downstream action: if the agent had believed the customer was high-risk, would it have escalated the case? If the agent had believed the previous tool output was unreliable, would it have called another tool?

A structural causal model can often be engineered to represent some of this. But the engineering may be awkward, controversial, or dependent on special-purpose extensions. Halpern and Pass’s approach says: start with a framework whose counterfactual semantics you already accept, then define causality over that framework.

The tradeoff is obvious. You gain expressiveness, but you inherit the burden of your counterfactual semantics. If the underlying framework has strange closest-world behavior, your causal judgments may become strange too. The paper is not offering a magic solvent. It is offering a cleaner separation between two design questions:

What counterfactuals does the framework support?
Given those counterfactuals, what counts as an actual cause?

That separation is useful because many business explanation systems blur the two. They present a causal answer while hiding the assumptions about possible worlds, interventions, and fixed conditions. The result is not necessarily wrong. It is just under-specified, which is wrong’s better-dressed cousin.

Backtracking is not a bug when the question is historical diagnosis

Structural causal models usually enforce “no backtracking.” If we intervene to set $X=x$, upstream variables do not change. Only descendants of $X$ can be affected. This is sensible for many intervention questions. If we force a credit-score field to a new value, we do not thereby rewrite the borrower’s past income.

But not every causal question is an intervention question.

Sometimes we ask a historical diagnostic question: if this event had not occurred, what else about the earlier world would likely have been different? Humans often reason this way. The paper explains that Lewis-style counterfactual structures naturally allow such backtracking because the closest world where $X=x$ may also differ in upstream facts.

The rock example returns. Consider a counterfactual structure in which the closest world where Billy does not throw is one where Suzy does not throw either. There, Billy’s throw can become a cause of the bottle shattering. This differs from the standard HP judgment in the original causal model, where Billy’s throw is not a cause because Suzy’s rock preempted it.

The important point is not that one answer is always right. The point is that the answer depends on the counterfactual question being asked.

For business systems, this distinction appears in incident reviews.

If a trading bot places a bad order, one question asks: what intervention would have prevented the order while leaving upstream facts fixed? Another asks: what alternative earlier world would explain why this order never happened? The first is useful for control design. The second may be useful for root-cause investigation. Mixing them produces the familiar corporate postmortem genre: confident, circular, and somehow signed by twelve people.

The paper’s framework lets backtracking be controlled by the witness language. If we want no backtracking, we can hold the context fixed. If we want some backtracking, we can hold only selected ancestors fixed. The mechanism is flexible, but again, flexibility is not free. Someone must decide which counterfactual discipline the explanation system is using.
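The dependence on the closeness ordering can be sketched with a tiny closest-world evaluator. This is an illustrative toy, not the paper's construction: the two distance functions (`fewest_changes` and `throw_together`) and the penalty value are assumptions chosen to produce one non-backtracking and one backtracking ordering.

```python
from itertools import product


def world(suzy_throws, billy_throws):
    suzy_hits = suzy_throws
    billy_hits = billy_throws and not suzy_hits
    return {"ST": suzy_throws, "BT": billy_throws, "BS": suzy_hits or billy_hits}


WORLDS = [world(st, bt) for st, bt in product([True, False], repeat=2)]
ACTUAL = world(True, True)


def would(antecedent, consequent, distance):
    """Lewis-style test: the consequent holds in all closest antecedent-worlds."""
    candidates = [w for w in WORLDS if antecedent(w)]
    if not candidates:
        return True  # vacuously true when no world satisfies the antecedent
    nearest = min(distance(w) for w in candidates)
    return all(consequent(w) for w in candidates if distance(w) == nearest)


def fewest_changes(w):
    """Non-backtracking ordering: count how many throws differ from the actual world."""
    return sum(w[v] != ACTUAL[v] for v in ("ST", "BT"))


def throw_together(w):
    """Backtracking ordering (assumed): worlds where only one child throws are far away."""
    penalty = 0 if w["ST"] == w["BT"] else 10
    return penalty + fewest_changes(w)


def no_billy(w):
    return not w["BT"]


def no_billy_suzy_fixed(w):
    return not w["BT"] and w["ST"]  # witness: Suzy's throw is held fixed


def intact(w):
    return not w["BS"]


print(would(no_billy, intact, fewest_changes))             # False
print(would(no_billy, intact, throw_together))             # True
print(would(no_billy_suzy_fixed, intact, throw_together))  # False
```

The third call shows the control knob: even under the backtracking ordering, adding Suzy's throw to the held-fixed condition restores the non-backtracking answer.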

Explanation depends on what the agent already knows

The paper also extends the abstraction from causality to explanation.

This move matters because cause and explanation are not identical. A fact may be a cause without being a useful explanation for a particular audience. If the audience already knows it, it may fail as an explanation. If it is too broad, it may be true but unhelpful. If it guarantees the outcome across the audience’s possible worlds, it may explain even when different conjuncts do causal work in different contexts.

The HP account models explanation relative to an agent’s epistemic state: the set of contexts the agent considers possible. Halpern and Pass translate this into the abstract setting by using sets of states in a causal-counterfactual family.

This is directly relevant to AI systems that generate explanations for different users. The same event can require different explanations for:

an end customer;
a compliance officer;
a model-risk reviewer;
an engineer debugging an agent workflow;
a regulator evaluating whether a prohibited path influenced a decision.

The paper’s formalism does not design the user interface. It clarifies why a single universal explanation is often the wrong target. Explanation is relative to what the receiver already knows and what uncertainty remains open.

A customer may need: “Your application was rejected because verified income was below the threshold.” A compliance officer may need: “The rejection persists under counterfactual changes to protected-attribute proxies, except through this admissible income path.” An engineer may need: “The retrieval tool’s stale record was a cause of the agent’s final recommendation, conditional on the risk-threshold configuration being held fixed.”

These are not just different wordings. They are different explanatory tasks.
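The audience-relative part of this can be sketched with sets of states. The code below is a deliberate simplification: it keeps only two of the ideas mentioned above (a fact the audience already knows does not explain, and a useful fact guarantees the outcome across the audience's possible states) and omits the causal conditions of the full HP account. The state space, the rejection rule, and the function names are assumptions.

```python
from itertools import product

# A state records two facts; the rejection rule is made up for illustration.
STATES = [
    {"income_low": inc, "docs_missing": doc, "rejected": inc or doc}
    for inc, doc in product([True, False], repeat=2)
]


def already_known(fact, epistemic_state):
    """The fact holds in every state the audience considers possible."""
    return all(state[fact] for state in epistemic_state)


def guarantees(fact, outcome, epistemic_state):
    """Wherever the fact holds among the possible states, so does the outcome."""
    return all(state[outcome] for state in epistemic_state if state[fact])


def useful_explanation(fact, outcome, epistemic_state):
    return (not already_known(fact, epistemic_state)
            and guarantees(fact, outcome, epistemic_state))


# The customer knows only that the application was rejected.
customer = [s for s in STATES if s["rejected"]]
# The compliance officer additionally knows that income was low.
officer = [s for s in customer if s["income_low"]]

print(useful_explanation("income_low", "rejected", customer))  # True
print(useful_explanation("income_low", "rejected", officer))   # False
```

The same fact is informative to the customer and redundant to the officer, even though the underlying event is identical.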

What this paper directly shows, and what Cognaptus infers

The paper directly shows a formal result: HP-style actual causality can be abstracted to arbitrary causal-counterfactual families, and under appropriate witness-language restrictions, the abstract definition agrees with HP causality in standard causal-model settings. It also shows how the abstraction extends to Lewis-style counterfactual structures, backtracking, and causal explanation.

Cognaptus infers a practical design lesson: explanation systems should expose their counterfactual contract.

A counterfactual contract should answer at least four questions:

| Design question | Why it matters in business systems |
| --- | --- |
| What kinds of formulas can be causes? | Determines whether beliefs, disjunctions, nested conditions, or agent states can be part of explanations |
| What witness conditions may be held fixed? | Determines whether the system supports path-specific, policy-sensitive, or context-sensitive causality |
| Are upstream variables allowed to change? | Separates intervention planning from historical diagnosis |
| Whose knowledge defines explanation? | Separates customer-facing explanation from audit, engineering, or regulatory explanation |

This is not a call to make every enterprise workflow use a full formal logic engine. That would be one way to ensure nobody uses it. The more realistic implication is architectural: when building AI governance tools, agent audit trails, decision-review systems, or compliance explanations, teams should treat counterfactual semantics as a first-class design object.
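As one hypothetical shape for such a design object, the four contract questions could be recorded in a small configuration type. The class name `CounterfactualContract`, its fields, and the string labels are all invented for this sketch; they are not an API from the paper or from any existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Backtracking(Enum):
    NONE = "hold the full context fixed"
    SELECTED = "hold only selected ancestors fixed"


@dataclass(frozen=True)
class CounterfactualContract:
    cause_forms: frozenset     # e.g. {"primitive", "belief", "nested"}
    witness_forms: frozenset   # e.g. {"primitive_conjunction"}
    backtracking: Backtracking
    audience: str              # whose knowledge defines the explanation

    def problems(self):
        """Flag contract choices that the discussion above warns about."""
        issues = []
        if not self.witness_forms:
            issues.append("no witness forms declared")
        if "arbitrary_disjunction" in self.witness_forms:
            issues.append("arbitrary disjunctions in witnesses can trivialize causality")
        return issues


audit = CounterfactualContract(
    cause_forms=frozenset({"primitive"}),
    witness_forms=frozenset({"primitive_conjunction"}),
    backtracking=Backtracking.NONE,
    audience="compliance officer",
)
loose = CounterfactualContract(
    cause_forms=frozenset({"primitive", "belief"}),
    witness_forms=frozenset({"primitive_conjunction", "arbitrary_disjunction"}),
    backtracking=Backtracking.SELECTED,
    audience="engineer",
)

print(audit.problems())       # []
print(len(loose.problems()))  # 1
```

The value of writing the contract down is less the validation logic than the fact that every explanation can then be logged alongside the contract under which it was produced.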

The question is not only “can the system explain itself?” The question is “what kind of counterfactual world is the system allowed to use when it explains itself?”

That is less catchy. It is also less likely to become a procurement slogan, which is one of its strengths.

Where the result stops

This paper is not an empirical validation of explanation quality. It does not test users, compare explanation systems, benchmark fairness tools, or measure decision accuracy. It is a formal theory paper.

That boundary matters. The results support conceptual and mathematical design, not immediate operational performance claims.

There are also two technical boundaries with business consequences.

First, the choice of witness language is powerful and dangerous. Too restrictive, and the system misses relevant causal distinctions. Too permissive, and causality can collapse into triviality. The paper explicitly notes that arbitrary disjunctions can make almost anything a cause. In operational terms, an explanation engine needs governance over its explanation grammar, not only over its data and models.

Second, counterfactual semantics must be trusted before causal judgments built on them can be trusted. If a system’s “closest world” relation is poorly specified, arbitrary, or misaligned with the domain, then the causal explanation may be formally valid inside a useless hypothetical universe. This is not a defect in the paper. It is the price of abstraction.

A practical deployment would need to pair the formal machinery with domain constraints:

legal definitions of admissible and inadmissible pathways;
engineering definitions of agent state and tool-use events;
audit policies for which variables may be held fixed;
user-specific explanation policies;
tests for whether the counterfactual assumptions produce stable and defensible judgments.

The paper gives a language for these choices. It does not remove the need to make them.

The real upgrade is not more causality; it is more explicit causality

The easiest way to misread this paper is to say: “Great, we can now do causality without causal models.”

Not quite.

A better reading is: we can define causality wherever counterfactuals are already meaningful, and we can understand structural causal models as one important special case rather than the whole universe.

That shift is valuable because modern AI systems increasingly operate in domains where causes are not always simple variable assignments. Agent behavior involves beliefs, memory, tool calls, nested assumptions, and path-dependent workflows. Governance questions involve admissible and inadmissible pathways. Security questions involve counterfactuals about what another party would know or do under alternative conditions. Business explanations depend on the audience’s knowledge.

For all of these, the old model-shaped box may be too tight.

Halpern and Pass do not replace causal modeling with vibes. They abstract the HP mechanism, preserve compatibility where it matters, and expose the role of witness language as the real control surface. That is the contribution: not causality without discipline, but causality without being trapped inside one modeling architecture.

For business readers, the message is direct. When an AI system explains a decision, do not ask only whether the explanation is causal. Ask what counterfactual language made it causal.

That is where the governance lives.

Cognaptus: Automate the Present, Incubate the Future.

¹ Joseph Y. Halpern and Rafael Pass, “Causality Without Causal Models,” arXiv:2511.21260, 2025, https://arxiv.org/pdf/2511.21260.
