A hospital changes its treatment protocol. Another keeps the old one. A third removes an approval step that had quietly influenced several downstream decisions.

Their datasets now disagree.

The usual federated-learning instinct is to treat that disagreement as a problem: smooth it, average it, or design an aggregation rule robust enough to survive it. In causal discovery, however, some disagreements contain precisely the information the global model lacks. Removing a local dependency can expose a previously hidden causal pattern. A policy difference that looks like a statistical inconvenience may function as an accidental experiment.

The paper “Regret-Based Federated Causal Discovery with Unknown Interventions” introduces I-PERI, a two-phase method designed around that possibility.[1] It seeks a shared causal structure without pooling client data, without requiring the server to know which variables were intervened upon, and without assuming that every client observes an identical causal system.

The central idea is attractive, but it needs one immediate qualification: heterogeneity is useful only when it has the right causal shape. Random noise, incompatible measurement practices, and unobserved confounding do not become valuable merely because several organizations possess them independently. I-PERI benefits from structural interventions that remove edges and reveal new orientation evidence.

That distinction is where the paper becomes more interesting than another federated benchmark comparison.

The Useful Disagreement Is an Exposed Collider

Observational causal discovery rarely identifies a complete directed causal graph. Instead, it commonly recovers a completed partially directed acyclic graph, or CPDAG.

A CPDAG represents a Markov equivalence class: a collection of causal graphs that imply the same observable conditional independencies. Some edges can be directed confidently. Others remain undirected because observational data cannot distinguish which direction is correct.

One especially informative structure is a v-structure:

$$ A \rightarrow C \leftarrow B $$

where $A$ and $B$ are not directly connected.

This pattern helps orient edges because the two arrows converging on $C$ leave a distinctive independence signature: in the simplest case, $A$ and $B$ are independent of each other but become dependent once we condition on $C$. But if another edge connects $A$ and $B$, the collider is shielded. Its orientation may no longer be identifiable from observational data alone.

A structural intervention can change that.

Unlike a parametric intervention, which changes a probability distribution or functional relationship without altering the graph, a structural intervention removes some or all incoming edges to an intervened variable. If that removal eliminates an edge shielding a collider, the client’s local graph may reveal a v-structure that was invisible in the original observational environment.

The paper’s mechanism can therefore be summarized as:

A local intervention may erase part of the causal graph while simultaneously revealing how another part must be oriented.

That creates an awkward aggregation problem. Missing edges should not automatically be interpreted as errors, because interventions may have removed them. Yet newly directed edges should not be ignored, because they may reveal causal information unavailable to the observational clients.

I-PERI handles these two consequences separately.
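The shielded-collider example can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper. It assumes a DAG stored as a set of `(parent, child)` pairs and a hard intervention that deletes all incoming edges of each target (the article allows some or all). The helper names `v_structures` and `intervene` are hypothetical.

```python
from itertools import combinations

def v_structures(edges):
    """Unshielded colliders (a, c, b) in a DAG given as (parent, child) pairs."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    adjacent = {frozenset(e) for e in edges}
    return {
        (a, c, b)
        for c, pa in parents.items()
        for a, b in combinations(sorted(pa), 2)
        if frozenset((a, b)) not in adjacent  # shielded colliders are skipped
    }

def intervene(edges, targets):
    """Hard structural intervention: delete every edge pointing into a target."""
    return {(u, v) for u, v in edges if v not in targets}

# Shielded collider: A -> C <- B, plus the shielding edge A -> B.
underlying = {("A", "B"), ("A", "C"), ("B", "C")}
print(v_structures(underlying))                    # set()
print(v_structures(intervene(underlying, {"B"})))  # {('A', 'C', 'B')}
```

The server never sees that B was the target. It only sees that one client's graph now contains the collider A → C ← B, which is the evidence phase two later spends.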

Phase 1 Reconstructs What Interventions May Have Erased

Earlier regret-based federated causal discovery methods, particularly PERI, assume that clients share the same underlying causal graph. The server proposes a candidate graph, each client calculates how poorly that graph fits its local data, and the server updates the global graph to minimize the worst client regret.

This works cleanly when every client is trying to describe the same object.

It becomes unstable when interventions remove edges locally. A candidate server graph may contain a genuine underlying edge that is absent from an intervened client. Directly comparing the two graphs would penalize the server for being correct.

I-PERI’s first phase modifies what the regret calculation is allowed to see.

Before a client scores the proposed server graph, the method applies a directed-consensus masking operation. Server edges absent from that client’s graph are removed from the comparison. This prevents a structurally intervened client from penalizing the server merely because the intervention erased an edge locally.

The asymmetry is deliberate:

| Observed disagreement | Phase-one interpretation |
|---|---|
| An edge exists locally but is missing from the server graph | The server may have omitted a genuine edge, so regret should increase |
| An edge exists in the server graph but is missing locally | The edge may have been removed by an intervention, so the client should not penalize it |
| A local graph provides an orientation compatible with the candidate graph | The masked comparison can preserve that local direction |

The result is a server-level CPDAG intended to represent the underlying graph from which all client graphs were derived.

This phase depends critically on the paper’s assumption that at least one client holds purely observational data. Without such a client, an edge removed everywhere could disappear without leaving evidence that it belonged to the underlying graph.
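A deliberately crude toy can illustrate both the masking asymmetry and the role of the observational client. It is not the paper's regret. Here a hypothetical `regret` counts skeleton disagreements rather than score differences. The server search is exhaustive over all skeletons. Ties are broken toward fewer edges, as an assumed stand-in for a sparsity preference.

```python
from itertools import combinations

def edge(u, v):
    return frozenset((u, v))

def regret(server, client, mask=True):
    """Toy skeleton regret: number of edges on which server and client disagree.
    With mask=True, server edges absent from the client are dropped first, so
    only client edges that the server omits are counted."""
    if mask:
        server = server & client
    return len(server ^ client)

def best_skeleton(nodes, clients, mask=True):
    """Exhaustive minimax search: minimize the worst client's regret, then
    prefer fewer edges (a hypothetical tie-break standing in for sparsity)."""
    pairs = [edge(u, v) for u, v in combinations(nodes, 2)]
    best_key, best = None, None
    for r in range(len(pairs) + 1):
        for subset in combinations(pairs, r):
            cand = set(subset)
            key = (max(regret(cand, c, mask) for c in clients), len(cand))
            if best_key is None or key < best_key:
                best_key, best = key, cand
    return best

underlying = {edge("A", "B"), edge("A", "C"), edge("B", "C")}
after_do_b = {edge("A", "C"), edge("B", "C")}  # intervention on B erased A-B

print(best_skeleton("ABC", [underlying, after_do_b]) == underlying)              # True
print(best_skeleton("ABC", [underlying, after_do_b], mask=False) == underlying)  # False
print(best_skeleton("ABC", [after_do_b, after_do_b]) == underlying)              # False
```

Without masking, the intervened client penalizes the correct triangle, and the search settles on the sparser skeleton. With masking but no observational client, no client ever reports the A–B edge, so it disappears.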

Under the conditions of the paper’s phase-one convergence result (known local CPDAGs, a consistent scoring function, and at least one observational client), the first phase converges to the CPDAG of the underlying causal DAG.

The phrase “known local CPDAGs” is doing considerable work. In practice, clients estimate those graphs from finite data. Once estimation enters the pipeline, the usual causal-discovery assumptions return: causal sufficiency, faithfulness, and reliable local structure learning.

The first phase therefore solves a specific federated aggregation problem. It does not solve unreliable causal discovery at the client level.

Phase 2 Spends Disagreement on Arrowheads

After the first phase, the server has reconstructed the ordinary observational CPDAG. It knows which edges belong to the shared underlying structure, but many directions may remain ambiguous.

The second phase asks whether client interventions exposed enough local v-structures to orient some of those edges.

This time, I-PERI uses an undirected-consensus masking operation. Edges absent from an intervened client remain excluded, preventing removed edges from distorting the comparison. But when the server leaves an edge undirected while a client graph directs it, that client still registers regret.

The server can reduce that regret by adopting the supported orientation.

The search is now narrower than in phase one. I-PERI does not rebuild the skeleton. It searches among partially directed graphs obtained by orienting the undirected edges of the recovered server CPDAG.

The sequence matters:

  1. Recover the shared underlying skeleton and observational orientations.
  2. Ignore local edge removals that may result from interventions.
  3. Use intervention-revealed v-structures to orient additional global edges.
  4. Stop when further orientations are not supported by the clients’ regret signals.

This produces the paper’s main theoretical object: the $\Phi$-CPDAG.
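Phase two can be sketched in the same toy style. Several choices are assumptions for illustration: the representation (each pair maps to `None` for undirected or to a `(tail, head)` tuple), the unit-cost regret, and the exhaustive three-way enumeration per undirected edge. The sketch also skips the acyclicity checks and orientation-propagation rules that a real CPDAG search would need. The tie-break toward fewer arrowheads mimics step 4: orient only what some client's regret supports.

```python
from itertools import product

def edge(u, v):
    return frozenset((u, v))

def phase2_regret(server, client):
    """Toy phase-2 regret. Pairs absent from the client are skipped; every
    client-directed edge that the server leaves undirected or reverses costs 1."""
    return sum(
        1
        for e, direction in client.items()
        if e in server and direction is not None and server[e] != direction
    )

def orient(server_cpdag, clients):
    """Try every way of orienting (or not) the server's undirected edges; keep the
    candidate with the lowest worst-case regret, then the fewest new arrowheads."""
    undirected = [e for e, d in server_cpdag.items() if d is None]
    best_key, best = None, None
    for choice in product((None, 0, 1), repeat=len(undirected)):
        cand = dict(server_cpdag)
        for e, c in zip(undirected, choice):
            if c is not None:
                u, v = sorted(e)
                cand[e] = (u, v) if c == 0 else (v, u)
        key = (max(phase2_regret(cand, cl) for cl in clients),
               sum(c is not None for c in choice))
        if best_key is None or key < best_key:
            best_key, best = key, cand
    return best

# Phase 1 recovered a fully undirected triangle.
server = {edge("A", "B"): None, edge("A", "C"): None, edge("B", "C"): None}
observational = dict(server)
# The client that intervened on B lost A-B but exposed A -> C <- B.
intervened = {edge("A", "C"): ("A", "C"), edge("B", "C"): ("B", "C")}

result = orient(server, [observational, intervened])
print(result[edge("A", "C")], result[edge("B", "C")], result[edge("A", "B")])
# ('A', 'C') ('B', 'C') None
```

The A–B edge stays undirected because no client directs it. Orienting it would add an arrowhead without lowering anyone's regret.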

The $\Phi$-CPDAG Is More Informative, but Not Omniscient

When intervention targets are known, interventional causal discovery can often identify a tighter equivalence class than observational discovery. The intervention labels explain why a distribution changed and which graph modifications should follow.

I-PERI works under a harder constraint: intervention targets are unknown to the server and may even be unknown to the clients themselves.

The paper defines a new $\Phi$-Markov equivalence class for this setting. Informally, two underlying DAGs belong to the same $\Phi$-equivalence class when they:

  • share the same observational skeleton;
  • share the same observational v-structures; and
  • can produce the same additional intervention-revealed v-structures across their respective unknown intervention families.

The intervention targets need not match between the two candidate DAGs. What matters is whether the federated system can observe the same independence and v-structure evidence without being told which interventions generated it.
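The first two conditions are the classical Verma–Pearl characterization of observational Markov equivalence. The check below covers only those two conditions. The third, intervention-dependent condition of $\Phi$-equivalence is not implemented, and the function names are hypothetical.

```python
from itertools import combinations

def skeleton(edges):
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Unshielded colliders (a, c, b) in a DAG given as (parent, child) pairs."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    adj = skeleton(edges)
    return {
        (a, c, b)
        for c, pa in parents.items()
        for a, b in combinations(sorted(pa), 2)
        if frozenset((a, b)) not in adj
    }

def observationally_equivalent(g1, g2):
    """Verma-Pearl criterion: same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain = {("A", "B"), ("B", "C")}
reversed_chain = {("C", "B"), ("B", "A")}
collider = {("A", "B"), ("C", "B")}
print(observationally_equivalent(chain, reversed_chain))  # True
print(observationally_equivalent(chain, collider))        # False
```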

The resulting hierarchy is useful:

| Available information | Recoverable representation | Relative informativeness |
|---|---|---|
| Observational data only | Standard CPDAG | Baseline |
| Federated clients with unknown structural interventions | $\Phi$-CPDAG | Can be tighter than the standard CPDAG |
| Interventional data with known targets | Interventional CPDAG | Potentially tighter than the $\Phi$-CPDAG |

This is not a route to full causal omniscience.

In the paper’s simple two-variable example, unknown interventions provide no additional orientation information. A v-structure needs at least three variables, so no intervention on two variables can expose one. Without a newly exposed v-structure, the second phase has nothing useful to spend, and the $\Phi$-CPDAG remains identical to the observational CPDAG.

The method’s gain therefore depends on the geometry of the interventions, not merely their existence.

The Theoretical Result Is Clean Because the Local Graphs Are Assumed Clean

The paper provides three main guarantees.

First, the modified phase-one regret converges to the underlying observational CPDAG when local CPDAGs are available, the scoring function is consistent, and at least one client is observational.

Second, the complete two-phase procedure converges to the unique $\Phi$-CPDAG under corresponding assumptions.

Third, the authors bound the sensitivity of the communicated regret values and show that adding appropriately scaled Laplace noise yields $\varepsilon$-differential privacy.

These are meaningful results. They describe what the system can identify and how regret sharing can be privatized.

They should not be confused with a guarantee that an operational deployment will recover the correct graph. The theoretical pipeline begins after each client possesses an accurate local CPDAG. Real clients must estimate those graphs, and errors introduced locally can propagate into the final server graph.

The paper states this directly: an incorrect local orientation may become an incorrect orientation in the global $\Phi$-CPDAG.

The system is federated, but error remains highly social. One client’s confident mistake can become everyone’s arrowhead.

The Main Experiments Demonstrate the Mechanism in a Favorable Synthetic Theatre

The empirical evaluation uses synthetic causal graphs generated from Erdős–Rényi models, with the expected number of edges equal to the number of nodes. The main experiments use linear structural equation models with additive Gaussian noise.
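That data-generating setup can be sketched with NumPy. To get an expected edge count equal to the number of nodes $p$, each of the $p(p-1)/2$ forward pairs under a random ordering receives an edge with probability $2/(p-1)$. The edge-weight range, random signs, and unit-variance noise are assumptions, not settings reported in the paper.

```python
import numpy as np

def random_er_dag(p, rng):
    """Weighted ER DAG with p expected edges: under a random node order, each of
    the p(p-1)/2 forward pairs gets an edge with probability 2/(p-1)."""
    assert p >= 3, "edge probability 2/(p-1) must not exceed 1"
    prob = 2.0 / (p - 1)
    order = rng.permutation(p)
    W = np.zeros((p, p))
    for a in range(p):
        for b in range(a + 1, p):
            if rng.random() < prob:
                # W[i, j] is the weight of edge i -> j (assumed range and signs).
                W[order[a], order[b]] = rng.uniform(0.5, 2.0) * rng.choice([-1, 1])
    return W

def sample_linear_sem(W, n, rng):
    """Draw n rows of X = X W + E with standard Gaussian noise E.
    Because W is acyclic, I - W is invertible and X = E (I - W)^{-1}."""
    E = rng.normal(size=(n, W.shape[0]))
    return E @ np.linalg.inv(np.eye(W.shape[0]) - W)

rng = np.random.default_rng(0)
W = random_er_dag(8, rng)
X = sample_linear_sem(W, 1000, rng)
print(X.shape, int((W != 0).sum()), "edges")
```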

The authors vary:

  • the number of variables: 3, 4, 8, 10, and 20;
  • the number of clients: 2, 4, 8, and 10;
  • per-client sample sizes: 500, 1,000, and 2,000; and
  • whether clients have homogeneous or heterogeneous sample sizes.

Performance is measured using Structural Hamming Distance, where lower is better, and F1 score against the true DAG, where higher is better.
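Both metrics are simple to compute once graphs are stored as edge sets. The sketch below uses one common convention for DAGs: a missing, extra, or reversed edge each costs one SHD unit, and F1 is taken over directed edges. The paper's exact handling of undirected CPDAG edges may differ, and the function names are hypothetical.

```python
def shd(true_edges, est_edges):
    """SHD between two DAGs (one common convention): every missing, extra,
    or reversed edge counts once."""
    true_pairs = {frozenset(e) for e in true_edges}
    est_pairs = {frozenset(e) for e in est_edges}
    missing_or_extra = len(true_pairs ^ est_pairs)
    reversed_edges = sum(1 for u, v in est_edges if (v, u) in true_edges)
    return missing_or_extra + reversed_edges

def f1(true_edges, est_edges):
    """F1 over directed edges: a reversed edge counts as both a false positive
    and a false negative."""
    tp = len(true_edges & est_edges)
    if tp == 0:
        return 0.0
    precision = tp / len(est_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)

truth = {("A", "B"), ("B", "C")}
estimate = {("A", "B"), ("C", "B"), ("A", "C")}  # one reversal, one extra edge
print(shd(truth, estimate), round(f1(truth, estimate), 2))  # 2 0.4
```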

I-PERI is compared with PERI, FedDAG, NOTEARS-ADMM, and FedCDH. The available baselines were selected partly according to code availability, which is practical but limits how broadly “state of the art” should be interpreted.

The most important experimental-design detail appears after the benchmark setup: the authors artificially introduce shielded colliders and prioritize interventions that create v-structures unavailable from observational data. They also choose random seeds that produce client-level CPDAG F1 scores above 0.85, because reliable local graphs are needed for downstream orientation.

This means the main experiment is well designed to test a particular question:

When clients possess accurate local graphs and interventions expose useful v-structures, can I-PERI exploit that information?

The answer is broadly yes.

It does not answer a more ambitious question:

In an arbitrary network of real organizations with undocumented heterogeneity, how often will useful intervention-revealed v-structures exist?

That remains unknown.

The reported gains are meaningful, but not universal

At $p=3$ variables, I-PERI reduces average SHD from PERI’s 3.16 to 1.53, while increasing F1 from 0.59 to 0.75.

At $p=8$, it reduces SHD from 8.40 to 4.44 and raises F1 from 0.64 to 0.74.

Those results support the intended mechanism: the second phase can recover useful orientations that the observational CPDAG leaves unresolved.

The larger settings are less tidy.

At $p=10$, FedDAG records a lower SHD than I-PERI (9.04 against 9.85) and a much higher F1 score (0.75 against 0.58).

At $p=20$, I-PERI has a lower SHD than PERI (27.8 against 30.0), but PERI has the higher F1 score (0.56 against 0.51).
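Putting the quoted figures side by side makes the mixed picture explicit. The numbers below are exactly those stated above. The relative-SHD and F1-difference calculations are a simple illustrative convention, and each setting is compared only against the baseline named in the text.

```python
# (SHD, F1) pairs quoted in the text: (comparator, comparator result, I-PERI result).
quoted = {
    3: ("PERI", (3.16, 0.59), (1.53, 0.75)),
    8: ("PERI", (8.40, 0.64), (4.44, 0.74)),
    10: ("FedDAG", (9.04, 0.75), (9.85, 0.58)),
    20: ("PERI", (30.0, 0.56), (27.8, 0.51)),
}

def compare(base, ours):
    """Relative SHD change (negative = I-PERI lower) and F1 difference
    (positive = I-PERI higher)."""
    (shd_b, f1_b), (shd_o, f1_o) = base, ours
    return (shd_o - shd_b) / shd_b, f1_o - f1_b

for p, (name, base, ours) in quoted.items():
    d_shd, d_f1 = compare(base, ours)
    print(f"p={p:>2} vs {name:<6}: SHD {d_shd:+.0%}, F1 {d_f1:+.2f}")
```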

The proper reading is not that I-PERI wins every metric in every setting. It is that the method usually improves structural recovery under an experimental design tailored to make its orientation-refinement mechanism useful, while retaining exceptions as graph size and estimation difficulty increase.

That is still a substantive result. It is simply less cinematic than “consistent dominance.”

More clients do not automatically rescue the graph

The main plots show I-PERI’s $\Phi$-CPDAG generally producing lower SHD and higher F1 than the phase-one CPDAG across changes in client count and per-client sample size.

The curves are comparatively stable. They do not suggest that adding clients or samples creates unlimited improvement. Additional clients help only when their local interventions reveal useful, accurately estimated structures. A larger federation can also contribute more noisy local graphs, more incompatible interventions, and more opportunities for orientation errors.

Federated scale is therefore not the same as causal information scale.

Runtime is a genuine operational advantage in the reported benchmark

I-PERI records the lowest average runtime in the paper’s computational comparison. The reported average is below PERI’s and orders of magnitude below those of some optimization-based baselines.

This is operationally relevant because federated causal discovery can otherwise become expensive through repeated communication, local optimization, and global graph search.

The runtime comparison remains benchmark-specific. It does not include deployment overhead such as schema alignment, privacy accounting, network latency, graph review, or repeated retraining across changing environments.

The algorithm may be fast. The organization around it will retain its traditional commitment to meetings.

The Appendix Tests Portability, Not a Second Thesis

The appendix extends the experiments in two directions: changing the client-level discovery method and introducing nonlinear synthetic data.

These tests are best interpreted as robustness and sensitivity checks.

| Appendix test | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Replace PC with GES for local graph discovery | Sensitivity to the local causal-discovery algorithm | I-PERI’s mechanism is not tied exclusively to PC | Any local discovery algorithm will work equally well |
| Compare PC and GES directly | Implementation-choice robustness | The phase-two benefit can survive different local estimators | Local estimator quality is unimportant |
| Use nonlinear synthetic functions | Exploratory extension beyond linear structural equations | I-PERI can operate when clients use a suitable nonlinear-capable discovery process | General performance on arbitrary nonlinear real-world systems |
| Vary nonlinear graph size | Scaling sensitivity | Some structural gains persist as graph size changes | Uniform dominance across metrics or sizes |

For the nonlinear experiments, the authors generate data using randomly selected hyperbolic tangent, sine, and quadratic transformations, then discretize the continuous values into five bins. Clients use the PC algorithm with a chi-squared conditional-independence test.
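A rough sketch of that pipeline follows. It assumes NumPy, an additive form with standard Gaussian noise, and equal-frequency bins, none of which the article specifies. It stops at the Pearson chi-squared statistic for a marginal pair. PC needs conditional tests, and a p-value would come from the chi-squared distribution with $(5-1)^2 = 16$ degrees of freedom, for example via SciPy's `scipy.stats.chi2_contingency`.

```python
import numpy as np

TRANSFORMS = (np.tanh, np.sin, np.square)

def simulate(order, parents, n, rng):
    """Nonlinear SEM sketch: each node is Gaussian noise plus a randomly chosen
    transform (tanh, sin, or square) of each parent. The noise and additive form
    are assumptions, not details taken from the paper."""
    data = {}
    for node in order:  # order must list parents before children
        value = rng.normal(size=n)
        for pa in parents.get(node, ()):
            f = TRANSFORMS[rng.integers(len(TRANSFORMS))]
            value = value + f(data[pa])
        data[node] = value
    return data

def discretize(x, bins=5):
    """Equal-frequency binning into 0..bins-1 (the binning rule is assumed)."""
    cuts = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(x, cuts)

def chi2_statistic(x, y, bins=5):
    """Pearson chi-squared statistic for marginal independence of two binned columns."""
    table = np.zeros((bins, bins))
    np.add.at(table, (x, y), 1)
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

rng = np.random.default_rng(0)
raw = simulate(["A", "B", "C"], {"C": ["A", "B"]}, n=2000, rng=rng)
binned = {k: discretize(v) for k, v in raw.items()}
print(chi2_statistic(binned["A"], binned["B"]), chi2_statistic(binned["A"], binned["C"]))
```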

The nonlinear results are mixed.

I-PERI records lower SHD than PERI for $p=3$, 4, 10, and 20, but slightly higher SHD at $p=8$. Its F1 score is higher at $p=3$ and $p=10$, but lower at $p=4$, 8, and 20.

The appendix therefore supports a narrower conclusion than the main linear benchmark: the intervention-refinement mechanism can remain useful outside the linear setup, but its benefit depends heavily on the accuracy and behavior of the local discovery procedure.

That dependence is not a minor implementation detail. It is the bridge between the theorem and any future deployment.

Differential Privacy Has a Theorem, Not a Magic Cloak

I-PERI avoids sharing raw data and local causal graphs. Clients communicate regret values, and the server coordinates aggregation.

The paper then derives a sensitivity bound for the regret function. If each communicated regret has sensitivity at most $Q$, adding independent Laplace noise with scale

$$ \lambda = \frac{Q}{\varepsilon} $$

provides $\varepsilon$-differential privacy.

Smaller $\varepsilon$ means stronger privacy and more noise. More noise may degrade the server’s ability to compare candidate graphs and select reliable orientations.

That trade-off is standard, but the paper does not empirically measure it. There is no reported sweep showing how graph accuracy changes across privacy budgets.
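The noise side of the trade-off is easy to see directly. This sketch, assuming NumPy, applies the Laplace mechanism with scale $Q/\varepsilon$ to placeholder regrets. The value $Q = 1$ is an arbitrary stand-in for the paper's bound. The sketch shows only how the noise grows as $\varepsilon$ shrinks. It says nothing about graph accuracy, and it does no accounting across repeated releases, where privacy loss accumulates under composition.

```python
import numpy as np

def privatize(regrets, sensitivity, epsilon, rng):
    """Laplace mechanism: add noise with scale Q / epsilon to each regret value."""
    regrets = np.asarray(regrets, dtype=float)
    return regrets + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=regrets.shape)

rng = np.random.default_rng(1)
for eps in (0.1, 1.0, 10.0):
    noise = privatize(np.zeros(100_000), sensitivity=1.0, epsilon=eps, rng=rng)
    # Laplace(0, b) has standard deviation sqrt(2) * b.
    print(f"epsilon={eps:>4}: empirical sd {noise.std():.3f}, theory {np.sqrt(2) / eps:.3f}")
```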

The paper also makes two points that deserve more attention than privacy slogans usually permit.

First, sharing the global causal graph may itself reveal sensitive information. Encryption may be needed to protect it during transmission or storage.

Second, client graphs could in principle be reconstructed from shared regrets and the global graph. The paper notes that this reconstruction problem is NP-hard. That is a worst-case statement: it makes reconstruction computationally difficult in general, not impossible, and complexity theory does not replace threat modeling.

The authors appropriately note that their bottom-up privacy approach may not offer the same protection as cryptographic methods in adversarial settings.

The “price of privacy” therefore remains partly unpriced. The theorem establishes how noise should be calibrated. The experiments do not show what accuracy the federation must surrender after paying the bill.

Business Value Begins With Classifying the Intervention

The paper directly demonstrates a causal-discovery mechanism under controlled synthetic conditions. Translating it into business practice requires a more careful sequence than “federate the data and find the causes.”

The first task is to determine what kind of heterogeneity exists.

A hospital changing a treatment threshold may create a parametric intervention. A factory disabling an automated approval dependency may create something closer to a structural intervention. A franchise recording customer complaints differently creates a measurement problem, not a useful experiment.

Only structural interventions that expose informative local v-structures can provide I-PERI’s additional orientation benefit.

| Operational difference across clients | Possible causal interpretation | Expected benefit from phase two |
|---|---|---|
| Different pricing coefficient or decision threshold | Parametric intervention | Little or no additional orientation |
| Removal of an approval, dependency, or process input | Potential structural intervention | May expose new v-structures |
| Different variable definitions or logging standards | Measurement incompatibility | Likely harmful |
| Unrecorded common causes varying across sites | Latent confounding | Violates the intended setup |
| Selection rules that alter which cases enter the dataset | Selection bias | May invalidate local graphs |

This classification step is less glamorous than graph learning. It is also where many deployments would succeed or fail.

A plausible deployment pathway

For a hospital network, distributed manufacturer, franchise system, or multi-entity services group, the practical pathway could look like this:

  1. Align the variables. Clients must measure the same set of variables with compatible meanings. Federating incompatible schemas merely distributes the confusion.

  2. Document operational changes. Even though I-PERI does not require intervention targets to be sent to the server, organizations still need enough local process knowledge to judge whether differences are plausibly structural, parametric, or measurement-related.

  3. Maintain an observational reference client. The theoretical recovery of the underlying CPDAG requires at least one client without structural interventions. In a continuously changing organization, identifying such a reference environment may be difficult.

  4. Validate local causal graphs before aggregation. Stability across resamples, agreement with domain knowledge, and sensitivity analyses become essential because local errors can become global orientations.

  5. Exchange privatized regret signals. The privacy budget, communication protocol, and protection of the resulting global graph require explicit governance.

  6. Use the $\Phi$-CPDAG as a decision-support object. It narrows the plausible causal structures. It does not automatically provide a fully identified DAG or authorize interventions without further validation.
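Step 4's stability check can be sketched as a bootstrap over local data. The `correlation_skeleton` function is a hypothetical stand-in that ignores conditional independence entirely. In practice, one would pass the client's actual discovery routine (PC, GES, or similar) as `discover`. The simulated data and threshold are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def correlation_skeleton(X, threshold=0.1):
    """Hypothetical stand-in for a real local discovery algorithm such as PC:
    link every pair whose absolute sample correlation exceeds the threshold."""
    corr = np.corrcoef(X, rowvar=False)
    return {(i, j) for i, j in combinations(range(X.shape[1]), 2) if abs(corr[i, j]) > threshold}

def edge_stability(X, discover, n_boot=200, rng=None):
    """Fraction of bootstrap resamples in which each edge is recovered."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    counts = {}
    for _ in range(n_boot):
        resample = X[rng.integers(0, n, size=n)]  # rows drawn with replacement
        for e in discover(resample):
            counts[e] = counts.get(e, 0) + 1
    return {e: c / n_boot for e, c in counts.items()}

rng = np.random.default_rng(0)
a = rng.normal(size=2000)
X = np.column_stack([a, 0.8 * a + rng.normal(size=2000), rng.normal(size=2000)])
stability = edge_stability(X, correlation_skeleton, rng=rng)
print({e: round(s, 2) for e, s in sorted(stability.items())})
```

Edges that appear in only a fraction of resamples are natural candidates for review before a client's graph enters aggregation.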

The business value, if the method works, is not simply “better causal AI.” It is a more informative shared causal map produced without centralizing sensitive client data or disclosing local intervention targets.

That map could narrow root-cause investigations, identify which relationships deserve controlled experiments, and reduce the number of plausible explanations after an operational failure.

The paper does not measure those returns. They are a reasonable business inference, not an empirical result.

The Deployment Boundary Is Narrower Than the Federated Label Suggests

I-PERI improves an important but carefully defined setting. Several conditions materially limit how broadly its results can be applied.

The clients must use the same variables. At least one client must provide observational data. Useful structural interventions must occur. Local CPDAGs must be accurate enough for their orientations to be trusted. The theoretical analysis relies on causal sufficiency, faithfulness, and the absence of selection bias.

Real organizational data routinely violates several of these conditions at once.

A hospital network may contain latent socioeconomic confounders. A manufacturing federation may measure the same nominal variable differently across plants. A franchise system may change both its processes and its reporting conventions simultaneously. The resulting heterogeneity may be causally interesting, statistically destructive, or both.

The evidence is also entirely synthetic. The experiments deliberately create intervention patterns favorable to the second phase and filter for reliable local graph recovery. That is appropriate for demonstrating the proposed mechanism. It is not evidence that undocumented real-world policies will routinely produce the same gains.

Finally, differential privacy protects individual records under a formal neighboring-dataset definition. It does not by itself protect every form of institutional secrecy, defend against malicious participants, or prevent the global graph from revealing commercially sensitive structure.

These are not decorative cautions. They determine whether the method’s central insight can survive contact with an organization.

Heterogeneity Is Information Only When It Has Causal Geometry

I-PERI’s strongest contribution is not that it adds another federated causal-discovery algorithm to the inventory.

It formalizes what can be learned when clients differ because of unknown interventions, cannot pool their data, and do not disclose the intervention targets. Its first phase reconstructs the shared observational structure without punishing clients for edges their interventions removed. Its second phase uses intervention-exposed v-structures to orient edges that observational data leaves ambiguous.

The resulting $\Phi$-CPDAG occupies a useful middle ground: more informative than a standard observational CPDAG when the interventions reveal the right structures, but at most as informative as a graph learned with known intervention targets.

The experiments show that the mechanism works in a deliberately supportive synthetic environment, often improving structural recovery and doing so with low reported computational cost. The appendix shows that the benefit can survive changes in local discovery methods and nonlinear data, although not uniformly across every metric and graph size.

For businesses, the lesson is not to celebrate heterogeneity indiscriminately. It is to distinguish operational variation that behaves like a useful structural experiment from variation that merely corrupts comparability.

Federated learning often asks how to make distributed clients agree.

I-PERI asks the more productive question: what did their disagreement reveal before we averaged it away?

Cognaptus: Automate the Present, Incubate the Future.


  1. Federico Baldo and Charles K. Assaad. 2026. “Regret-Based Federated Causal Discovery with Unknown Interventions.” arXiv:2512.23626. https://arxiv.org/abs/2512.23626