Price Shock Therapy: Causal ML Reveals True Impact of Electricity Market Liberalization

TL;DR for operators

Electricity deregulation is usually sold as a simple story: introduce competition, lower prices, everyone applauds politely, preferably near a ribbon-cutting ceremony. The paper behind this article is more useful because it refuses that simplicity. It asks a sharper operational question: when independent electricity producers actually entered selected US state markets, did residential electricity prices fall relative to a credible counterfactual?¹

The authors estimate that early liberalization reduced residential electricity prices by about 0.795 cents per kilowatt-hour, or roughly 7% compared with the average price in the year before intervention, over the first two years after independent producers entered the market. That is the headline. It is not the whole story.

The more important lesson is methodological. The paper treats liberalization as a causal forecasting problem: build a synthetic version of the treated states’ future without liberalization, then compare that synthetic path with what happened. It tests four model families: DeepProbCP, TSMixer, Augmented Synthetic Control, and Causal ARIMA. DeepProbCP performs best for the paper’s real-world setting: many state-level series, short pre-intervention histories, and a short post-intervention window.

For business readers, the takeaway is not “deregulate everything and prices magically behave.” That is PowerPoint economics, and PowerPoint economics is where nuance goes to die. The takeaway is that causal ML can help regulators, utilities, investors, and market operators estimate the impact of policy and market-structure changes when controlled experiments are impossible.

The boundary is equally important. This is a short-term estimate for residential prices in a specific US liberalization window. It does not prove that liberalization always lowers prices, that the effect persists, or that the same outcome would hold under today’s renewables, grid-congestion, storage, or retail-choice conditions. It also depends on the counterfactual model being credible. In causal ML, the model is not decoration. It is the bridge over the canyon.

The real comparison is not deregulation versus monopoly

The obvious reading of the paper is “competition lowered electricity prices.” That is partly right, but too shallow.

The paper’s real contribution sits between two comparisons:

Comparison	What is being tested	Why it matters
Traditional policy evaluation vs causal ML	Whether counterfactual forecasting can improve on Difference-in-Differences-style analysis	Electricity prices are seasonal, regional, nonlinear, and not especially interested in obeying tidy assumptions
DeepProbCP vs other counterfactual models	Which model gives the most credible short-term untreated price path	The policy estimate is only as good as the artificial history used to judge it

Previous US electricity-liberalization studies often relied on Difference-in-Differences. DiD is useful, familiar, and sometimes perfectly adequate. It is also a bit like using a ruler to measure smoke when the treatment and control groups do not move in parallel. The paper argues that electricity prices may not satisfy linearity and parallel-trend assumptions cleanly enough for that to be the only tool in the room.
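Here is a minimal 2x2 Difference-in-Differences calculation that shows where the parallel-trends assumption enters. The prices are invented for illustration and do not come from the paper. The estimate is the treated group's change minus the control group's change. That difference is a causal effect only if the treated group would otherwise have moved exactly like the controls.

```python
import numpy as np

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 Difference-in-Differences on group means.

    Returns (change in treated mean) - (change in control mean). This is a
    causal effect only under parallel trends: absent treatment, the treated
    mean would have changed by the same amount as the control mean.
    """
    treated_change = np.mean(treated_post) - np.mean(treated_pre)
    control_change = np.mean(control_post) - np.mean(control_pre)
    return treated_change - control_change

# Hypothetical residential prices in cents/kWh (illustrative only).
treated_pre = [10.0, 10.4, 10.2]
treated_post = [9.6, 9.4]
control_pre = [8.0, 8.2, 8.1]
control_post = [8.3, 8.1]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(round(effect, 3))  # -0.8
```

If the treated states were already on a steeper downward path before entry, this single number would absorb that pre-trend. The paper's counterfactual approach is designed to avoid this failure mode.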

Synthetic-control-style methods shift the question. Instead of asking whether treated and control groups followed a parallel trend, they ask whether untreated units and covariates can be used to construct a plausible counterfactual for the treated units. In plain English: what would California, New York, Pennsylvania, and similar liberalized states have looked like if independent producers had not entered?
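The synthetic-control idea can be framed as constrained least squares. Choose non-negative donor weights that sum to one, so that the weighted control states reproduce the treated unit's pre-intervention prices. Then carry those weights into the post-period. The sketch below is a simplified, outcome-only version of the classic estimator. It is not ASCM's ridge-augmented variant and not DeepProbCP. The donor data are invented, and SciPy's SLSQP solver is just one convenient choice.

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(treated_pre, controls_pre):
    """Find convex weights w (w >= 0, sum(w) == 1) so that controls_pre @ w
    tracks the treated unit's pre-intervention path as closely as possible.

    treated_pre:  shape (T_pre,)
    controls_pre: shape (T_pre, J), one column per untreated (donor) unit
    """
    n_controls = controls_pre.shape[1]

    def loss(w):
        residual = treated_pre - controls_pre @ w
        return float(residual @ residual)

    result = minimize(
        loss,
        x0=np.full(n_controls, 1.0 / n_controls),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_controls,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x

# Hypothetical donor pool: three untreated states over eight pre-period years.
years = np.arange(8)
controls_pre = np.column_stack([
    8.0 + 0.10 * years,          # slowly rising prices
    9.0 - 0.05 * years,          # slowly falling prices
    7.5 + 0.3 * np.sin(years),   # oscillating prices
])
# A treated unit built to be exactly 60% donor 0 and 40% donor 1.
treated_pre = controls_pre @ np.array([0.6, 0.4, 0.0])

w = synthetic_control_weights(treated_pre, controls_pre)
print(np.round(w, 3))

# Post-period counterfactual: the same weights applied to donors' post-period prices.
controls_post = np.array([[8.8, 8.6, 7.8], [8.9, 8.55, 7.6]])
counterfactual_post = controls_post @ w
```

The treatment effect would then be the treated unit's observed post-period prices minus `counterfactual_post`. The simplex constraint keeps the synthetic unit an interpolation of real states rather than an extrapolation.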

That is a more useful business question because operators rarely get clean randomized trials. No regulator randomly assigns half a grid to liberalization for scientific neatness. Energy markets change because of legislation, infrastructure, producer entry, fuel prices, technology adoption, and institutional pressure. The evidence is observational. The counterfactual has to be built.

The intervention date is where the paper earns its keep

One of the paper’s better choices is not algorithmic. It is historical.

The authors argue that prior research often treated the policy approval year as the intervention date. That sounds reasonable until one remembers that legislation is not a power plant. Passing a deregulation policy does not instantly create retail competition, private generation capacity, customer switching, or a functioning market structure.

So the paper uses a more operational trigger: the first visible spike in the share of electricity generated by individual producers. In other words, liberalization is counted when the market structure actually starts changing, not when the legal permission slip is framed and placed in an office corridor.
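The entry-based trigger can be expressed as a simple rule: flag the first year in which the independent-producer share of generation jumps sharply. The article does not spell out the paper's exact detection rule. The `min_jump` threshold and the share series below are hypothetical, chosen to show how the policy year and the entry year can diverge.

```python
def first_entry_year(years, independent_share, min_jump=0.05):
    """Return the first year in which the share of generation from independent
    producers rises by at least `min_jump` (absolute share points) over the
    previous year, or None if no such jump occurs.

    `min_jump` is a hypothetical threshold for illustration only.
    """
    for i in range(1, len(years)):
        if independent_share[i] - independent_share[i - 1] >= min_jump:
            return years[i]
    return None

years = list(range(1990, 2001))
# Hypothetical state: legislation passed in 1996, but producer entry only jumps in 1998.
share = [0.02, 0.02, 0.03, 0.03, 0.03, 0.04, 0.04, 0.05, 0.21, 0.38, 0.45]
policy_year = 1996
entry_year = first_entry_year(years, share)
print(policy_year, entry_year)  # 1996 1998
```

Starting the causal clock at `entry_year` rather than `policy_year` keeps two years of "nothing has happened yet" out of the post-intervention window.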

The paper identifies 17 states that passed deregulation policies between 1996 and 2000, then focuses on states where independent producer penetration jumped around 1998–1999. It sets 1990–1997 as the pre-intervention period and 1998–1999 as the short post-intervention window. The analysis is therefore deliberately short-term.

There is a small wrinkle. The paper's text says “8 states” but then lists California, Connecticut, Illinois, Maine, Maryland, New Jersey, New York, Pennsylvania, and Rhode Island, which is nine states, unless arithmetic has also been deregulated. This does not automatically invalidate the study, but it does matter for interpretation. A small treated group means details like inclusion criteria, state grouping, and timing deserve attention.

The intervention-timing decision is still valuable. For business analysis, this is often the difference between measuring a board decision and measuring operational reality. A tariff reform, grid investment, pricing policy, or new market rule may be announced in one year and bite much later. If the causal clock starts too early, the estimate becomes fog.

Four models enter; only one really fits the job

The paper compares four approaches. The comparison is not a beauty contest for model architecture. It is a test of which tool is appropriate for a short, state-level causal intervention problem.

Model	Role in the paper	Likely purpose of test	What the result suggests
DeepProbCP	Global LSTM-based causal forecasting model	Main model-selection candidate	Best fit for short time series and multiple treated units
TSMixer	MLP-based time-series forecasting model adapted for causal counterfactuals	Comparison with a strong modern forecasting architecture	Better with longer series, weaker in short intervention settings
Augmented Synthetic Control	Local synthetic-control method with ridge-style bias correction	Classical synthetic-control comparison	Weakest overall performance in this setup
Causal ARIMA	Econometric benchmark	Prior-work and baseline comparison	Better than ASCM when a local model is preferred, but not best overall

DeepProbCP wins because the empirical setting is awkward in exactly the way many real business settings are awkward: short histories, multiple units, seasonality, and a policy shock. The model learns globally across units, uses pre-intervention data to forecast post-intervention counterfactuals, and is designed for short-term intervention analysis.
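"Global" learning means one set of model parameters is fitted on windows pooled from every unit, rather than one model per state. The sketch below shows only the pooling idea. DeepProbCP uses an LSTM for this role, and the linear autoregression here is a deliberately simple stand-in. The six-point histories are hypothetical, chosen so that no single series is long enough to fit much on its own.

```python
import numpy as np

def make_windows(series, lags):
    """Turn one series into (lag-window, next-value) training pairs."""
    X = [series[t - lags:t] for t in range(lags, len(series))]
    y = [series[t] for t in range(lags, len(series))]
    return np.array(X), np.array(y)

def fit_global_ar(all_series, lags=1):
    """Fit ONE linear autoregression on windows pooled from every series.

    Coefficients are shared across units, so short series borrow strength
    from each other (a linear stand-in for a global neural model).
    """
    X_parts, y_parts = zip(*(make_windows(s, lags) for s in all_series))
    X = np.vstack(X_parts)
    y = np.concatenate(y_parts)
    X = np.column_stack([X, np.ones(len(X))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast(series, coef, horizon):
    """Roll the shared model forward `horizon` steps from the series' tail."""
    lags = len(coef) - 1
    history = list(series)
    for _ in range(horizon):
        window = np.array(history[-lags:])
        history.append(float(window @ coef[:-1] + coef[-1]))
    return history[len(series):]

def simulate(start, n=6):
    """Hypothetical pre-intervention history following y_t = 0.5 * y_{t-1} + 1."""
    values = [start]
    for _ in range(n - 1):
        values.append(0.5 * values[-1] + 1.0)
    return values

pre_periods = [simulate(s) for s in (0.0, 4.0, 10.0)]
coef = fit_global_ar(pre_periods, lags=1)
print(np.round(coef, 3))  # shared slope and intercept, approximately [0.5, 1.0]
counterfactual = forecast(pre_periods[1], coef, horizon=2)
```

Each series contributes only five training pairs, but the pooled fit sees fifteen. This is the practical advantage of global models when pre-intervention histories are short.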

TSMixer is more interesting than a simple loser. The authors note that it performs poorly on short series but improves with longer histories. This matters because “state-of-the-art forecasting model” does not automatically mean “best causal model.” Forecasting and causal counterfactual estimation are related, but they are not the same job. A model trained to forecast long-range temporal patterns may stumble when asked to reconstruct a short untreated future under intervention.

ASCM, despite its debiasing logic, performs poorly in this paper’s experiments. Causal ARIMA performs better than ASCM when a local model is needed. That is a useful reminder: old econometric baselines are not always glamorous, but they can still be stubbornly competent. Glamour rarely pays the electricity bill.

The synthetic experiments are model selection, not the policy result

The synthetic experiments are easy to skim past. That would be a mistake.

In the real-world electricity setting, the true counterfactual is unknowable. We do not observe what treated states’ prices would have been without liberalization. So the authors first construct synthetic datasets where the treatment effect is known by design. This lets them test whether each model can recover a counterfactual when the answer is visible to the researcher.

The synthetic data mimics energy-like seasonality using combinations of daily, weekly, and monthly patterns, with variants for stationary and trending data. The intervention effect is deliberately heterogeneous: higher quantiles receive larger post-intervention shocks. That is not a random detail. It tests whether the models can handle uneven treatment intensity rather than assuming the same effect everywhere.
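The next sketch shows a panel generator in the same spirit. It produces seasonal series, an optional trend, and a post-intervention shock that grows with each series' quantile. The paper's exact generator is not reproduced here. The periods, amplitudes, shock size and sign, and the quartile-bucketing rule are all assumptions. The sketch assumes daily observations, so it includes weekly (7) and monthly (30) cycles but cannot represent a daily cycle.

```python
import numpy as np

def make_synthetic_panel(n_series=50, length=90, n_post=12, trending=False,
                         base_shock=0.5, seed=0):
    """Hypothetical generator for energy-like panels with a known effect.

    Returns (observed, untreated, true_effect), each of shape (n_series, length).
    The shock is applied to the last `n_post` points and scales with the series'
    quartile of pre-period mean, so higher quantiles receive larger effects.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(length)
    level = rng.uniform(5.0, 15.0, size=(n_series, 1))
    phase = rng.uniform(0.0, 2 * np.pi, size=(n_series, 1))
    weekly = rng.uniform(0.2, 1.0, size=(n_series, 1)) * np.sin(2 * np.pi * t / 7 + phase)
    monthly = rng.uniform(0.2, 1.0, size=(n_series, 1)) * np.sin(2 * np.pi * t / 30 + phase)
    drift = rng.uniform(0.01, 0.05, size=(n_series, 1)) * t if trending else 0.0
    noise = rng.normal(0.0, 0.1, size=(n_series, length))
    untreated = level + weekly + monthly + drift + noise

    # Quartile bucket of each series' pre-intervention mean: 0 (lowest) .. 3 (highest).
    pre_mean = untreated[:, :-n_post].mean(axis=1)
    cutoffs = np.quantile(pre_mean, [0.25, 0.5, 0.75])
    bucket = np.searchsorted(cutoffs, pre_mean)

    # Known, heterogeneous treatment effect: zero before, bucket-scaled after.
    true_effect = np.zeros_like(untreated)
    true_effect[:, -n_post:] = (base_shock * (bucket + 1))[:, None]
    return untreated + true_effect, untreated, true_effect

# Two cells of the experimental grid: short/stationary and long/trending.
observed, untreated, effect = make_synthetic_panel()
observed_long, _, _ = make_synthetic_panel(n_series=300, length=420, trending=True)
```

Because `true_effect` is returned alongside `observed`, any candidate estimator can be scored on how closely its implied effect matches the truth. This is the property that the real electricity data cannot offer.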

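The experimental grid varies the number of series, the series length, and the trend behaviour, while keeping the quantile-dependent treatment pattern in every setting: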

Dimension	Values tested	Why it matters
Number of series	50 and 300	State-level and firm-level policy datasets are often cross-sectional but not massive
Series length	90 and 420 observations	Many interventions have short usable histories before the regime changes again
Data behavior	Stationary and trending	Energy prices often combine seasonal and structural movement
Treatment pattern	Quantile-dependent effect	Real policy effects are rarely uniform across units

The results are clear enough for the paper’s purpose. DeepProbCP produces the lowest forecasting errors in all stationary cases and most trending cases. In the short-series settings most similar to the real electricity data, it is also closest to the true synthetic treatment effect. TSMixer improves as the series length grows. ASCM is generally weakest. Causal ARIMA is respectable, particularly relative to ASCM.

This is not an ablation in the narrow neural-network sense. The paper is not removing one component from DeepProbCP to prove that a layer or optimizer matters. It is a comparison and sensitivity exercise: which counterfactual estimator survives the kind of data environment the real policy question creates?

The real-world test asks whether control states can be predicted

Once the paper moves to real electricity prices, the true counterfactual disappears. The authors therefore evaluate model credibility indirectly.

For untreated control states, there is no liberalization intervention in the study window. Under the null-intervention assumption, a good counterfactual model should predict their post-intervention outcomes reasonably well. The paper evaluates this using sMAPE and MASE on control units only, then applies a Wilcoxon test as a placebo-style check comparing treated and control error distributions.
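The two error metrics and the placebo-style comparison can be sketched in a few lines. The sketch uses common textbook definitions: sMAPE as a fraction (consistent with the table's values below 1), MASE scaled by an in-sample naive forecast, and the Wilcoxon rank-sum variant via `scipy.stats.ranksums`. The article does not specify the paper's exact variants. The per-state error values are hypothetical.

```python
import numpy as np
from scipy.stats import ranksums

def smape(actual, forecast):
    """Symmetric MAPE as a fraction: mean of 2|y - f| / (|y| + |f|)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(2 * np.abs(actual - forecast) / (np.abs(actual) + np.abs(forecast))))

def mase(actual, forecast, insample, season=1):
    """Mean absolute scaled error: forecast MAE divided by the in-sample MAE of a
    (seasonal) naive forecast. Values above 1 mean the forecast errors exceed
    the naive forecast's in-sample errors."""
    actual, forecast, insample = (np.asarray(a, float) for a in (actual, forecast, insample))
    naive_mae = np.mean(np.abs(insample[season:] - insample[:-season]))
    return float(np.mean(np.abs(actual - forecast)) / naive_mae)

print(round(smape([10.0, 10.0], [9.0, 11.0]), 4))   # 0.1003
print(mase([5.0, 6.0], [5.5, 5.5], [1, 2, 3, 4]))   # 0.5

# Placebo-style check on hypothetical per-state post-period errors:
# if the intervention did something, treated states should show systematically
# larger gaps between observed and counterfactual than control states.
control_errors = [0.10, 0.15, 0.08, 0.12, 0.11, 0.09, 0.14, 0.13]
treated_errors = [0.62, 0.85, 0.71, 0.90, 0.78, 0.66, 0.80, 0.74, 0.69]
stat, p_value = ranksums(treated_errors, control_errors)
print(round(stat, 2), p_value)
```

A small p-value here says the treated and control error distributions differ. On its own it does not say the counterfactual model is good, which is why the control-unit sMAPE and MASE matter as a separate check.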

The real-world model comparison looks like this:

Model	Placebo-test p-value	sMAPE on controls	MASE on controls	Interpretation
TSMixer	0.003	0.052	2.191	Passes placebo check, but weaker forecasting fit
DeepProbCP	0.001	0.035	1.341	Best control-unit forecasting performance
ASCM	0.000	0.061	2.719	Passes placebo check, weakest forecasting fit
Causal ARIMA	0.001	0.041	1.852	Solid local/econometric baseline

All four models estimate a negative treatment effect. That convergence matters. It reduces the chance that the paper’s conclusion is merely an artifact of one model family.

But DeepProbCP is selected because it has the lowest control-unit forecasting errors. The logic is practical: if a model cannot predict untreated states well, why trust it to predict the unobserved untreated path of liberalized states? Causal inference is not just about sophisticated language. It is about earning the right to compare reality with a model-generated alternative.
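The selection logic can be written down explicitly. This sketch ranks each model on the control-unit metrics reported in the real-world comparison table and picks the lowest combined rank. The rank-sum rule is an assumption; the paper simply reports that DeepProbCP has the lowest errors. With these numbers, any sensible rule gives the same answer.

```python
# Control-unit errors as reported in the real-world comparison table.
control_metrics = {
    "TSMixer": {"smape": 0.052, "mase": 2.191},
    "DeepProbCP": {"smape": 0.035, "mase": 1.341},
    "ASCM": {"smape": 0.061, "mase": 2.719},
    "Causal ARIMA": {"smape": 0.041, "mase": 1.852},
}

def select_model(metrics):
    """Rank models on each control-unit error metric (lower is better) and
    return the model with the lowest total rank, plus the full ranking."""
    names = list(metrics)
    metric_keys = list(next(iter(metrics.values())))
    total_rank = {name: 0 for name in names}
    for key in metric_keys:
        for rank, name in enumerate(sorted(names, key=lambda n: metrics[n][key])):
            total_rank[name] += rank
    ranking = sorted(names, key=lambda n: total_rank[n])
    return ranking[0], ranking

best, ranking = select_model(control_metrics)
print(best)     # DeepProbCP
print(ranking)  # ['DeepProbCP', 'Causal ARIMA', 'TSMixer', 'ASCM']
```

Note what the rule ignores: the size of each model's ATT estimate. Selection is based on predictive credibility on untreated units, not on which answer is most convenient.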

The price effect is meaningful, not miraculous

The headline estimate is a 0.795 cents/kWh reduction in residential electricity prices, equivalent to roughly 7% relative to the prior-year average. The paper interprets this as evidence that liberalization and independent producer entry improved short-term price competitiveness for residential customers.
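The average treatment effect on the treated (ATT) is the mean gap between observed and counterfactual prices across treated units and post-intervention periods. The sketch below computes it on a hypothetical 3-state, 2-year panel. It then works backwards from the headline figure: if -0.795 cents/kWh corresponds to "roughly 7%", the implied prior-year baseline is about 11.4 cents/kWh. That baseline is approximate because the 7% is rounded.

```python
import numpy as np

def att(observed_post, counterfactual_post):
    """Average treatment effect on the treated: mean of (observed - counterfactual)
    over all treated units and post-intervention periods.
    Inputs have shape (n_treated_units, n_post_periods)."""
    return float(np.mean(np.asarray(observed_post) - np.asarray(counterfactual_post)))

# Hypothetical: 3 treated states x 2 post years, cents/kWh.
observed = np.array([[10.1, 9.9], [11.0, 10.8], [9.4, 9.3]])
counterfactual = np.array([[10.9, 10.8], [11.7, 11.6], [10.2, 10.0]])
print(round(att(observed, counterfactual), 3))  # -0.783

# Relative effect = ATT / prior-year average price. Inverting the paper's
# headline numbers (-0.795 cents/kWh, "roughly 7%") gives the implied baseline.
implied_baseline = 0.795 / 0.07
print(round(implied_baseline, 1))  # 11.4
```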

The other models point in the same direction:

Model	Estimated ATT
TSMixer	-0.859 cents/kWh
DeepProbCP	-0.795 cents/kWh
ASCM	-0.441 cents/kWh
Causal ARIMA	-1.064 cents/kWh

That range is instructive. The sign is stable, but the magnitude varies. DeepProbCP’s estimate is not chosen because it is the most dramatic; Causal ARIMA estimates a larger reduction. It is chosen because DeepProbCP performs best in the paper’s validation logic.
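A quick summary of the ATT table puts numbers on "stable sign, varying magnitude." All four estimates are negative, but the largest reduction is roughly 2.4 times the smallest.

```python
# ATT estimates (cents/kWh) from the table above.
att_estimates = {"TSMixer": -0.859, "DeepProbCP": -0.795, "ASCM": -0.441, "Causal ARIMA": -1.064}

values = list(att_estimates.values())
all_negative = all(v < 0 for v in values)
spread = max(values) - min(values)   # gap between smallest and largest reduction
ratio = min(values) / max(values)    # largest reduction relative to smallest
print(all_negative, round(spread, 3), round(ratio, 2))  # True 0.623 2.41
```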

For operators, the magnitude should be read carefully. A 7% short-term residential price reduction is economically relevant, especially in a sector where small per-unit changes aggregate across millions of customers. But it is not a sweeping proof that every liberalization program produces consumer savings. Market design matters. Fuel costs matter. Grid constraints matter. Transition rules matter. Retail pricing rules matter. The boring institutional plumbing matters, as usual.

The paper’s estimate is best understood as evidence that early independent producer entry in selected US states coincided with and plausibly caused a short-term residential price reduction under the paper’s counterfactual framework. That is a strong claim. It is not a universal doctrine.

Causal ML beats DiD only when it improves the counterfactual

The paper positions causal ML as an improvement over traditional Difference-in-Differences because it avoids strict linearity and parallel-trend assumptions. That is fair, but not a free lunch.

Causal ML replaces one kind of assumption with another. Instead of assuming parallel trends, it assumes the counterfactual can be learned from pre-intervention patterns, control units, and covariates. It also assumes that included covariates capture enough of the relevant supply and demand forces to reduce confounding risk.

The authors include state-level average income to represent demand-side conditions and gas prices to represent supply-side pressure. That is sensible, but not exhaustive. Electricity prices can be influenced by regulation, capacity mix, transmission constraints, retail rate freezes, fuel hedging, nuclear availability, renewable mandates, weather, customer mix, and political compromise. The grid is not exactly famous for being simple.

So the correct comparison is not:

DiD is old; causal ML is new; new wins.

The better comparison is:

DiD makes strong trend assumptions; causal ML makes strong counterfactual-model assumptions; the more credible approach is the one whose assumptions better match the market structure and data.

In this paper, causal ML looks attractive because the treated and control states may have nonlinear, seasonal, and heterogeneous price dynamics. The model comparison also gives the authors a defensible reason to choose DeepProbCP rather than simply selecting the most fashionable architecture.

What the paper directly shows, and what business should infer

The business value of the paper is not that it gives executives a slogan about deregulation. It gives them a template for evaluating interventions where the counterfactual matters.

Layer	What the paper supports	What Cognaptus infers for operators	Boundary
Direct empirical result	Selected early US liberalization cases show an estimated short-term residential price drop	Competition can create consumer price pressure when market entry actually occurs	Not proof of universal or long-term price reduction
Methodological result	DeepProbCP performs best in short-series synthetic and real-data checks	Model selection should be matched to intervention geometry, not leaderboard prestige	Different data length or treatment design could favor another model
Policy-design result	Intervention timing based on producer entry is more operationally meaningful than policy passage	Measure when behaviour changes, not when paperwork changes	Requires reliable market-activity indicators
Risk-management result	Placebo/control-unit checks increase confidence	Counterfactual systems can support scenario testing before and after policy moves	Still observational, still assumption-dependent

For utilities, this kind of modelling could support tariff redesign, retail-choice evaluation, demand-response pricing, or infrastructure investment analysis. For regulators, it offers a way to ask whether a policy affected consumers after implementation rather than after announcement. For infrastructure investors, it can help estimate whether market-entry rules change revenue pools, consumer prices, and competitive pressure.

The crucial word is “support.” Causal ML should not become an oracle wrapped in Python. It should become part of the decision stack: domain assumptions, institutional knowledge, sensitivity testing, counterfactual modelling, and post-implementation monitoring.

The result is short-term by design

A major boundary is temporal. The paper studies the immediate short-term effect after independent producer entry. It truncates the data to 1990–1999, with 1998–1999 as the post-intervention period. The authors explicitly note that longer-term effects remain outside the scope.

That matters because electricity-market reforms often have staged effects. Initial competition may push prices down. Later, capacity constraints, fuel-price pass-through, network costs, market power, retail pricing structures, or regulatory redesign can change the picture. A short-term consumer gain can fade. A long-term efficiency gain can arrive late. A messy market can do both, because energy systems enjoy humiliating simple narratives.

The paper’s estimate should therefore be used as a diagnostic result, not a permanent verdict. It says: under the model and data used here, independent producer entry appears to have lowered residential prices in the first two years. It does not say: liberalization is a guaranteed durable consumer subsidy.

The appendix-style lesson: fit the tool to the intervention

The most reusable lesson is not “DeepProbCP always wins.” The reusable lesson is that the intervention geometry should drive model choice.

A policy team should ask:

Are there many treated units or only one?
Is the treatment date common or staggered?
Is the post-intervention question short-term or long-term?
Are there enough pre-intervention observations to train a modern forecasting model?
Are the control units truly unaffected?
Are key confounders observed, stable, and included?
Does the model perform well on untreated units?

In this paper’s setting, DeepProbCP fits because the study has multiple treated units, short time series, a single intervention window, and a need for global learning across units. TSMixer may become more attractive with longer series. Causal ARIMA remains useful when local interpretability or econometric familiarity is preferred. ASCM underperforms here, but that does not mean it is useless everywhere.

There is no universal causal model. There are only models whose assumptions are more or less embarrassing under inspection.

The business value is cheaper counterfactual discipline

Counterfactual discipline is expensive. It requires analysts to define what would have happened otherwise, then defend that imaginary world with evidence. Most organizations skip the hard part and compare before versus after, which is fast, intuitive, and frequently wrong.
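A toy example shows why before-versus-after is "frequently wrong." If prices were already trending down, the naive comparison credits the intervention with the trend. The counterfactual here is a simple linear-trend extrapolation, a deliberately crude stand-in for the models discussed above, and all numbers are invented.

```python
import numpy as np

# Hypothetical treated-state prices (cents/kWh): already falling 0.2 per year
# before the intervention, with an extra 0.5 drop from 1998 onwards.
years = np.arange(1990, 2000)
pre_years, post_years = years[years < 1998], years[years >= 1998]
pre_prices = 12.0 - 0.2 * (pre_years - 1990)
post_prices = 12.0 - 0.2 * (post_years - 1990) - 0.5

# Naive before/after: compare the post-period mean with the pre-period mean.
naive_effect = post_prices.mean() - pre_prices.mean()

# Counterfactual: extrapolate the pre-period linear trend into the post period.
slope, intercept = np.polyfit(pre_years, pre_prices, deg=1)
counterfactual = intercept + slope * post_years
counterfactual_effect = (post_prices - counterfactual).mean()

print(round(naive_effect, 2), round(counterfactual_effect, 2))  # -1.5 -0.5
```

The naive comparison triples the true effect, because it counts the pre-existing decline as impact.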

This paper shows a more serious workflow:

define the intervention based on actual market behaviour;
build candidate counterfactual models;
test them on synthetic settings where the truth is known;
test control-unit forecasting performance in real data;
choose the model that best matches the intervention setting;
interpret the treatment effect with explicit boundaries.

That workflow is useful beyond electricity markets. It applies to pricing changes, subsidy programs, branch expansions, logistics redesigns, procurement reforms, AI workflow deployments, and market-entry decisions. Anywhere leaders want to know whether an intervention caused an observed shift, the question is not “what changed after launch?” The question is “what would have changed without launch?”

That is the expensive question. Causal ML does not make it cheap, exactly. It makes it less evasive.

Where the conclusion should stay modest

The paper is valuable, but the strongest version of its conclusion is narrower than its most excited reader might prefer.

The evidence supports a short-term negative price effect for residential consumers in the selected US liberalization window. It also supports DeepProbCP as the best-performing model among the tested options for this short-series causal forecasting setup. It does not establish that liberalization works everywhere, that price reductions persist, or that all relevant confounding forces have been eliminated.

Three boundaries are worth keeping in view.

First, the study depends on observational identification. The control states are not randomly assigned, and the model must assume that untreated outcomes can be learned from available data.

Second, the covariate set is limited. Income and gas prices are sensible, but electricity pricing is shaped by a much wider institutional and physical system.

Third, the treated-state count inconsistency should be cleaned up in any future replication. A small inconsistency in text may not change the estimate, but in causal policy evaluation, clean unit definition is not administrative trivia. It is the spine of the design.

The sharper takeaway

The paper’s result is not that deregulation is good. The sharper takeaway is that market-entry timing, counterfactual modelling, and model selection can materially change how policy impact is measured.

That matters because energy markets are entering a period where interventions will multiply: grid modernization, storage incentives, renewable integration, dynamic tariffs, retail competition, demand flexibility, AI-driven load growth, and carbon constraints. Every intervention will come with a story. Many stories will be politically convenient. Some will even be true.

Causal ML will not remove judgment from energy policy. It will force judgment to become more explicit. That is progress. Not glamorous progress, perhaps, but electricity markets already contain enough drama without analysts adding theatrical certainty.

If policymakers and operators take one lesson from this paper, it should be this: do not ask whether prices changed after reform. Ask what prices would have done without reform, test the counterfactual machinery, and only then decide whether the policy deserves applause.

Cognaptus: Automate the Present, Incubate the Future.

¹ Orr Shahar, Stefan Lessmann, and Daniel Traian Pele, “Causality analysis of electricity market liberalization on electricity price using novel Machine Learning methods,” arXiv:2507.12331, 2025, https://arxiv.org/abs/2507.12331.
