Kill the Correlation, Save the Grid: Why Energy Forecasting Needs Causality

Humidity looks harmless on a scatter plot. Actually, in this paper, it looks worse than harmless: it appears negatively correlated with electricity demand.

That is the kind of result a busy forecasting team might quietly accept. Add humidity as a feature, let the model figure it out, move on. The grid will not wait politely while everyone debates Pearl diagrams.

But the paper “Causal Inference in Energy Demand Prediction” argues that this is exactly where forecasting systems begin to lie with confidence.¹ Humidity is not simply “a weather feature.” It is connected to hour of day, temperature, ventilation needs, and daily activity. When those relationships are ignored, the model can mistake the fingerprint of the clock for the effect of the weather. Very elegant. Very dangerous. Very normal.

The central contribution is not that the authors add another feature to an energy-load model. The contribution is that they build a structural causal model for demand, split electricity consumption into routine activity, HVAC, and lighting components, and then use that structure to build a Bayesian forecasting model. The paper’s strongest business lesson is also its least decorative one: feature engineering is not enough when the features are causally tangled.

Humidity looks wrong because the clock is hiding inside it

The paper studies hourly electricity load from the Upper Great Plains East balancing authority (WAUE), covering September 2023 to August 2025. Weather variables include temperature, humidity, wind speed, and solar radiation. Calendar variables include hour of day and month of year. To avoid mixing model error with weather-forecast error, the authors evaluate using observed historical weather data rather than forecast weather.

The first important puzzle is humidity. A naive look at the data gives:

$$ \operatorname{corr}(\text{humidity}, \text{energy}) = -0.07 $$

On its face, that says higher humidity is associated with lower energy demand. That is awkward, because domain knowledge says humidity can increase ventilation and cooling demand, especially in hot weather. The model has not discovered a counterintuitive law of physics. It has discovered a confounder.

The missing character is hour of day.

Humidity in the dataset tends to peak around 6 a.m. That hour is also an off-peak period for routine activity demand. So when a model conditions on humidity without controlling for hour, it accidentally drags in the low-demand morning pattern. Two things are happening at once:

| Path | What it says | Direction of effect |
| --- | --- | --- |
| Humidity → HVAC needs → energy demand | Higher humidity can increase ventilation and cooling demand | Positive, especially in hot conditions |
| Humidity ← hour of day → routine activity → energy demand | High humidity often appears at low-activity morning hours | Negative in the raw association |

The raw correlation combines both. The causal effect is not directly visible because the clock has slipped into the weather variable wearing a cheap disguise.

After adjusting for hour and temperature, the interpretation changes. In high-temperature conditions, above roughly 75°F in the authors’ analysis, humidity shows a positive relationship with energy demand. The paper reports that, in this high-temperature regime, changing humidity from low to high can explain up to about 300 MW of demand variation, while the conditioned demand standard deviation is 369 MW. That puts the humidity swing at roughly four-fifths of a standard deviation (300/369 ≈ 0.81). That is not a rounding error. For grid operators and industrial consumers, that is the difference between “minor weather noise” and a material driver of load.

This is the first mechanism-first lesson: causality matters not because it sounds philosophically superior, but because an apparently small negative correlation can hide a large positive conditional effect.
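The sign flip is easy to reproduce on synthetic data. The sketch below is an illustration under made-up assumptions, not a reconstruction of the paper's dataset: the humidity curve, the activity curve, and the `TRUE_EFFECT` coefficient are all hypothetical. It only shows that a positive direct effect can coexist with a negative raw correlation when hour of day drives both humidity and routine demand.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def slope(xs, ys):
    """OLS slope of ys on xs (one regressor plus intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)
TRUE_EFFECT = 3.0  # hypothetical MW per humidity point, not from the paper

hours, humidity, energy = [], [], []
for _ in range(20_000):
    h = rng.randrange(24)
    phase = math.cos(2 * math.pi * (h - 6) / 24)  # +1 at 6 a.m., -1 at 6 p.m.
    hum = 60 + 20 * phase + rng.gauss(0, 5)       # humidity peaks at 6 a.m.
    activity = -500 * phase                       # routine demand is lowest at 6 a.m.
    e = 3000 + activity + TRUE_EFFECT * hum + rng.gauss(0, 50)
    hours.append(h)
    humidity.append(hum)
    energy.append(e)

raw_corr = pearson(humidity, energy)

# Adjust for hour: estimate the slope inside each hour, then average the slopes.
by_hour = {}
for h, hum, e in zip(hours, humidity, energy):
    xs, ys = by_hour.setdefault(h, ([], []))
    xs.append(hum)
    ys.append(e)
adjusted_slope = sum(slope(xs, ys) for xs, ys in by_hour.values()) / len(by_hour)

print(f"raw correlation: {raw_corr:.2f}")
print(f"hour-adjusted slope: {adjusted_slope:.2f} (simulated truth: {TRUE_EFFECT})")
```

The raw correlation comes out negative even though every simulated hour has a positive humidity effect. Stratifying by hour is the simplest possible adjustment; the paper also conditions on temperature, which this toy example leaves out.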

The causal diagram is not decoration; it decides what the model is allowed to learn

The authors propose a directed acyclic graph linking calendar variables, weather variables, latent demand categories, and total energy demand. The observed variables are hour, month, temperature, humidity, wind speed, and solar radiation. The demand side is split into three non-overlapping conceptual components:

routine activity demand;
HVAC demand;
lighting demand.

This split is useful because energy demand is not one homogeneous thing. A factory schedule, an air conditioner, and artificial lighting can all raise load, but they respond to different causes. Hour of day affects routine activity. Temperature, humidity, and wind affect HVAC. Solar radiation affects lighting, while also affecting temperature. Month shapes seasonal weather and daylight conditions.

A pure forecasting model may learn that “temperature matters” or “hour matters.” The causal model asks a sharper question: through which mechanism does each variable matter?

That distinction matters operationally. Suppose demand rises on a summer afternoon. A black-box model may say the forecast is high because weather and time features jointly pushed it there. A causal model tries to separate the story:

| Demand component | Main drivers in the paper’s model | Operational interpretation |
| --- | --- | --- |
| Routine activity | Hourly and monthly cycles | People and business schedules |
| HVAC | Temperature, humidity, wind | Cooling, heating, ventilation pressure |
| Lighting | Solar radiation and active hours | Artificial light demand under insufficient natural light |

This is where the paper’s structure becomes more than an academic preference. A forecast that decomposes demand into causal components is easier to stress-test. If temperature forecasts shift, HVAC demand should move. If solar radiation changes during active hours, lighting demand should move. If the same temperature appears at a different hour, the routine activity channel should not be silently reassigned to temperature.
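One way to make that stress test concrete is to write the component-to-driver mapping down as data and ask which components should respond when one input changes. The sketch below encodes only the links named in this article; the dictionary names and the `affected_components` helper are hypothetical, and the month-to-weather links are omitted for brevity, so this is not the paper's full DAG.

```python
# Direct drivers of each demand component, as summarized in this article.
COMPONENT_DRIVERS = {
    "routine_activity": {"hour", "month"},
    "hvac": {"temperature", "humidity", "wind"},
    "lighting": {"solar_radiation", "hour"},
}

# Weather-to-weather links mentioned in the text: solar radiation warms the air.
DOWNSTREAM_WEATHER = {
    "solar_radiation": {"temperature"},
}


def affected_components(changed_variable):
    """Components expected to move when one input changes, directly or via another weather variable."""
    reached = {changed_variable}
    frontier = [changed_variable]
    while frontier:
        var = frontier.pop()
        for nxt in DOWNSTREAM_WEATHER.get(var, ()):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return sorted(
        name for name, drivers in COMPONENT_DRIVERS.items() if drivers & reached
    )


for variable in ("temperature", "solar_radiation", "hour"):
    print(variable, "->", affected_components(variable))
```

A check like this is cheap to run against a fitted model: if a temperature shock moves the routine-activity component, the attribution has leaked across channels.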

That is the kind of distinction ordinary correlation models often blur. They can still forecast well inside familiar data distributions. The problem arrives when the world changes: heat waves, altered work schedules, changing peak periods, abnormal weather, new electrification patterns. The model that learned “what usually happens together” may fail when the usual pairing breaks.

Temperature is not one effect when hour-of-day is missing

The temperature section gives the paper’s clearest numeric demonstration of confounding damage.

Electricity demand has a nonlinear relationship with temperature. Demand tends to rise in both hot and cold extremes, while falling near a comfort zone. The authors transform raw temperature into distance from a midpoint:

$$ T_{\text{transformed}} = |T - T_{\text{midpoint}}| $$

They choose 56°F as the midpoint that best fits the observed data. Then they allow temperature sensitivity to vary by month, reflecting the fact that the same temperature deviation does not mean the same thing in every season.
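As a minimal sketch, the transformation and a month-varying sensitivity could look like this. The 56°F midpoint comes from the article; the `SENSITIVITY_BY_MONTH` values and the `temperature_component` helper are placeholders invented for illustration, not estimates from the paper.

```python
T_MIDPOINT_F = 56.0  # midpoint reported in the article


def transformed_temperature(temp_f, midpoint=T_MIDPOINT_F):
    """Distance from the comfort midpoint, in degrees Fahrenheit."""
    return abs(temp_f - midpoint)


# Hypothetical month-specific sensitivities in MW per degree of deviation.
# Placeholder numbers for illustration only.
SENSITIVITY_BY_MONTH = {1: 12.0, 7: 25.0}


def temperature_component(temp_f, month):
    """Temperature-driven demand: month-specific slope times deviation."""
    return SENSITIVITY_BY_MONTH[month] * transformed_temperature(temp_f)


print(transformed_temperature(76.0))         # 20 degrees above the midpoint
print(transformed_temperature(36.0))         # 20 degrees below the midpoint
print(temperature_component(76.0, month=7))  # same deviation, July slope
```

A hot day and a cold day that sit equally far from the midpoint get the same transformed value; the month-specific slope is what lets the same deviation cost more in one season than another.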

The key test compares two approaches.

| Test | Likely purpose | What it supports | What it does not prove |
| --- | --- | --- | --- |
| Approach 1: regress demand on transformed temperature, then model residual daily cycle | Main confounding demonstration | Ignoring hour when estimating temperature effect biases attribution | That every non-causal model will fail equally |
| Approach 2: regress demand on transformed temperature and daily-cycle harmonics together | Main causal comparison | Controlling hour reduces confounding in the temperature estimate | That the proposed DAG is complete for all regions |
| Appendix proof using backdoor criterion | Formal justification | Under the assumed DAG, conditioning on hour identifies the relevant causal coefficient | That the DAG itself is empirically exhaustive |
| Frisch–Waugh–Lovell discussion | Implementation clarification | In linear settings, residualization can recover similar estimates | That causal reasoning is unnecessary in nonlinear settings |
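For readers who want the backdoor row spelled out, the standard adjustment formula is shown here in notation chosen for this article rather than taken from the paper: $T$ is transformed temperature, $H$ is hour of day, and $E$ is demand. It assumes, as the table row does, that conditioning on hour is enough to block the backdoor paths in the assumed DAG.

$$ P(E \mid \operatorname{do}(T = t)) = \sum_{h} P(E \mid T = t, H = h)\, P(H = h) $$

The ordinary conditional distribution weights the same terms differently:

$$ P(E \mid T = t) = \sum_{h} P(E \mid T = t, H = h)\, P(H = h \mid T = t) $$

The only difference is $P(H = h)$ versus $P(H = h \mid T = t)$. When hot readings cluster at particular hours, the second expression quietly over-weights those hours, and that is the confounding.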

The result is large: the non-causal approach produces a 47.8% deviation in estimated temperature coefficients compared with the causal approach, and 12.5% worse out-of-sample MAPE on average.

The mechanism is intuitive once stated. High temperatures are more likely around midday, when routine activity is also high. Low temperatures are more likely after midnight, when activity demand is low. If hour is not controlled, temperature gets credit for part of the daily activity pattern. In summer, cooling demand and activity demand can peak together around midday, amplifying variance. In winter, heating demand may peak when routine activity is lower, reducing aggregate variance.
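The attribution error can be reproduced in a few lines. This sketch rests on stated simplifications: the data are synthetic, `BETA` and the activity curve are hypothetical, and hour-of-day fixed effects stand in for the paper's daily-cycle harmonics. By the Frisch–Waugh–Lovell theorem, demeaning both variables within each hour gives the same temperature coefficient as a joint regression with hour dummies. In this linear setup the naive slope converges to the true coefficient plus Cov(deviation, activity) / Var(deviation).

```python
import math
import random


def slope(xs, ys):
    """OLS slope of ys on xs (one regressor plus intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def demean_within(groups, values):
    """Subtract each group's mean from its members."""
    sums, counts = {}, {}
    for g, v in zip(groups, values):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]


rng = random.Random(1)
BETA = 20.0  # hypothetical MW per degree of deviation, not from the paper

hours, deviation, demand = [], [], []
for _ in range(20_000):
    h = rng.randrange(24)
    afternoon = math.cos(2 * math.pi * (h - 15) / 24)  # peaks at 3 p.m.
    temp_f = 80 + 10 * afternoon + rng.gauss(0, 5)      # summer-like temperatures
    dev = abs(temp_f - 56.0)                            # transformed temperature
    activity = 300 * afternoon                          # routine demand also peaks at 3 p.m.
    hours.append(h)
    deviation.append(dev)
    demand.append(2000 + BETA * dev + activity + rng.gauss(0, 50))

# Temperature alone: the slope absorbs part of the daily activity cycle.
naive_beta = slope(deviation, demand)

# Temperature with hour controlled: residualize both variables on hour first.
adjusted_beta = slope(demean_within(hours, deviation), demean_within(hours, demand))

print(f"simulated truth: {BETA:.1f}")
print(f"naive slope:     {naive_beta:.1f}")
print(f"adjusted slope:  {adjusted_beta:.1f}")
```

The size of the gap here is an artifact of the chosen simulation parameters and says nothing about the paper's 47.8% figure; the point is only the direction of the error and the fact that controlling for hour removes it.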

This is a useful correction to a common forecasting belief. The problem is not merely that a model omitted a useful feature. The deeper problem is that omitting the confounder corrupts the estimated effect of a feature that was included.

That is a much nastier failure mode. A model can look feature-rich and still learn the wrong story.

Solar radiation and wind behave differently once temperature enters the room

The paper then extends the same causal logic to solar radiation and wind.

Solar radiation looks like it should reduce lighting demand: more sunlight, less artificial light. But in summer afternoon data, more solar radiation can appear to increase demand. The explanation is not that sunlight causes people to switch on lamps out of spite. Solar radiation warms the air, higher temperature increases cooling demand, and the temperature channel can dominate the lighting channel.

In winter, the direction changes again. More sunlight can warm the air and reduce heating needs.
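The two solar channels can be written as a simple path decomposition. This is a sketch in notation introduced here, not the paper's: $S$ is solar radiation, $T$ is temperature, and $E_{\text{HVAC}}$ is shorthand for the weather-driven part of demand.

$$ \frac{dE}{dS} = \underbrace{\frac{\partial E_{\text{light}}}{\partial S}}_{\text{direct, negative}} + \underbrace{\frac{\partial E_{\text{HVAC}}}{\partial T} \cdot \frac{\partial T}{\partial S}}_{\text{indirect, sign depends on season}} $$

Sunlight warms the air, so $\partial T / \partial S > 0$. In the cooling season $\partial E_{\text{HVAC}} / \partial T > 0$, which makes the indirect term positive and able to outweigh the lighting term. In the heating season $\partial E_{\text{HVAC}} / \partial T < 0$, so both terms push demand down.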

Wind has a similar conditional structure. In hot conditions, stronger wind can reduce cooling needs. In cold conditions, it can increase heating needs by accelerating heat loss. The same variable points in different directions depending on the temperature regime.

These sections are not the main predictive evidence of the paper. They are mechanism checks. They show why the authors encode weather variables as interacting causal drivers rather than independent flat features. In business terms, they support a useful modeling discipline: before adding “weather” to a forecast, decide whether the weather variable is a direct demand driver, an indirect driver through another weather variable, or both.

The Bayesian model turns the causal diagram into a forecasting machine

After building the causal interpretation, the paper implements a Bayesian model in Pyro. The model encodes the DAG as a data-generating process: hour and month are root calendar variables; temperature, humidity, solar radiation, and wind are sampled according to functional forms shaped by calendar and weather relationships; energy demand is assembled from HVAC, lighting, routine activity, and yearly-cycle components.

This is not “Bayesian” as a decorative label. The paper uses priors to encode causal insights and then updates parameter beliefs using observed data. Inference is performed with stochastic variational inference, using an AutoNormal guide and Adam optimizer with a learning rate of 0.01.

The final energy demand equation conceptually sums the components:

$$ E = E_{\text{base}} + E_{\text{humid}} + E_{\text{wind}} + E_{\text{light}} + E_{\text{daily}} + E_{\text{yearly}} $$

The important part is not the formula’s elegance. It is the accounting logic. Instead of asking one flexible model to absorb everything into one opaque relationship, the authors ask the model to represent different demand mechanisms separately.
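The accounting logic can be sketched as a small data structure that keeps each term of the sum visible. The field names mirror the subscripts in the equation above; the `DemandComponents` class and the MW values are hypothetical and exist only to show the bookkeeping.

```python
from dataclasses import dataclass, fields


@dataclass
class DemandComponents:
    """One hour of forecast demand, split by mechanism (MW)."""
    base: float
    humid: float
    wind: float
    light: float
    daily: float
    yearly: float

    def total(self):
        """Sum of all components, matching the additive equation."""
        return sum(getattr(self, f.name) for f in fields(self))

    def shares(self):
        """Each component as a fraction of the total."""
        total = self.total()
        return {f.name: getattr(self, f.name) / total for f in fields(self)}


# Hypothetical values for a single hour; not taken from the paper.
hour_forecast = DemandComponents(
    base=2500.0, humid=120.0, wind=-40.0, light=60.0, daily=300.0, yearly=-90.0
)

print(f"total: {hour_forecast.total():.0f} MW")
for name, share in hour_forecast.shares().items():
    print(f"  {name:>6}: {share:+.1%}")
```

Keeping the components separate means a forecast review can ask which term moved, rather than only whether the total was right.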

The performance numbers are strong on the stated task:

| Evidence | Result | Interpretation |
| --- | --- | --- |
| Training period | September 2023 to August 2024 | Model fitted on one year of data |
| Test period | September 2024 to August 2025 | Out-of-sample test on the following year |
| Training MAPE | 3.23% | In-sample fit |
| Test MAPE | 3.84% | Out-of-sample predictive performance |
| 5-fold cross-validation MAPE | 3.88% average | Robustness check across two years of data |

The 5-fold cross-validation is best read as a robustness test, not as a second thesis. It supports the claim that the model’s performance is not simply a lucky split between the first and second year. It does not prove universal superiority across regions, weather data sources, or operating conditions.
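For reference, MAPE is the mean of absolute errors expressed as a percentage of actual load. A minimal implementation follows; the load values are hypothetical and chosen so that every hour is off by exactly 3%.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = (abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * sum(errors) / len(actual)


# Hypothetical hourly loads in MW; each prediction misses by 3% of actual.
actual_mw = [3000.0, 3200.0, 2800.0, 3500.0]
predicted_mw = [3090.0, 3104.0, 2884.0, 3395.0]

print(f"MAPE: {mape(actual_mw, predicted_mw):.2f}%")
```

Because the error is scaled by actual load, a fixed MW miss counts for more in low-demand hours than at the peak, which is worth remembering when comparing MAPE across seasons.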

The authors call the performance state-of-the-art. Readers should treat that as an author claim unless it is accompanied by a broad benchmark table against strong modern forecasting baselines. The safer interpretation is still valuable: on this dataset, with observed weather data and the specified causal structure, the Bayesian causal model achieved low MAPE and explained seasonal variance patterns that a purely data-driven model might only mimic.

“Only mimic” is not an insult. In forecasting, mimicry often pays the bills. But when planning and intervention enter the picture, mimicry becomes less satisfying.

The business value is diagnosis, not just lower MAPE

The obvious business takeaway is “better forecasts.” That is true, but too small.

A 3.84% test MAPE matters. Grid operators care about error because load forecasting affects dispatch planning, reserve management, procurement, and congestion risk. Industrial energy consumers care because demand forecasts influence procurement decisions, peak-load management, and operational scheduling. Energy service providers care because forecast quality affects customer recommendations and contract performance.

But the paper’s deeper business value is diagnostic. A causal forecast can tell a more useful story about why load moved.

| Paper result | Direct meaning | Cognaptus business inference | Boundary |
| --- | --- | --- | --- |
| Humidity appears negatively correlated before adjustment | Raw association is misleading | Forecast governance should inspect confounders before accepting feature effects | Shown on WAUE data, not all grids |
| Adjusted humidity matters in hot conditions | Humidity can explain material load variation during heat | Heat-event planning should include humidity, not only temperature | Uses observed weather data |
| Temperature coefficient bias reaches 47.8% without hour adjustment | Attribution error can be large | Model monitoring should track causal attribution drift, not only forecast error | Based on the paper’s two-regression comparison |
| Non-causal approach has 12.5% worse out-of-sample MAPE | Confounding hurts prediction, not only explanation | Causal structure can have ROI through lower forecast error | Not a universal benchmark against all ML models |
| Bayesian causal model reaches 3.84% test MAPE | Strong task performance | Interpretable causal priors can coexist with competitive forecasting | Region and feature set are limited |

For a business user, the practical question is not “Should every energy model become a Pearl diagram with a Pyro backend?” That would be a fine way to turn methodology into procurement theater. The better question is: where are wrong attributions expensive?

Wrong attribution is expensive when forecasts feed decisions, not just dashboards. If a model incorrectly attributes midday load to temperature rather than activity, a planner may overestimate future temperature sensitivity. If humidity is dismissed because its raw correlation is negative, heat-event demand may be understated. If solar radiation is treated as a simple lighting proxy, summer cooling effects may be missed.

This is where causality becomes an operational control layer. It tells the forecasting system which associations are allowed to carry meaning.

What the paper shows, what we infer, and what remains uncertain

The paper directly shows that, in its WAUE dataset, causal adjustment changes interpretation materially. It directly shows a humidity sign puzzle that resolves after controlling for relevant variables. It directly shows a temperature attribution comparison where ignoring hour-of-day produces large coefficient deviation and worse out-of-sample MAPE. It directly reports strong predictive performance from a Bayesian causal model.

Cognaptus infers that similar causal audits could help organizations that rely on energy-load forecasts for decisions under changing conditions. The inference is strongest where domain mechanisms are known, variables are interdependent, and distribution shifts are plausible. Energy is a natural home for this approach because weather, time, and human activity are visibly entangled. The grid is basically a causal graph with invoices.

What remains uncertain is broader generalization. The study uses one balancing authority. Weather is represented by one location chosen as representative of the region. The authors deliberately use observed historical weather, so the reported performance does not include real weather-forecast uncertainty. Weekday and workday effects are not modeled, though the authors note they may matter. Autoregressive demand dynamics are also not explicitly included, even though electricity load is temporally continuous.

Those boundaries do not weaken the paper’s central lesson. They specify where the current evidence stops.

The missing features are not just more columns

The future-directions section is unusually important because it shows the authors understand the trap of casual feature addition.

Adding day-of-week sounds straightforward. Workdays and weekends likely differ. Adding lagged demand also sounds obvious, because electricity consumption is temporally dependent. When demand is unusually high now, it often remains elevated in the next period.

But lagged demand is not an innocent feature. It may improve predictive accuracy while obscuring the causal pathways that the model is supposed to preserve. If yesterday’s load already contains the effects of weather, activity, and operational conditions, blindly adding it can absorb causal structure into an autoregressive shortcut. The model may forecast better while explaining less.

This is the adult version of feature engineering: every new column has a job description and a reporting line. Some variables are causes. Some are proxies. Some are consequences. Some are colliders waiting to cause trouble at the next model review meeting.

Causal forecasting is a governance problem disguised as a modeling problem

The paper is useful because it does not ask readers to believe in causality as a slogan. It gives two concrete failures.

First, humidity looks negatively correlated with demand until hour and temperature reveal the hidden structure. Second, temperature sensitivity is badly misestimated when hour-of-day is handled in the wrong sequence. Both examples convert causal language into forecast risk.

That is the practical point. Correlation-based forecasting can work, especially in stable environments. But energy systems are not guaranteed to stay stable. Electrification, climate volatility, work-pattern changes, distributed energy resources, and demand-response programs all make historical associations less comfortable. When the environment shifts, models that learned accidental pairings may continue producing numbers with excellent formatting and poor judgment.

The paper’s Bayesian model is one answer. The broader lesson is even simpler: before trusting a forecast, ask whether the model knows the difference between a driver, a proxy, and a coincidence.

The grid does not need more confident correlations. It needs forecasts that can survive when the correlations stop behaving.

Cognaptus: Automate the Present, Incubate the Future.

¹ Chutian Ma, Grigorii Pomazkin, Giacinto Paolo Saggese, and Paul Smith, “Causal Inference in Energy Demand Prediction,” arXiv:2512.11653, 2025. https://arxiv.org/abs/2512.11653
