Causality in Stereo: How Multi-Band Granger Unveils Frequency-Specific Influence

TL;DR for operators

Signals do not always influence each other on one clock. A machine vibration may create a fast alarm signature and a slower thermal drift. A brain region may interact through one rhythm quickly and another rhythm slowly. A market signal may move through intraday noise, weekly positioning, and slower macro repricing. Treating all of that as one blended time series is convenient. It is also a rather efficient way to throw away the thing you wanted to understand.

The paper behind this article introduces Multi-Band Variable-Lag Granger Causality, or MB-VLGC, a framework that asks a sharper question: does one time series improve prediction of another inside specific frequency bands, while allowing the causal delay to vary over time?¹ The mechanism is simple in outline: split the signal into bands, run variable-lag Granger causality within each band, then combine the band-level evidence.

The strongest result is not that MB-VLGC wins every row. It does not. Traditional Granger causality and VLGC are better at rejecting pure random-noise false positives in the synthetic table. PCMCI+ performs extremely well on positive synthetic cases. The more important result is narrower and more useful: when the data contain multi-frequency causal lags—10, 40, and 80 Hz components with different delays—the proposed method reaches 0.933 accuracy, while VLGC scores 0.167, traditional Granger 0.267, transfer entropy 0.567, VLTE 0.533, PCMCI+ 0.900, and Granger-Geweke 0.133. That is the paper’s home turf.

For business use, the value is not mystical “AI causality.” It is better diagnostic separation. MB-VLGC can tell an analyst not merely that signal $X$ predicts signal $Y$, but that the relationship appears in a particular band with a particular lag pattern. That matters for EEG analytics, industrial monitoring, energy systems, behavioural telemetry, and market microstructure—anywhere the same relationship may travel through several clocks at once.

The main caveat is equally operational: this remains observational Granger-style causality. It means predictive precedence under a statistical model, not proof that pushing a button on $X$ will move $Y$. A board presentation that forgets this distinction has not discovered causality. It has discovered PowerPoint theatre.

The practical problem is not causality in general; it is causality with the wrong clock

Most business time-series questions start innocently.

Did campaign exposure move conversion? Did gas input drive CO₂ output? Did one electrode influence another during motor imagery? Did a price signal lead another price signal? The analyst reaches for a causal time-series method and soon encounters the old Granger idea: if the past of $X$ improves prediction of $Y$ beyond the past of $Y$ itself, then $X$ Granger-causes $Y$.

That definition is useful because it is testable. It is dangerous because it is easy to over-read. Granger causality is about predictive information over time. It does not, by itself, establish physical intervention, legal responsibility, or strategic control. Still, for many systems where randomised experiments are impossible, it remains a practical way to detect directional structure.
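The testable part can be made concrete. What follows is a minimal sketch of the classic fixed-lag comparison, not the paper's code: fit $Y$ on its own past, fit $Y$ on its own past plus the past of $X$, and ask with an F-test whether the second model explains meaningfully more. The helper names, the lag window of 5, and the toy data (a 3-sample delay with coefficient 0.8) are all illustrative assumptions.

```python
import numpy as np
from scipy import stats


def lagged_matrix(series, max_lag):
    """Columns hold series[t-1], ..., series[t-max_lag] for t = max_lag .. len-1."""
    n_rows = len(series) - max_lag
    return np.column_stack(
        [series[max_lag - k : max_lag - k + n_rows] for k in range(1, max_lag + 1)]
    )


def residual_sum_of_squares(design, target):
    """Ordinary least squares fit, returning the sum of squared residuals."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.sum((target - design @ coef) ** 2))


def granger_f_test(x, y, max_lag=5):
    """Test whether the past of x improves prediction of y beyond y's own past."""
    target = y[max_lag:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lagged_matrix(y, max_lag)])  # past of y only
    full = np.hstack([restricted, lagged_matrix(x, max_lag)])       # plus past of x
    rss_restricted = residual_sum_of_squares(restricted, target)
    rss_full = residual_sum_of_squares(full, target)
    df_num = max_lag
    df_den = len(target) - full.shape[1]
    f_stat = ((rss_restricted - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, float(stats.f.sf(f_stat, df_num, df_den))


# Toy data (assumed for illustration): y follows x with a fixed 3-sample delay.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = 0.3 * rng.standard_normal(500)
y[3:] += 0.8 * x[:-3]

f_xy, p_xy = granger_f_test(x, y)
f_yx, p_yx = granger_f_test(y, x)
print(f"x -> y: F={f_xy:.1f}, p={p_xy:.3g}")
print(f"y -> x: F={f_yx:.1f}, p={p_yx:.3g}")
```

The test only says that the past of one series carries predictive information about the other. Everything the article says about over-reading still applies to a small p-value here.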

The paper’s complaint is not that Granger causality is useless. The complaint is that the default clock is too rigid.

Traditional Granger methods often assume a fixed lag. $X$ affects $Y$ after, say, five time steps. That is tidy. Many real systems are not tidy. Delay can change as conditions change. Variable-Lag Granger Causality, or VLGC, addresses this by using dynamic time warping to align cause and effect when the lag varies over time.

But VLGC still treats the signal as one broadband object. That is the second clock problem. A time series is often a stack of rhythms: slow cycles, mid-frequency oscillations, fast transients. The same pair of variables may interact differently across those rhythms. A single lag for the whole signal can become a compromise that fits nothing very well. Ah yes, the classic enterprise analytics strategy: average away the mechanism, then ask why the dashboard is vague.

MB-VLGC is designed for precisely that failure mode.

MB-VLGC separates the signal before it asks who predicts whom

The method’s core move is mechanical, not mystical. It turns one causality question into several band-specific causality questions.

At a high level, the pipeline has three stages:

Stage	What the method does	Operational meaning
Frequency banding	Decomposes each input time series into band-limited components using bandpass filters	Separates slow, medium, and fast signal structures before testing influence
Per-band VLGC	Runs variable-lag Granger causality independently within each band	Allows each frequency band to have its own delay pattern
Result integration	Combines band-specific evidence, mainly through Fisher’s combined probability test	Produces both an overall causality decision and frequency-specific diagnostics
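The integration stage is the easiest to illustrate. Below is a hedged sketch of Fisher's combined probability test in plain Python, assuming each band has already produced a p-value. The function name and the three example p-values are hypothetical; the closed-form chi-square tail used here is exact because Fisher's statistic always has an even number of degrees of freedom.

```python
import math


def fisher_combined_p(p_values):
    """Combine k p-values with Fisher's method.

    The statistic -2 * sum(ln p_i) follows a chi-square distribution with 2k
    degrees of freedom under the null. For even degrees of freedom the upper
    tail is exp(-s/2) * sum_{i<k} (s/2)^i / i!, so no stats library is needed.
    Every p-value must be strictly greater than zero.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical per-band p-values: one weak band, two stronger ones.
band_p_values = [0.20, 0.03, 0.004]
statistic, combined_p = fisher_combined_p(band_p_values)
print(f"Fisher statistic = {statistic:.2f}, combined p = {combined_p:.4f}")
```

One caution worth stating: Fisher's method assumes the combined p-values are independent. Band-limited tests run on the same pair of signals may not satisfy that cleanly, so the combined figure is better read as a summary than as an exact error rate.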

This is a mechanism-first paper because the mechanism is the story. The contribution is not “we added a preprocessing step and the benchmark improved.” The real argument is that causality can be frequency-specific in delay, and therefore a broadband variable-lag test may be structurally under-specified.

The formal definition follows that intuition. Given time series $X$ and $Y$, and a set of frequency bands $B = \{B_1, \dots, B_k\}$, the paper says $X$ Multi-Band Variable-Lag Granger-causes $Y$ if there exists at least one band $B_i$ such that the band-limited component $X^{(B_i)}$ VL-Granger-causes $Y^{(B_i)}$.

That “exists at least one band” condition matters. It means a causal relationship does not need to dominate the whole signal to be detected. A narrow but real frequency-specific pathway can count.

The paper also motivates the method theoretically. Under stationary VAR assumptions, spectral Granger causality can be invariant under certain filtering operations. The authors argue that once stationarity and VAR assumptions are relaxed—exactly the territory where messy real signals tend to live—filtering can help separate frequency-specific Granger relationships. They then argue that, when different bands have different variable lags, a single broadband alignment can have larger residual variance than a model that aligns each band separately.

In plain language: if two frequency components travel with different delays, forcing one alignment path across the whole signal is a lossy compromise.

Dynamic time warping gives the model a moving delay

The per-band component relies on VLGC. The paper uses dynamic time warping, or DTW, to find an alignment path between $X$ and $Y$ so the delay can change across time rather than remaining fixed.
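To show what an alignment path is, here is a small textbook DTW written in NumPy. It is a generic sketch, not the VLGC implementation: the absolute-difference cost, the step pattern, and the toy signals (white noise copied with a 5-sample delay) are assumptions made for illustration. Each pair on the path matches a sample of $X$ to a sample of $Y$, and the difference between the two indices is the delay at that point.

```python
import numpy as np


def dtw_path(x, y):
    """Return the optimal DTW alignment as a list of (index_in_x, index_in_y) pairs."""
    n, m = len(x), len(y)
    cost = np.abs(x[:, None] - y[None, :])
    # acc[i, j] is the cheapest cumulative cost of aligning x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    # Backtrack from the last pair to the first, always taking the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return path[::-1]


# Toy data (assumed): y repeats x five samples later.
rng = np.random.default_rng(2)
true_lag = 5
x = rng.standard_normal(150)
y = np.concatenate([rng.standard_normal(true_lag), x[:-true_lag]])

path = dtw_path(x, y)
lags = [j - i for i, j in path]          # delay implied by each matched pair
estimated_lag = float(np.median(lags))   # the path is pinned to lag 0 at both ends
print("median lag along the DTW path:", estimated_lag)
```

The median is used because classic DTW forces the path to start and end on the corners, so the first and last few pairs do not reflect the real delay. With a delay that changes over time, the per-pair lags would be read as a trajectory rather than summarised by one number.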

The variable-lag test compares three model families:

a null model using only the history of $Y$;
a fixed-lag model using the history of $Y$ and $X$;
a variable-lag model using the history of $Y$, $X$, and DTW-aligned $X$.

The method concludes that $X$ causes $Y$ under the VLGC criterion when the variable-lag residual variance is lower than both the null and fixed-lag alternatives. In the implementation described, the statistical decision uses a dual criterion: an F-test p-value threshold or a BIC-difference ratio threshold. The reported default settings are a significance level of $\alpha = 0.01$ for the F-test and a threshold of $\gamma = 0.6$ for the BIC-difference ratio.
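The three-way comparison can be sketched with ordinary least squares. This is an illustration under loud assumptions, not the paper's procedure: the aligned series is built from the known true delay (an oracle standing in for DTW), the regression window of 3 lags is deliberately shorter than the true delays of 6 and 10 samples so that the fixed-lag model misses them, and only residual variances are compared, with no F-test or BIC step.

```python
import numpy as np


def lagged_matrix(series, max_lag, start):
    """Columns hold series[t-1], ..., series[t-max_lag] for t = start .. len-1."""
    stop = len(series)
    return np.column_stack([series[start - k : stop - k] for k in range(1, max_lag + 1)])


def residual_variance(design, target):
    """Variance of the ordinary least squares residuals."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.var(target - design @ coef))


rng = np.random.default_rng(1)
n, max_lag = 600, 3
x = rng.standard_normal(n)

# Assumed toy system: the true delay moves from 6 to 10 samples halfway through.
true_lag = np.where(np.arange(n) < n // 2, 6, 10)
start = int(true_lag.max())
t = np.arange(start, n)
y = 0.2 * rng.standard_normal(n)
y[t] += 0.9 * x[t - true_lag[t]]

# Oracle alignment: a stand-in for what DTW would have to estimate from data.
x_aligned = x[t - true_lag[t]]

target = y[start:]
intercept = np.ones((len(target), 1))
null_design = np.hstack([intercept, lagged_matrix(y, max_lag, start)])   # past of y
fixed_design = np.hstack([null_design, lagged_matrix(x, max_lag, start)])  # plus past of x
variable_design = np.hstack([fixed_design, x_aligned[:, None]])          # plus aligned x

var_null = residual_variance(null_design, target)
var_fixed = residual_variance(fixed_design, target)
var_variable = residual_variance(variable_design, target)
print(f"null={var_null:.3f}  fixed-lag={var_fixed:.3f}  variable-lag={var_variable:.3f}")
```

In this toy setting the fixed-lag model barely improves on the null, because its window never reaches the real delay, while the aligned regressor removes most of the residual variance. Real data will be less obliging, which is why the paper wraps the comparison in a statistical test.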

That is an implementation detail with practical consequences. The method is not just “run DTW and declare influence.” It is a regression comparison wrapped around dynamic alignment, then repeated across frequency bands.

The band decomposition step also has a technical constraint: temporal relationships must be preserved. The authors use zero-phase filtering through forward-backward filtering and fourth-order Butterworth bandpass filters. For an operator, the translation is simple: the preprocessing is not cosmetic. If filtering distorts timing, the causal lag story collapses before the model even starts.
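A short sketch shows why zero-phase filtering matters. It uses SciPy's `butter` and `sosfiltfilt`, which apply the filter forward and then backward so the phase shifts cancel. The 250 Hz sampling rate, the 8 to 13 Hz band, and the test signal are assumptions for illustration, and passing `order=4` to `butter` is one reading of "fourth-order" rather than a confirmed detail of the authors' code.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def zero_phase_bandpass(signal, low_hz, high_hz, fs, order=4):
    """Butterworth bandpass applied forward and backward, so no net phase delay."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


# Assumed test signal: a 10 Hz rhythm mixed with a 40 Hz rhythm, sampled at 250 Hz.
fs = 250.0
t = np.arange(1500) / fs
slow = np.sin(2 * np.pi * 10.0 * t)
fast = 0.8 * np.sin(2 * np.pi * 40.0 * t)
mixed = slow + fast

alpha_band = zero_phase_bandpass(mixed, 8.0, 13.0, fs)

# Compare away from the edges, where filter transients live.
centre = slice(500, 1000)
max_error = float(np.max(np.abs(alpha_band[centre] - slow[centre])))
print(f"largest deviation from the 10 Hz component: {max_error:.4f}")
```

If the filtered band lines up sample for sample with the original 10 Hz component, timing has been preserved. A one-pass causal filter would shift that component in time, and the shift would differ by band, which is exactly the contamination the paragraph above warns about.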

The synthetic benchmark tests the method where ground truth exists

The paper’s main evidence comes from synthetic data, because synthetic data provide known ground truth. The authors generate 240 files: 120 with causality and 120 random-noise files without causality. The positive cases are split into four types, and the random-noise files serve as the negative control:

Synthetic case	Likely purpose	What it tests
Basic causation	Main evidence / baseline comparison	Whether methods detect simple fixed-lag linear causality
Variable-lag causation	Main evidence / comparison with VLGC-style methods	Whether methods handle delays that change over time
Broadband causation	Main evidence / comparison under frequency-rich signals	Whether methods survive wide frequency content
Multi-frequency causation	Core mechanism test	Whether methods detect different lags at different frequencies
Random noise	False-positive control	Whether methods avoid hallucinating causality where none exists

The multi-frequency dataset is the decisive one. It contains components at 10, 40, and 80 Hz, with different causal lags of 15, 8, and 4 samples. This directly matches the paper’s thesis: different frequency bands can carry different delays.
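For readers who want to see what such a dataset looks like, here is a hypothetical generator. Only the three frequencies and the three lags come from the article. The 250 Hz sampling rate, the amplitude-modulated sinusoids, the noise level, and every function name are assumptions, so this should not be read as the authors' generator.

```python
import numpy as np

FS = 250.0                                  # assumed sampling rate in Hz
BAND_LAGS = {10.0: 15, 40.0: 8, 80.0: 4}    # frequency in Hz -> causal lag in samples


def narrowband_component(freq_hz, n_samples, fs, rng):
    """Sinusoid with a slowly varying random envelope.

    The envelope matters: a pure sinusoid repeats every period, so a delay
    would only be identifiable up to a multiple of that period.
    """
    t = np.arange(n_samples) / fs
    smooth = np.convolve(rng.standard_normal(n_samples), np.ones(50) / 50, mode="same")
    envelope = 1.0 + 0.5 * smooth
    return envelope * np.sin(2 * np.pi * freq_hz * t + rng.uniform(0, 2 * np.pi))


def delay(signal, lag):
    """Shift a signal later in time by `lag` samples, padding the start with zeros."""
    out = np.zeros_like(signal)
    out[lag:] = signal[:-lag]
    return out


def make_multifrequency_pair(n_samples=1000, fs=FS, noise_std=0.1, seed=0):
    """Build x and y where each frequency component of x reaches y with its own lag."""
    rng = np.random.default_rng(seed)
    x_parts = {f: narrowband_component(f, n_samples, fs, rng) for f in BAND_LAGS}
    y_parts = {f: delay(x_parts[f], lag) for f, lag in BAND_LAGS.items()}
    x = sum(x_parts.values())
    y = sum(y_parts.values()) + noise_std * rng.standard_normal(n_samples)
    return x, y, x_parts, y_parts


x, y, x_parts, y_parts = make_multifrequency_pair()
print(x.shape, y.shape)
```

Looking at `x` and `y` as whole signals, no single delay describes the relationship: the 10 Hz content arrives 15 samples late, the 80 Hz content only 4. That is the situation a broadband alignment has to compromise on and a per-band alignment does not.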

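The headline table reports the following accuracies and overall F1-scores. In the column headers, MB-VL is MB-VLGC, GC is traditional Granger causality, TE is transfer entropy, VLTE is variable-lag transfer entropy, and GG is Granger-Geweke: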

Dataset / metric	MB-VL	VLGC	GC	TE	VLTE	PCMCI+	GG
Following relation	0.833	0.733	1.000	0.367	0.933	1.000	0.400
Variable-lag	0.767	0.600	0.233	0.333	0.867	1.000	0.467
Broadband lag	0.867	0.433	0.833	0.467	0.933	0.967	0.000
Multifrequency lag	0.933	0.167	0.267	0.567	0.533	0.900	0.133
Random noise	0.750	1.000	1.000	0.592	0.617	0.300	0.525
Overall F1-score	0.810	0.742	0.792	0.512	0.717	0.725	0.388

The result needs careful reading. MB-VLGC has the highest overall F1-score at 0.810, but it is not universally dominant. GC and PCMCI+ are perfect on the simple following-relation row. PCMCI+ is perfect on variable-lag and very strong on broadband. VLTE also performs strongly in several positive cases.

The stronger claim is more specific: MB-VLGC performs best where its mechanism should matter most. On multi-frequency lag, it scores 0.933, ahead of PCMCI+ at 0.900 and far ahead of VLGC, GC, TE, VLTE, and Granger-Geweke. This is the result that supports the paper’s central design choice.

The random-noise row is also important. MB-VLGC scores 0.750, while VLGC and GC score 1.000. That suggests the added flexibility may come with a false-positive trade-off. More expressive methods often find more structure. Some of it is useful. Some of it is the model being a little too eager to see patterns in static. We have met this colleague before.

Band selection is a sensitivity test, not a decorative parameter

The paper then varies the frequency-band configuration. This is best read as a sensitivity test: how much does performance depend on the analyst’s chosen bands?

Band configuration	Multi-frequency performance	Overall F1	Interpretation
Single band / broadband analysis	0.167	0.742	Collapses frequency-specific structure into one signal
Two bands	0.933	0.810	Best overall balance in the reported experiments
EEG-specific six bands	1.000	0.618	Excellent on multi-frequency cases, weaker overall due to over-segmentation

This table is quietly one of the most business-relevant parts of the paper. It says the method is not plug-and-forget. Band design changes the answer.

The two-band configuration, reported as 1–80 Hz and 81–120 Hz, gives the best overall F1-score in the experiments. The EEG-style six-band configuration reaches 1.000 on the multi-frequency dataset but falls to 0.618 overall. That is not a contradiction. It is the cost of granularity. More bands can expose specific mechanisms when the domain really is band-structured. More bands can also fragment simpler causal relationships until the evidence weakens.

For deployment, this creates a practical rule: use domain bands when domain knowledge is strong; use a conservative banding strategy when signal structure is uncertain; treat band configuration as a model choice to be validated, not a knob to be twiddled until the answer looks impressive.

Lag detection is a diagnostic extension, not just a yes-or-no result

The paper also reports frequency-specific lag detection on the multi-frequency causation dataset. This is not merely a benchmark score. It shows what MB-VLGC can provide beyond binary causality.

Band	True lag	Inferred lag	Significant lag	Lag error
Alpha	15	12.3 ± 6.0	11.6 ± 7.0	1.2 ± 0.4
Low gamma	8	4.3 ± 3.9	5.0 ± 0.0	3.0 ± 0.0
High gamma	4	2.4 ± 3.3	2.4 ± 3.3	2.5 ± 1.8

This table should be read as an exploratory diagnostic extension. It supports the idea that the method can recover different temporal patterns by band, but the estimates are not perfect. The low-gamma inferred lag is meaningfully below the true lag of 8. The high-gamma estimate is also below the true lag of 4. The paper states that high gamma is the most precise, while alpha shows more variability.

For operators, the takeaway is not “the method estimates lags exactly.” It is “the method can localise influence patterns by band well enough to guide investigation.” In a factory, that might mean separating fast vibration coupling from slower thermal effects. In EEG, it might mean locating which rhythm carries an interaction. In market data, it might mean distinguishing short-horizon reaction from slower inventory or positioning effects.

That diagnostic value is often more useful than a single causality flag.

Real-world validation is useful, but it is not intervention-grade proof

The paper tests four real-world datasets: Old Faithful geyser, chicken and egg prices, gas furnace, and EEG motor imagery. The reported detection table is simple:

Case	MB-VL	VLGC	GC	TE	VLTE	PCMCI+	GG
EEG	1	0	0	1	1	1	0
Chicken and egg	1	1	1	1	1	1	1
Old Faithful	1	1	0	0	1	1	0
Gas furnace	1	1	1	1	1	1	1

This section is best interpreted as real-world validation and sanity checking, not as a controlled causal proof. The gas furnace example has a known directional story: gas consumption drives CO₂ output. The EEG example is particularly aligned with the paper’s thesis because neural rhythms are naturally frequency-structured. The paper reports that MB-VLGC detects causality between FC3 and FC5 electrodes where traditional Granger fails, and that only the gamma band shows bidirectional VL-Granger causation in the band-limited figure.

That is useful. It shows the method can reveal a frequency-localised relationship that broadband Granger misses.

But the table is also too coarse to carry the entire practical claim. A “1” means detection, not necessarily correct intervention semantics. In the chicken-and-egg and gas furnace rows, nearly all methods detect causality, so MB-VLGC is not adding much differentiation. In EEG and Old Faithful, it adds more. The business reader should not treat this as universal superiority across domains. Treat it as evidence that the mechanism remains plausible outside synthetic data.

What the paper directly shows, and what Cognaptus infers for business use

The paper directly shows three things.

First, MB-VLGC is formally defined as a multi-band extension of variable-lag Granger causality. It combines spectral decomposition with dynamic temporal alignment.

Second, on synthetic data with known causal structure, MB-VLGC achieves the highest reported overall F1-score among the tested methods and performs especially well on the multi-frequency lag case that matches its intended design.

Third, band configuration matters. The two-band setting performs best overall in the reported benchmark, while EEG-style bands perform best on the multi-frequency case but worse overall.

Cognaptus infers a more operational lesson: MB-VLGC is valuable when the analyst expects multi-timescale transmission. That includes systems where influence may appear as a fast transient, a medium oscillation, and a slow drift. In those settings, a broadband model can hide exactly the information needed for diagnosis.

Business setting	Why multi-band variable lag may matter	What it could improve
Industrial telemetry	Faults can propagate through vibration, pressure, temperature, and control-loop rhythms at different speeds	Earlier root-cause localisation and less generic alerting
EEG and neurotechnology	Different neural rhythms can carry different functional interactions	Better rhythm-specific connectivity analysis
Energy systems	Demand, generation, and grid disturbances may operate across short and long cycles	More interpretable influence mapping across operating regimes
Market microstructure	Order flow, liquidity, and price impact can unfold across fast and slower horizons	Cleaner separation of transient reaction from sustained lead-lag structure
Behavioural analytics	User response may include immediate reactions and delayed habit formation	More precise segmentation of response dynamics

The ROI relevance is not that MB-VLGC automatically improves forecasting. It might, but that is not the paper’s main claim. The nearer-term value is cheaper diagnosis: narrowing where influence appears, at what rhythm, and with what delay. That can reduce the search space for engineers, neuroscientists, analysts, or risk teams.

The method’s flexibility is also its deployment risk

The limitation is not a ceremonial “more research is needed.” Obviously more research is needed. Academic papers are basically structured prayers to future work.

The practical limitations are more concrete.

First, band selection is consequential. The paper’s own sensitivity results show that band configuration can change overall performance materially. A careless banding scheme can over-segment the signal or bury the mechanism.

Second, filtering must preserve timing. The method depends on temporal precedence, so preprocessing choices are not harmless. Phase distortion, edge effects, and poor frequency separation can contaminate the causal interpretation.

Third, Granger causality is not intervention causality. MB-VLGC detects whether past information in one signal improves prediction of another within frequency-specific, variable-lag settings. It does not prove that manipulating $X$ will change $Y$ in the real world.

Fourth, real-world validation lacks ground truth in several cases. The synthetic tests are stronger because the true causal structure is known. The real-world tests are useful but cannot fully establish correctness without intervention, controlled system knowledge, or stronger external validation.

Fifth, false positives deserve attention. On random noise, MB-VLGC is not as conservative as traditional GC or VLGC in the reported table. In regulated or high-cost decision environments, that matters. A method that finds more candidate relationships may accelerate discovery, but it also needs a disciplined validation pipeline.

The right operating model is therefore not “replace all causal analysis with MB-VLGC.” It is more sober: use MB-VLGC when frequency-specific delays are plausible, validate bands against domain knowledge, check robustness across configurations, and treat its outputs as causal hypotheses with timing structure.

The real contribution is causal diagnosis in stereo

The paper’s useful idea is easy to state: causality in time series may not live on a single channel. It may be stereo, or even surround sound. One frequency band may carry a slow influence; another may carry a fast one. A single broadband lag can flatten both into ambiguity.

MB-VLGC gives analysts a framework for separating those channels before asking the Granger question. Its strongest evidence appears exactly where that design should help: synthetic multi-frequency lag data. Its practical promise is not universal causal truth, but better structured diagnosis.

That is enough to make it interesting.

For organisations drowning in telemetry, EEG traces, price streams, or behavioural signals, the value is not another black-box score. It is a map: which band, which direction, which delay, and how much confidence. Still observational. Still parameter-sensitive. Still requiring domain judgment. But far less blind to the fact that complex systems do not always speak in one rhythm.

Cognaptus: Automate the Present, Incubate the Future.

¹ Chakattrai Sookkongwaree, Tattep Lakmuang, and Chainarong Amornbunchornvej, “Multi-Band Variable-Lag Granger Causality: A Unified Framework for Causal Time Series Inference across Frequencies,” arXiv:2508.00658, 2025, https://arxiv.org/abs/2508.00658.
