TL;DR for operators
Volatility is usually treated as a risk input: measure it, size positions around it, and try not to get mugged by it before lunch. This paper treats volatility differently. It uses mid-range volatility to select stocks that are neither comatose nor explosive, then applies a causal-inference stack to find which stocks appear to move before others.
The proposed Vol-TS pipeline first clusters equities by OHLC-based historical volatility, then filters potential lead-lag relationships using Granger Causality, a customised PCMCI test, and Effective Transfer Entropy. Finally, it uses DTW-KNN to estimate the delay between the leading and lagging asset and converts that delay into a directional trend-following trade.
The headline result is attractive: across three selected pairs (MU→QCOM, META→TSLA, and TSLA→AMZN), the combined strategy reports a 15.38% return over a 45-day backtest, versus 3.65% for a weighted buy-and-hold benchmark on the three traded target stocks.[1] The paper’s abstract reports a different comparative buy-and-hold figure, but the results section’s 3.65% comparison is the cleaner operational benchmark because it maps directly to QCOM, TSLA, and AMZN.
The practical lesson is not “causality has solved trading”, because markets remain rude and uncooperative. The lesson is more useful: a quant team can treat causal inference as a signal-filtering discipline. Start with a volatility regime, screen directional dependencies, reject weaker links, estimate execution lag, and then test whether the signal survives transaction costs, drawdowns, and regime changes. That is less glamorous than a miracle model. It is also less embarrassing.
The trading problem is not volatility. It is undisciplined volatility.
Markets are noisy rooms. Every asset appears to be reacting to every other asset, and after enough chart staring, even random motion begins to look like a conspiracy with a moving average.
The paper’s starting point is sensible: do not ask every stock to explain every other stock. First decide which stocks live in a comparable volatility regime. Vol-TS does this using daily OHLC data: rather than reducing price movement to a single closing-price return series, it feeds the daily open, high, low, and close into historical volatility estimators that capture the texture of price movement.
That matters because volatility is not just “big movement”. A stock can drift smoothly, gap violently, churn intraday, or compress before breaking out. Treating all of those as one generic risk score is convenient, but convenience is where weak strategies go to retire.
The paper computes multiple historical volatility estimators, including Parkinson, Garman-Klass, Rogers-Satchell, and Yang-Zhang-style measures. It then uses a Gaussian Mixture Model to cluster stocks into low-, medium-, and high-volatility groups. The chosen trading universe is the medium-volatility cluster: TSLA, AMZN, META, MU, QCOM, and INTC.
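To make the range-based estimators less abstract, here is a minimal sketch of two of them, Parkinson and Garman-Klass, using their standard textbook formulas. The function names and the four sample bars are illustrative assumptions, not the paper's code or data, and the GMM clustering step is left out.

```python
import math

def parkinson_vol(highs, lows):
    """Parkinson estimator: per-bar volatility from the high-low range only."""
    n = len(highs)
    total = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(total / (4.0 * n * math.log(2.0)))

def garman_klass_vol(opens, highs, lows, closes):
    """Garman-Klass estimator: combines the range with the open-to-close move."""
    n = len(opens)
    k = 2.0 * math.log(2.0) - 1.0
    total = 0.0
    for o, h, l, c in zip(opens, highs, lows, closes):
        total += 0.5 * math.log(h / l) ** 2 - k * math.log(c / o) ** 2
    return math.sqrt(total / n)

# Made-up daily bars, for illustration only (assumption, not market data)
opens = [100.0, 101.0, 102.5, 101.5]
highs = [102.0, 103.5, 103.0, 104.0]
lows = [99.0, 100.5, 100.0, 101.0]
closes = [101.0, 102.5, 101.5, 103.5]

pk = parkinson_vol(highs, lows)
gk = garman_klass_vol(opens, highs, lows, closes)
print(f"Parkinson daily vol: {pk:.4f}, Garman-Klass daily vol: {gk:.4f}")
```

Both functions return a per-bar (here daily) figure. A per-stock vector of several such estimates is the kind of feature set a mixture model could then cluster into low, medium, and high groups.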
That choice is the first operational filter. Low volatility may offer too little exploitable movement. High volatility may offer signal, but wrapped in enough chaos to make execution ugly. Medium volatility is the Goldilocks zone: enough motion to trade, not so much that every signal looks like a weather event.
This does not prove alpha. It simply reduces the search space before the causal machinery starts. That is already useful. Many trading systems fail not because the final model is weak, but because the candidate universe is a landfill with tickers.
Vol-TS is a narrowing machine, not a magic alpha engine
The Vol-TS pipeline works because each stage has a different job. The paper is easiest to understand as a sequence of filters, not as a single model.
| Stage | Likely purpose in the paper | What it supports | What it does not prove |
|---|---|---|---|
| KNN anomaly filtering | Implementation detail and data hygiene | Removes extreme structural disruption, especially the March 2020 crash period | That the later strategy is robust to crises |
| GMM volatility clustering | Main mechanism for asset selection | Narrows the universe to medium-volatility candidates | That clustered stocks have tradable relationships |
| Granger Causality Test | Initial directional screening | Finds pairs where past values of one series improve prediction of another | Structural causation or stable future alpha |
| Customised PCMCI | Main causal-graph refinement | Filters weaker or conditionally explainable links | Full causal identification under market interventions |
| Effective Transfer Entropy | Directionality confirmation | Tests whether information flow appears directional | That the relationship survives new regimes |
| DTW-KNN lag estimation | Signal-timing implementation | Chooses execution lag for trading the follower asset | That the lag is stable out of sample |
| 45-day backtest | Main performance evidence | Tests whether selected signals translated into profit in one short period | Long-run tradability after realistic market frictions |
The first important point: this is not a pure machine-learning forecast model. It is a structured discovery workflow. GMM selects the type of asset. Granger screens possible directional influence. PCMCI tries to remove conditional noise. Transfer Entropy checks directional information flow. DTW-KNN estimates the lag that converts “X seems to lead Y” into “trade Y after X moves”.
The second point: the word “causal” should be handled carefully. In financial markets, “causal” often arrives wearing a lab coat two sizes too large. Here, causality means predictive directional dependence under statistical tests. It does not mean MU causes QCOM to move in the way gravity causes dropped phones to meet pavement. It means that, in the observed sample and after the paper’s filtering process, MU’s past movement carried useful information about QCOM’s later movement.
That distinction is not pedantry. It is the difference between a trading signal and a theory of the semiconductor supply chain.
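"Predictive directional dependence" is easiest to see in the entropy stage. The sketch below computes transfer entropy on binary up/down symbols with a history length of one, then subtracts the average value obtained from shuffled source series, which is the usual idea behind an "effective" transfer entropy. The binning, history length, shuffle count, and synthetic series are all assumptions for illustration; the paper's exact settings are not reproduced here.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target):
    """TE(source -> target) in bits, history length 1, on discrete symbols."""
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_yx = Counter(zip(target[:-1], source[:-1]))
    pairs_yy = Counter(zip(target[1:], target[:-1]))
    singles = Counter(target[:-1])
    n = len(target) - 1
    te = 0.0
    for (y_next, y_prev, x_prev), count in triples.items():
        p_joint = count / n
        p_full = count / pairs_yx[(y_prev, x_prev)]           # p(y_next | y_prev, x_prev)
        p_own = pairs_yy[(y_next, y_prev)] / singles[y_prev]  # p(y_next | y_prev)
        te += p_joint * math.log2(p_full / p_own)
    return te

def effective_te(source, target, n_shuffles=200, seed=0):
    """Raw TE minus the mean TE of shuffled sources (small-sample bias estimate)."""
    rng = random.Random(seed)
    raw = transfer_entropy(source, target)
    shuffled = list(source)
    bias = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys timing, keeps symbol frequencies
        bias += transfer_entropy(shuffled, target)
    return raw - bias / n_shuffles

# Synthetic example: the follower copies the leader's up/down move one step later
rng = random.Random(42)
leader = [rng.choice([0, 1]) for _ in range(500)]
follower = [0] + leader[:-1]

ete_forward = effective_te(leader, follower)
ete_backward = effective_te(follower, leader)
print(f"leader -> follower: {ete_forward:.3f} bits")
print(f"follower -> leader: {ete_backward:.3f} bits")
```

In this toy case the forward direction carries close to one bit and the reverse direction close to none. Real return series will be far less tidy, which is exactly why the asymmetry is treated as confirmation rather than proof.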
The causal stack turns a watchlist into three tradable arrows
After clustering, the paper applies the Granger Causality Test to all pairs inside the medium-volatility group. It scans lag windows from 2 to 48 days and identifies 45 days as the lookback window where the candidate relationships show the strongest statistical significance.
At that 45-day window, five pairs pass the p-value screen below 0.01:
| Candidate pair from GCT | p-value |
|---|---|
| AMZN–MU | 0.0001 |
| AMZN–TSLA | 0.0030 |
| META–TSLA | 0.0063 |
| MU–TSLA | 0.0001 |
| QCOM–MU | 0.0007 |
This is screening evidence, not final evidence. Granger testing is good at asking a narrow question: does the past of one time series improve prediction of another, beyond the target’s own past? It is not good at proving deep causal structure in a market where macro factors, sector rotation, news cycles, liquidity, and investor positioning all overlap like a badly managed group chat.
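The narrow question Granger testing asks can be written down in a few lines. The sketch below fits two regressions by ordinary least squares, one using only the target's own past and one that adds the candidate leader's past, and compares them with an F statistic. It uses a single lag and synthetic data purely for illustration; the helper names are hypothetical, and a real screen would use an established statistics library and the lag range described above.

```python
import random

def ols_rss(rows, y):
    """Residual sum of squares from least squares via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = aug[i][k] - sum(aug[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = s / aug[i][i]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def granger_f_stat(x, y, lags=1):
    """F statistic for the null 'lags of x add nothing to predicting y'."""
    times = range(lags, len(y))
    target = [y[t] for t in times]
    restricted = [[1.0] + [y[t - j] for j in range(1, lags + 1)] for t in times]
    unrestricted = [row + [x[t - j] for j in range(1, lags + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic example: y follows x with a one-step delay plus noise
rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.5) for t in range(1, 300)]

f_xy = granger_f_stat(x, y)
f_yx = granger_f_stat(y, x)
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

A large F in one direction and a small one in the other is what a lead-lag candidate looks like at this stage. It says nothing yet about confounders, which is the job of the next filter.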
That is why the second filter matters. The customised PCMCI test builds a conditional dependency graph using partial correlations. The paper sets a threshold of 0.15 and keeps stronger conditional links. This step removes some GCT relationships, including MU–TSLA, while retaining several directed relationships with meaningful partial correlations: META→TSLA at 0.337, TSLA→AMZN at 0.307, and MU→QCOM at 0.288.
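The idea of "conditioning away" a link can be shown with a first-order partial correlation. This is a deliberately simplified sketch: it conditions on one series only, whereas PCMCI conditions on sets of lagged parents, and it is not the paper's customised test. The 0.15 threshold comes from the article; the function names and synthetic data are assumptions.

```python
import math
import random

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

THRESHOLD = 0.15  # the cut-off reported for the paper's PCMCI step

# Synthetic example: x and y are both driven by a common factor z
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(1000)]
x = [v + rng.gauss(0, 1) for v in z]
y = [v + rng.gauss(0, 1) for v in z]

raw = pearson(x, y)
conditional = partial_corr(x, y, z)
keep_link = abs(conditional) >= THRESHOLD
print(f"raw correlation {raw:.3f}, partial correlation {conditional:.3f}, keep: {keep_link}")
```

Here the raw correlation looks healthy, but it collapses once the common driver is accounted for, so the link is dropped. A pairwise screen alone would have kept it.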
Then Effective Transfer Entropy is used to confirm the direction of information flow. The final selected trading pairs are:
| Leader | Follower traded | Optimal lag | DTW-KNN directional accuracy |
|---|---|---|---|
| MU | QCOM | 2 days | 92.3% |
| META | TSLA | 1 day | 98.1% |
| TSLA | AMZN | 5 days | 96.5% |
These numbers are strong. Possibly too strong. High directional classification accuracy in a compact sample is useful evidence for the paper’s proof-of-concept, but it should trigger operator curiosity rather than immediate budget approval. A signal that looks that clean over a short window needs to be tortured politely across other periods, sectors, liquidity regimes, and cost assumptions.
Still, the mechanism is coherent. The paper is not asking a model to predict all prices directly. It is asking a narrower question: among stocks with similar mid-range volatility, are there directional lead-lag relationships strong enough to be traded with specific delays?
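For a feel of how a delay can be read off two series, the sketch below aligns a leader and a follower with plain dynamic time warping and takes the median index offset along the optimal path as the lag estimate. This is not the paper's DTW-KNN procedure, whose details are not reproduced in this article; it is a simpler stand-in, and the function names and synthetic random walk are assumptions.

```python
import math
import random
from statistics import median

def dtw_path(a, b):
    """Classic O(len(a) * len(b)) DTW; returns the optimal alignment path."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which indices were matched
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    return path[::-1]

def estimate_lag(leader, follower):
    """Median offset (follower index minus leader index) along the DTW path."""
    return median(j - i for i, j in dtw_path(leader, follower))

# Synthetic example: the follower repeats the leader's path two steps later
rng = random.Random(11)
leader = [100.0]
for _ in range(119):
    leader.append(leader[-1] + rng.gauss(0, 1))
true_lag = 2
follower = [leader[0]] * true_lag + leader[:-true_lag]

lag = estimate_lag(leader, follower)
print(f"estimated lag: {lag} steps")
```

A positive offset means the follower trails the leader. On real prices the offset will wander along the path, and how much it wanders is itself a useful stability check before anyone trades on the number.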
That is a better question. Better questions are underrated in finance, mostly because worse questions come with prettier dashboards.
The backtest is impressive because it is narrow
The trading strategy is tested from 8 June 2023 to 12 August 2023, a span of roughly 45 trading days. The setup is simple: $1,000 initial capital per target stock, $3,000 total, fixed commission of $9 per trade, and compounding enabled. The lagging stock is traded based on the trend of the leading stock after the identified delay.
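The setup can be sketched as a small long-or-flat backtest. The entry and exit rule below (hold the follower when the leader rose `lag` days earlier, otherwise stay in cash) is an assumption made for illustration, since this article does not spell out the paper's exact rule. The $1,000 capital and $9 commission come from the setup above; the price series are invented.

```python
def lagged_trend_backtest(leader, follower, lag, capital=1000.0, commission=9.0):
    """Long the follower when the leader rose `lag` days earlier; flat otherwise."""
    equity, in_position, trades = capital, False, 0
    for t in range(lag + 1, len(follower)):
        # Leader's move that completed `lag` days before day t (known by day t-1)
        signal = leader[t - lag] > leader[t - lag - 1]
        if signal != in_position:      # entering or exiting costs one commission
            equity -= commission
            trades += 1
            in_position = signal
        if in_position:                # compounding: hold from close t-1 to close t
            equity *= follower[t] / follower[t - 1]
    return equity, trades

# Invented prices: the follower repeats the leader's daily move one day later
leader = [100.0, 101.0, 102.0, 101.0, 100.0, 101.0, 102.0, 103.0]
follower = [50.0, 50.0]
for t in range(2, len(leader)):
    follower.append(follower[-1] * leader[t - 1] / leader[t - 2])

equity, trades = lagged_trend_backtest(leader, follower, lag=1)
print(f"final equity ${equity:,.2f} after {trades} commissioned fills")
```

Even in this rigged example, where the signal is perfect by construction, fixed commissions absorb most of the gross gain on a $1,000 account. That is worth remembering when reading the results table.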
The individual results are the paper’s clearest evidence:
| Strategy | Trades | Win rate | Return | Final equity | Sharpe | Sortino | Max drawdown |
|---|---|---|---|---|---|---|---|
| MU→QCOM | 4 | 100.0% | 15.12% | $1,151.20 | 2.169 | 3.933 | 2.66% |
| QCOM buy-and-hold | 1 | 100.0% | 2.59% | $1,025.94 | 0.24 | 0.31 | 6.81% |
| META→TSLA | 8 | 88.6% | 14.51% | $1,145.05 | 1.178 | 4.642 | 1.77% |
| TSLA buy-and-hold | 1 | 100.0% | 3.32% | $1,033.24 | 0.21 | 0.28 | 11.23% |
| TSLA→AMZN | 4 | 100.0% | 16.50% | $1,165.01 | 1.178 | 4.642 | 2.49% |
| AMZN buy-and-hold | 1 | 100.0% | 5.04% | $1,050.40 | 0.53 | 0.76 | 4.95% |
The portfolio result combines the three strategies. From $3,000 initial capital, the framework ends with $3,461.26, producing $461.26 in profit and a 15.38% return. The weighted buy-and-hold comparison on QCOM, TSLA, and AMZN returns 3.65% over the same period.
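The portfolio figures follow directly from the per-pair final equities in the table above. The short check below reproduces them; the only assumption is equal $1,000 weighting per leg, as stated in the setup.

```python
strategy_equity = {"MU->QCOM": 1151.20, "META->TSLA": 1145.05, "TSLA->AMZN": 1165.01}
buy_hold_equity = {"QCOM": 1025.94, "TSLA": 1033.24, "AMZN": 1050.40}
initial_per_leg = 1000.0

def portfolio_return(final_equities, initial_per_leg):
    """Total final equity and percentage return for equally funded legs."""
    initial = initial_per_leg * len(final_equities)
    final = sum(final_equities.values())
    return final, 100.0 * (final - initial) / initial

strategy_final, strategy_pct = portfolio_return(strategy_equity, initial_per_leg)
bh_final, bh_pct = portfolio_return(buy_hold_equity, initial_per_leg)

print(f"Strategy:     ${strategy_final:,.2f} ({strategy_pct:.2f}%)")  # $3,461.26 (15.38%)
print(f"Buy-and-hold: ${bh_final:,.2f} ({bh_pct:.2f}%)")              # $3,109.58 (3.65%)
```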
The drawdown profile is also important. The strategy’s maximum drawdowns stay below 3% across the three pairs, while target-stock buy-and-hold drawdowns reach as high as 11.23% for TSLA. That gives the result more substance than a return-only backtest. Anyone can produce a large return over a short window by accidentally loading up on the right risk. Lower drawdown suggests the directional signals may have helped with timing, not just exposure.
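For reference, maximum drawdown is the largest fall from a running peak, measured as a fraction of that peak. A minimal sketch follows; the equity curve is invented for illustration and is not the paper's actual path.

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough fall, as a fraction of the running peak."""
    peak, worst = equity_curve[0], 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

# Illustrative equity curve (assumption), ending at MU->QCOM's reported final equity
curve = [1000.0, 1040.0, 1012.0, 1060.0, 1035.0, 1151.2]
mdd = max_drawdown(curve)
print(f"max drawdown: {mdd:.2%}")
```

Two curves can finish at the same equity with very different drawdowns, which is why the metric earns its own column.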
But the word “may” is doing work. Forty-five days is not long enough to establish durable market behaviour. Four trades with a 100% win rate is not a law; it is a promising small sample wearing very shiny shoes.
The business value is workflow discipline, not the 15.38%
For a quant desk, wealth platform, or trading-tool vendor, the paper’s commercial value is not “copy these three pairs and enjoy civilisation”. That would be adorable. The value is the pipeline.
The Vol-TS workflow offers a repeatable structure for turning raw market data into tradable hypotheses:
- Select assets by volatility regime. Do not search everywhere at once.
- Screen directional relationships. Use Granger-style tests to find possible lead-lag candidates.
- Condition away weaker links. Use graph refinement to reduce spurious pair selection.
- Confirm information direction. Use entropy-based methods to test whether the leader’s past reduces uncertainty about the follower’s next move.
- Estimate execution lag. Convert relationship discovery into trade timing.
- Backtest with risk metrics. Evaluate not only return, but drawdown, Sharpe, Sortino, and win rate.
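The last step in the list leans on Sharpe and Sortino, so it helps to be explicit about what they measure. The sketch below uses one common convention: zero risk-free rate, annualisation by the square root of 252, and downside deviation measured against a zero target. These conventions are assumptions; the paper's exact choices are not stated in this article, and the sample returns are invented.

```python
import math

def sharpe_ratio(returns, periods_per_year=252):
    """Annualised mean return over sample standard deviation (risk-free rate = 0)."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)

def sortino_ratio(returns, periods_per_year=252):
    """Like Sharpe, but only negative returns count towards the risk term."""
    mean = sum(returns) / len(returns)
    downside = sum(min(r, 0.0) ** 2 for r in returns) / len(returns)
    return mean / math.sqrt(downside) * math.sqrt(periods_per_year)

# Invented daily returns, for illustration only
daily_returns = [0.01, -0.005, 0.02, -0.01, 0.015]
sharpe = sharpe_ratio(daily_returns)
sortino = sortino_ratio(daily_returns)
print(f"Sharpe {sharpe:.2f}, Sortino {sortino:.2f}")
```

Five observations produce absurdly large annualised ratios, which is the point: both metrics are only as trustworthy as the sample behind them.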
This sequence is valuable because it separates discovery from execution. Many trading systems blend the two so tightly that nobody can tell whether performance comes from the signal, the sizing, the benchmark choice, or one extremely lucky Tuesday. Vol-TS makes the chain of reasoning more inspectable.
For business use, that inspectability has three benefits.
First, it supports governance. A trading committee can examine why MU→QCOM was selected, why the lag was two days, and what happened when weaker candidates were filtered out. That is better than asking a black-box model why it suddenly loves Qualcomm and receiving a probability score with the emotional warmth of a parking meter.
Second, it supports modular improvement. A team can replace the volatility estimator, modify the causal screen, test nonlinear causal discovery, improve the transaction-cost model, or expand the asset universe without rebuilding the entire system.
Third, it supports productisation. A wealth-tech or quant-research platform could turn this into a signal-discovery module: identify volatility regimes, surface candidate lead-lag relationships, show evidence strength, and allow analysts to validate before deploying.
The business implication is not automatic trading. It is cheaper diagnosis. The framework helps teams ask, “Which assets appear to listen to which other assets, with what delay, and under what volatility conditions?” That is a practical question.
“Causal” means predictive information flow, not market destiny
The likely reader mistake is to treat the paper’s causal language as if it proves durable economic causation. It does not, and the paper itself is more modest when read closely.
A Granger-causal relationship can disappear when volatility regimes change. A PCMCI-derived edge can be affected by missing variables. Transfer Entropy can suggest directional information flow without telling us the economic reason. DTW-KNN can find a useful lag in the sample without guaranteeing that the lag will remain stable after the next earnings season, rate shock, AI bubble, supply-chain panic, or whatever markets decide to cosplay next.
A better interpretation is this:
| Reader belief | Correction | Why it matters |
|---|---|---|
| “The model proves MU causes QCOM.” | It shows MU’s past movements helped predict QCOM’s later movements in the tested sample. | Trade signals need monitoring, not reverence. |
| “The 15.38% return proves the framework is production-ready.” | It proves the selected pipeline worked in one short backtest. | Deployment requires broader validation and cost modelling. |
| “High win rates mean low risk.” | High win rates over few trades can be fragile. | Risk must be tested across regimes and larger samples. |
| “Volatility clustering is just preprocessing.” | It is a strategic universe-selection step. | Bad universe selection can bury good causal tests. |
| “This competes directly with pairs trading.” | It is directional lead-lag trading, not classic market-neutral convergence trading. | Benchmarking and risk controls should match the strategy logic. |
This distinction makes the paper more useful, not less. Structural causality in markets is notoriously hard because markets adapt. Predictive causality is still valuable if it can be monitored, stress-tested, and retired when it decays. A trading desk does not need metaphysical certainty. It needs a signal with a known construction process and a kill switch.
Where operators should be sceptical
The paper’s limitations are not cosmetic. They directly affect practical interpretation.
The first boundary is sample size. The initial universe is small, and the final strategy trades only three relationships. A narrow universe can make a pipeline look cleaner than it will look across hundreds of assets, multiple sectors, and different liquidity profiles. Small universes also increase the risk that selected relationships reflect the particular market environment rather than a generally exploitable structure.
The second boundary is time. The backtest covers 45 days. That is suitable for a proof-of-concept, not a production claim. The paper acknowledges this and calls for longer validation across different regimes. For an operator, the next test is obvious: rolling-window backtests across bull, bear, high-rate, low-rate, crisis, and post-crisis periods. The signal should be evaluated not only by average return, but by decay speed, turnover, capacity, and correlation with known risk factors.
The third boundary is market friction. The backtest includes a fixed $9 commission per trade, but does not fully model slippage, bid-ask spreads, borrow constraints for short exposure, market impact, or execution timing. On large liquid US equities, these costs may not destroy the signal, but assuming they are harmless is how backtests become fiction with decimal places.
The fourth boundary is reporting clarity. The paper’s abstract reports a comparative buy-and-hold return of 10.39%, while the results section reports 3.65% as the weighted average buy-and-hold return on the three target stocks. For this article, the 3.65% figure is the better operational comparison because it aligns with the actual traded targets. Still, the discrepancy is worth noting because benchmark definition matters. A vague benchmark can make even a good strategy look better or worse than it is.
The fifth boundary is causal interpretation. The paper suggests possible mechanisms such as sentiment spillovers, supply-chain relationships, and information diffusion. These are plausible, but they are not directly tested. A stronger commercial version of the framework would combine price-based causal filtering with event data, sector exposure, earnings calendars, news sentiment, options flow, and liquidity variables. Price alone can reveal patterns; it rarely explains them politely.
What Cognaptus would test before using this in production
A production-grade version of Vol-TS would need more than a longer backtest. It would need a validation protocol.
The first test is rolling recalibration. Recompute clusters, causal graphs, and lags over moving windows. If the selected relationships change constantly, the system may still be useful, but only as a short-horizon signal scanner. If relationships persist, the case for deployment strengthens.
The second test is placebo comparison. Run the same pipeline on randomly paired stocks, shuffled returns, and sector-neutral alternatives. A causal pipeline should beat naive randomness not only in return, but in stability and drawdown control.
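One simple form of placebo comparison is a permutation test: compute a lead-lag statistic on the real pair, then see how often shuffled versions of the leader match it. The sketch below uses lagged correlation as the statistic, which is a stand-in for the full pipeline rather than a replacement for it. The function names, shuffle count, and synthetic data are assumptions.

```python
import math
import random

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def lagged_corr(leader, follower, lag):
    """Correlation between leader[t - lag] and follower[t]; lag must be >= 1."""
    return pearson(leader[:-lag], follower[lag:])

def placebo_p_value(leader, follower, lag, n_shuffles=500, seed=0):
    """Share of shuffled leaders whose lagged correlation matches the real one."""
    rng = random.Random(seed)
    observed = abs(lagged_corr(leader, follower, lag))
    shuffled = list(leader)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(lagged_corr(shuffled, follower, lag)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)

# Synthetic returns: the follower echoes the leader two steps later, plus noise
rng = random.Random(3)
leader = [rng.gauss(0, 1) for _ in range(250)]
follower = [rng.gauss(0, 0.5) for _ in range(2)]
follower += [0.6 * leader[t - 2] + rng.gauss(0, 0.5) for t in range(2, 250)]
unrelated = [rng.gauss(0, 1) for _ in range(250)]

p_linked = placebo_p_value(leader, follower, lag=2)
p_unrelated = placebo_p_value(unrelated, follower, lag=2)
print(f"linked pair p = {p_linked:.4f}, unrelated pair p = {p_unrelated:.4f}")
```

The same harness can wrap any statistic, including backtest return or drawdown, as long as the shuffled runs go through exactly the same steps as the real one.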
The third test is cost realism. Add spread, slippage, partial fills, market impact assumptions, and conservative execution delays. A 1-day lag signal can be highly sensitive to implementation details. Markets do not pause because the backtest is compiling.
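A rough way to see how frictions scale is to express every cost per fill as a fraction of notional and subtract it from the gross return. The sketch below does that with a simple additive approximation. The $9 commission is from the paper's setup; the spread and slippage figures, the two-fills-per-round-trip convention, and the function name are assumptions chosen only for illustration.

```python
def net_return(gross_return, notional, commission=9.0,
               half_spread_bps=1.0, slippage_bps=2.0, fills=2):
    """Gross round-trip return minus per-fill costs (additive approximation)."""
    per_fill = commission / notional + (half_spread_bps + slippage_bps) / 10_000
    return gross_return - fills * per_fill

# Same 3% gross move, two account sizes (hypothetical)
small = net_return(0.03, notional=1_000.0)
large = net_return(0.03, notional=100_000.0)
print(f"$1,000 notional:   {small:.2%} net")
print(f"$100,000 notional: {large:.2%} net")
```

On a $1,000 leg the fixed commission alone is 90 basis points per fill, so the commission dominates. On larger notionals the commission fades and spread, slippage, and market impact take over, which is why cost modelling and capacity cannot be tested separately.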
The fourth test is capacity. A signal that works with $3,000 may not scale linearly. For large desks, capacity is not an afterthought; it is the difference between alpha and a charming spreadsheet.
The fifth test is failure analysis. When a selected lead-lag pair stops working, the system should identify whether the volatility regime changed, the causal edge weakened, the lag shifted, or the trade execution became too expensive. The ability to diagnose failure is often more valuable than another decimal point of reported Sharpe.
The real lesson: trade the relationship, not the noise
This paper is at its strongest when read as a mechanism for disciplined signal discovery. It does not claim that volatility alone predicts markets. It does not simply throw a neural network at price data and hope the GPU has good instincts. It builds a chain: volatility regime, causal screening, graph refinement, information-flow confirmation, lag estimation, and backtesting.
That chain is the contribution.
The reported 15.38% return is attention-grabbing, as it should be. But the more durable idea is methodological: treat volatility as a way to define where prediction might be feasible, then use causal-inference tools to decide which directional relationships deserve to become trades.
For business operators, this is a useful middle ground between simplistic technical indicators and opaque black-box forecasting. It is interpretable enough to audit, modular enough to improve, and structured enough to become a repeatable research workflow.
The caveat is equally clear. A short backtest on a few large-cap stocks is not a licence to print money. It is a prototype with good instincts. Before production, it needs longer regime testing, broader asset coverage, stronger cost modelling, and careful benchmark discipline.
Still, the paper points in the right direction. The future of trading systems is not just better prediction. It is better filtering: knowing which signals deserve attention, which relationships deserve capital, and which beautiful backtests deserve to be escorted quietly out of the building.
Cognaptus: Automate the Present, Incubate the Future.
[1] Ivan Letteri, “A Framework for Predictive Directional Trading Based on Volatility and Causal Inference,” arXiv:2507.09347, 2025.