Tree of Alpha: How MST Networks and Neural Forecasts Outperformed the S&P 500

TL;DR for operators

A recent paper, Dependency Network-Based Portfolio Design with Forecasting and VaR Constraints, proposes a portfolio engine that works in five steps: it turns the S&P 500 into a dependency network, strips that network down to a minimum spanning tree, selects the five most central stocks, allocates capital using risk-aware weights, and uses ARIMA or neural autoregressive forecasts to decide whether those positions deserve exposure on a given day.¹

The result most readers will notice is the backtest: the paper’s best reported strategy, MST + AllAgree + VaR, returned 85.65% over a 365-trading-day window from June 2022 to October 2023, compared with 18.12% for the S&P 500 buy-and-hold benchmark. That is the shiny number. It is also the easiest number to misunderstand.

The more useful result is architectural. The paper shows a modular way to compress a 490-stock universe into a small, interpretable, structurally selected portfolio. It uses pairwise vector autoregression (VAR) models and forecast error variance decomposition to estimate directional influence among stocks; converts influence into network costs; extracts a sparse market backbone using a minimum spanning tree (MST); ranks stocks by degree centrality; weights them by inverse value-at-risk (VaR) or Sharpe ratio; and optionally filters exposure with ARIMA or neural network autoregression (NNAR) forecasts.

For asset managers, the immediate business relevance is not “copy this and print money”, because apparently markets still object to being solved by one table. The relevance is that network structure can become a practical layer between raw factor screens and opaque machine-learning portfolios. It can help identify central equities, monitor systemic anchors, reduce universe noise, and build more explainable allocation workflows.

The boundary is equally important. The simulation is short, concentrated, and gross of realistic frictions. The active strategies would likely face turnover, transaction costs, slippage, implementation latency, and model randomness. The method is promising as a research prototype and diagnostic framework; it is not yet evidence of a deployable production alpha engine.

The real trick is compression, not prediction

Most portfolio optimisation stories begin with the same ritual: more data, more signals, more features, more knobs. This paper begins somewhere more interesting. It asks whether the market can be simplified before it is traded.

That matters because the S&P 500 is not just a list of 500 tickers. It is a dense system of co-movement, sector clustering, shock transmission, shared macro exposure, liquidity effects, and occasional collective hysteria. Traditional correlation matrices catch some of that structure, but they are blunt instruments. They show association, not directional contribution to forecast uncertainty. They also tend to create very busy pictures, which is excellent if the objective is to decorate a risk committee slide and less excellent if the objective is to make a portfolio decision.

The paper’s core move is to build an influence network. It starts with 490 S&P 500 stocks, after filtering the index constituents for data quality. It uses daily adjusted closing prices from January 2020 to December 2024, computes daily returns, and applies a 120-trading-day rolling window. Each window becomes a temporary view of the market’s dependency structure.
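The windowing step can be sketched in a few lines. This is a minimal illustration, assuming simple (not log) returns and a window that advances one trading day at a time; the article does not say which return definition or step size the paper uses, and the function names are hypothetical.

```python
from typing import Iterator

WINDOW = 120  # trading days per rolling window, as described above


def daily_returns(prices: list[float]) -> list[float]:
    """Simple returns r_t = p_t / p_{t-1} - 1 (assumption: not log returns)."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def rolling_windows(returns: list[float], size: int = WINDOW) -> Iterator[list[float]]:
    """Yield every consecutive block of `size` returns, stepping one day at a time."""
    for end in range(size, len(returns) + 1):
        yield returns[end - size:end]
```

Each yielded window would then feed one round of network estimation, so the dependency structure is re-estimated as the window slides forward.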

Inside each window, the authors fit bivariate VAR(1) models for every stock pair. That is already computationally heavy: the paper reports more than 240,000 VAR estimations per period. The purpose is not to forecast every stock directly from every other stock in one giant multivariate model. Instead, the authors take a scalable pairwise route: estimate short-term dynamic relationships between pairs, then use those estimates to measure influence.

The influence measure comes from forecast error variance decomposition, or FEVD. In simple terms, FEVD asks: when stock A’s future return forecast is uncertain, how much of that uncertainty is attributable to shocks from stock B? This creates a directional relationship. B can explain uncertainty in A more than A explains uncertainty in B. Markets are rude like that.

The paper fixes the FEVD horizon at 10 days, aligning the network with a short-term investment perspective. The resulting influence values are then inverted into costs: higher influence means lower network cost. A dense directed network emerges, where stocks are nodes and FEVD-derived relationships are weighted edges.
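For readers who want to see the mechanics, here is a rough sketch of one pairwise estimate: fit a bivariate VAR(1) by ordinary least squares, then compute a 10-step FEVD. It assumes an orthogonalised (Cholesky) decomposition with the first series ordered first; the article does not say whether the paper uses this or a generalised FEVD, so treat the ordering choice and the function names as assumptions.

```python
import math


def _demean(rows):
    """Subtract the column means from a list of (a, b) pairs."""
    n = len(rows)
    m0 = sum(r[0] for r in rows) / n
    m1 = sum(r[1] for r in rows) / n
    return [(r[0] - m0, r[1] - m1) for r in rows]


def fit_var1(a, b):
    """OLS fit of y_t = c + A y_{t-1} + u_t for y = (a, b). Returns (A, Sigma)."""
    x = _demean(list(zip(a[:-1], b[:-1])))  # lagged values
    y = _demean(list(zip(a[1:], b[1:])))    # current values
    n = len(x)
    s00 = sum(p * p for p, _ in x)
    s01 = sum(p * q for p, q in x)
    s11 = sum(q * q for _, q in x)
    det = s00 * s11 - s01 * s01
    A = []
    for i in range(2):  # one regression per equation, solved via 2x2 normal equations
        c0 = sum(xt[0] * yt[i] for xt, yt in zip(x, y))
        c1 = sum(xt[1] * yt[i] for xt, yt in zip(x, y))
        A.append([(s11 * c0 - s01 * c1) / det, (s00 * c1 - s01 * c0) / det])
    resid = [(yt[0] - A[0][0] * xt[0] - A[0][1] * xt[1],
              yt[1] - A[1][0] * xt[0] - A[1][1] * xt[1]) for xt, yt in zip(x, y)]
    sigma = [[sum(u[i] * u[j] for u in resid) / n for j in range(2)] for i in range(2)]
    return A, sigma


def fevd(A, sigma, horizon=10):
    """shares[i][j]: fraction of series i's forecast error variance due to shock j."""
    p00 = math.sqrt(sigma[0][0])            # Cholesky factor of Sigma (lower triangular)
    p10 = sigma[1][0] / p00
    p11 = math.sqrt(sigma[1][1] - p10 * p10)
    theta = [[p00, 0.0], [p10, p11]]        # impulse responses at step 0
    contrib = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(horizon):
        for i in range(2):
            for j in range(2):
                contrib[i][j] += theta[i][j] ** 2
        # Next-step responses: Theta_{h+1} = A @ Theta_h
        theta = [[sum(A[i][k] * theta[k][j] for k in range(2)) for j in range(2)]
                 for i in range(2)]
    return [[contrib[i][j] / sum(contrib[i]) for j in range(2)] for i in range(2)]
```

Calling `fevd(*fit_var1(a_returns, b_returns))` on two return series gives a 2x2 table whose rows sum to one. The off-diagonal entries are the directional influence values, and they need not be equal, which is the asymmetry described above.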

This is the first important transformation:

Stage	What the paper does	Why it matters operationally
Pairwise VAR(1)	Estimates short-term lagged relationships between every stock pair	Creates a scalable approximation of dynamic dependency
FEVD	Measures how much one stock’s shocks contribute to another stock’s forecast uncertainty	Moves beyond static correlation toward directional influence
Influence-to-cost conversion	Turns stronger influence into lower network cost	Makes the relationship usable for graph filtering
Symmetrisation	Keeps the stronger of the two directional links for each pair	Converts the network into a form suitable for MST construction
Minimum spanning tree	Extracts a sparse connected backbone	Reduces market complexity while preserving global connectivity

That final step, the minimum spanning tree, is doing more than visual tidying. It is the paper’s cognitive compression engine.

A complete 490-stock network is almost useless to inspect directly. There are too many edges, too much noise, and too many ways to hallucinate meaning from a hairball. The MST keeps every stock connected but removes redundant links, leaving exactly enough edges to preserve a single connected structure with no cycles. In graph terms, it is sparse. In portfolio terms, it is a disciplined way to ask: which connections define the market’s backbone?
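The filtering steps can be sketched with Kruskal's algorithm. The cost formula here, one minus the stronger of the two directional influence shares, is an assumption: the article only says that higher influence maps to lower cost. The function name and input layout are also hypothetical.

```python
def backbone_mst(influence):
    """Return MST edges (i, j, cost) from a directed influence matrix.

    influence[i][j] is the share of stock i's forecast error variance
    attributed to stock j, assumed to lie in [0, 1].
    """
    n = len(influence)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Symmetrise: keep the stronger of the two directional links.
            strongest = max(influence[i][j], influence[j][i])
            # Assumed inversion: stronger influence means lower cost.
            edges.append((1.0 - strongest, i, j))
    edges.sort()

    parent = list(range(n))  # union-find forest for cycle detection

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for cost, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge creates no cycle
            parent[root_i] = root_j
            tree.append((i, j, cost))
            if len(tree) == n - 1:
                break
    return tree
```

For n stocks the result has n - 1 edges, so a 490-stock window collapses from roughly 120,000 undirected pairs to 489 links.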

This is why the paper should not be read as a forecasting paper with a network garnish. It is better read as a network-filtering paper with forecasting added later.

Central stocks become candidates, not automatic winners

After constructing the MST for each rolling window, the authors rank stocks by degree centrality. Degree is the simplest centrality measure: a stock has high degree if it is connected to many other stocks in the tree.

That simplicity is not a flaw. In a sparse MST, degree has a direct interpretation. A high-degree node is not merely one of many correlated assets in a dense matrix; it is a stock that remains connected to multiple important branches after the network has thrown away redundant edges. The paper interprets these stocks as structurally important market nodes.

The strategy then selects the top five stocks by degree centrality as portfolio candidates.
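Ranking by degree is just an edge count per node. A small sketch follows, with one assumption flagged: ties are broken alphabetically by ticker, because the article does not say how the paper resolves them. The ticker labels in the usage note are made up.

```python
from collections import Counter


def top_central(tree_edges, k=5):
    """Rank nodes by degree in the tree and return the top k as (node, degree)."""
    degree = Counter()
    for i, j, *_ in tree_edges:  # tolerate (i, j) or (i, j, cost) tuples
        degree[i] += 1
        degree[j] += 1
    # Highest degree first; ties broken by node label (an assumption).
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

With edges such as `[("AAA", "BBB"), ("AAA", "CCC"), ("CCC", "DDD")]`, the two-link nodes `AAA` and `CCC` rank ahead of the leaves.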

That top-five choice is aggressive. It makes the method interpretable, but also concentrated. A five-stock portfolio is not a diversified S&P 500 substitute in the conventional sense. It is a compressed structural bet: if the MST correctly identifies stable market anchors, concentration becomes a source of efficiency. If the MST is unstable or capturing a transient artefact, concentration becomes a very elegant way to step on a rake.

The paper then applies two weighting schemes.

First, inverse historical VaR weighting. For each selected stock, it computes a 1-day historical VaR at the 95% confidence level using the same 120-day window. Operationally, this is the empirical 5th percentile of recent daily returns. Stocks with lower estimated downside risk receive higher weights.
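A rough sketch of that weighting rule is below. The percentile convention (a simple order statistic, no interpolation) and the small floor that guards against a non-positive VaR are both assumptions, not details taken from the paper.

```python
def historical_var(returns, alpha=0.05):
    """1-day historical VaR as a positive loss: minus the empirical alpha-quantile."""
    ordered = sorted(returns)
    cutoff = ordered[int(alpha * len(ordered))]  # simple order-statistic rule (assumption)
    return -cutoff


def inverse_var_weights(window_returns, alpha=0.05, floor=1e-6):
    """Weights proportional to 1 / VaR, normalised to sum to one.

    window_returns maps ticker -> list of daily returns in the window.
    `floor` is a hypothetical guard for stocks whose 5th percentile is not a loss.
    """
    inverse = {
        ticker: 1.0 / max(historical_var(r, alpha), floor)
        for ticker, r in window_returns.items()
    }
    total = sum(inverse.values())
    return {ticker: value / total for ticker, value in inverse.items()}
```

A stock whose 5th-percentile daily loss is 1% receives four times the weight of one whose loss is 4%, which is the defensive tilt the rule is designed to produce.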

Second, Sharpe-ratio weighting. The paper computes each selected stock’s Sharpe ratio over the rolling window, assuming a zero risk-free rate, and allocates more weight to stocks with stronger return per unit of volatility. Invalid, unstable, or negative weights are clipped or penalised to preserve a long-only structure.
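The Sharpe variant can be sketched the same way. The article does not give the exact clipping or penalty rule, so this version makes two labelled assumptions: negative Sharpe ratios are clipped to zero, and the portfolio falls back to equal weights if every score is zero.

```python
import statistics


def sharpe_weights(window_returns):
    """Long-only weights proportional to each stock's in-window Sharpe ratio.

    window_returns maps ticker -> list of daily returns; risk-free rate is zero.
    """
    scores = {}
    for ticker, r in window_returns.items():
        vol = statistics.stdev(r)
        sharpe = statistics.mean(r) / vol if vol > 0 else 0.0
        scores[ticker] = max(sharpe, 0.0)  # clip negatives (assumed rule)
    total = sum(scores.values())
    if total == 0:
        # Assumed fallback: no stock has a positive score, so weight equally.
        return {ticker: 1.0 / len(scores) for ticker in scores}
    return {ticker: score / total for ticker, score in scores.items()}
```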

The distinction matters:

Weighting rule	Allocation logic	Behavioural bias	Business interpretation
Inverse VaR	Give more capital to structurally central stocks with lower downside risk	Defensive	Useful when the goal is resilience and drawdown control
Sharpe ratio	Give more capital to structurally central stocks with better historical reward per unit risk	Performance-seeking	Useful when the goal is return efficiency inside the selected subset

The paper’s strongest results come from the VaR side, not the Sharpe side. That is telling. Once the portfolio is already concentrated in structurally central names, downside-aware weighting may be more valuable than chasing recent reward-to-risk scores. The MST selects importance; VaR keeps the importance from becoming recklessness. A small mercy, but in portfolio construction small mercies compound.

Forecasts decide exposure after the network chooses the stocks

The next layer is forecasting. The paper tests ARIMA and NNAR models as filters for the MST-selected candidates. These models predict next-day returns for each selected stock. If a stock’s forecast is negative, its raw weight is set to zero and the remaining weights are renormalised.
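The filter itself is simple enough to show directly. One assumption in this sketch: if every forecast is negative, all weights stay at zero and the portfolio sits in cash, which the article implies but does not state for this step.

```python
def apply_forecast_filter(weights, forecasts):
    """Zero the weight of any stock with a negative forecast, then renormalise.

    weights and forecasts both map ticker -> float.
    """
    kept = {t: (w if forecasts[t] >= 0 else 0.0) for t, w in weights.items()}
    total = sum(kept.values())
    if total == 0:
        return kept  # every forecast negative: stay in cash (assumption)
    return {t: w / total for t, w in kept.items()}
```

Note that renormalising pushes the surviving names to larger positions, so the filter raises concentration on days when only one or two forecasts are positive.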

This creates a useful division of labour:

The network layer decides which stocks are structurally important.
The risk layer decides how much capital each selected stock deserves.
The forecast layer decides whether exposure should be allowed at all.

That separation is cleaner than it first appears. Many trading systems blend selection, weighting, and timing into a single opaque score. Here, each module has a recognisable job. The network finds centrality. VaR or Sharpe handles allocation. ARIMA or NNAR handles near-term direction. The AllAgree signal aggregates forecast signals into a trading decision: buy, liquidate to cash, or hold.
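The article does not spell out the AllAgree rule, so the following is one plausible reading, not the paper's definition: buy when every model forecasts a positive return, liquidate when every model forecasts a negative one, and hold otherwise. The function name and the three labels are hypothetical.

```python
def all_agree_signal(forecasts):
    """Combine several next-day return forecasts into one trading decision.

    Assumed rule: unanimous positive -> "buy", unanimous negative -> "liquidate",
    anything else (including no forecasts) -> "hold".
    """
    if not forecasts:
        return "hold"
    if all(f > 0 for f in forecasts):
        return "buy"
    if all(f < 0 for f in forecasts):
        return "liquidate"
    return "hold"
```

Under this reading, `all_agree_signal([arima_forecast, nnar_forecast])` only changes exposure when the linear and nonlinear models point the same way, which would reduce trading on days when they disagree.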

The ARIMA layer is the classical baseline. It models autocorrelation, differencing, and moving-average structure. The NNAR layer is the nonlinear alternative, using a feedforward neural network over lagged observations to forecast the next value. The paper repeats NNAR-based simulations over different random seeds to reduce dependence on one lucky neural initialisation.

That seed testing is important because neural forecasts can be irritatingly sensitive. The paper’s Table 2, in the results section, presents 11 rows of seed results for the NNAR and AllAgree variants, although its earlier description of the simulation mechanics says the authors repeat across 10 random seeds. This is not fatal, but it is the sort of small inconsistency an implementation team would want clarified before treating the numbers as audit-grade evidence.

The performance table is main evidence, not final proof

The headline experiment evaluates 11 strategies over a 365-trading-day period from June 2022 to October 2023. The benchmark is buy-and-hold exposure to the S&P 500 index.

The reported total returns are:

Strategy	Total return
Buy & Hold Benchmark	18.12%
MST + VaR	37.03%
MST + Sharpe	34.21%
MST + ARIMA + VaR	40.71%
MST + ARIMA + Sharpe	32.31%
MST + NNAR + VaR	74.81%
MST + NNAR + Sharpe	64.58%
MST + AllAgree + VaR	85.65%
MST + AllAgree + Sharpe	65.32%
Fixed Portfolio	42.10%
Dynamic VaR Portfolio	41.47%

The basic MST strategies already outperform the benchmark: 37.03% for MST + VaR and 34.21% for MST + Sharpe, versus 18.12% for buy-and-hold. That suggests the structural selection layer is not merely decorative. Even before neural forecasting enters the room wearing sunglasses, the MST-selected portfolios do better in the tested period.

ARIMA adds only modest value. MST + ARIMA + VaR rises to 40.71%, but MST + ARIMA + Sharpe falls to 32.31%. This is useful because it prevents a lazy conclusion that “forecasting improves everything”. It does not. Forecast filters interact with weighting rules, and not always politely.

NNAR is where the results jump. MST + NNAR + VaR reaches 74.81%, while MST + NNAR + Sharpe reaches 64.58%. The AllAgree ensemble pushes the VaR version to the paper’s top result: 85.65%. The Sharpe AllAgree version reaches 65.32%.

The interpretation is not simply “neural networks are better”. A more careful reading is that nonlinear forecast filters materially improved timing for this selected universe during this test window, especially when paired with downside-aware VaR weighting. Whether that advantage persists across regimes is not established by a single simulation window.
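One way to keep the headline figures in proportion is to annualise them. The sketch below assumes the test window really is 365 trading days, as stated above, and uses the common 252-trading-days-per-year convention, which is not taken from the paper.

```python
TRADING_DAYS_PER_YEAR = 252  # common convention (assumption, not from the paper)


def annualised(total_return, trading_days, days_per_year=TRADING_DAYS_PER_YEAR):
    """Convert a total return over `trading_days` into a compound annual rate."""
    return (1.0 + total_return) ** (days_per_year / trading_days) - 1.0


best_strategy = annualised(0.8565, 365)  # MST + AllAgree + VaR
benchmark = annualised(0.1812, 365)      # S&P 500 buy-and-hold
```

Under those assumptions the best strategy works out to roughly 53% a year and the benchmark to roughly 12%. The gap remains large, but it is a gross figure over a single window.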

The fixed portfolio result is also quietly important. It returns 42.10%, above the benchmark and above several active variants. Because the fixed portfolio requires only an initial allocation, the paper argues that it would incur lower transaction costs than daily rebalanced strategies. That result supports a more conservative business thesis: even if active forecast-timed trading is too costly or unstable, MST-based structural selection may still identify a useful compressed equity subset.

This is where the paper becomes more interesting for operators than for headline readers. The active neural strategies may produce the largest gross returns, but the fixed portfolio may be closer to something an investment team can actually pilot without turning the execution desk into a smoke machine.

The figures and tables play different evidentiary roles

The paper includes several pieces of evidence, but they should not be treated as equally strong.

Paper element	Likely purpose	What it supports	What it does not prove
MST visualisation	Implementation detail and interpretability aid	The FEVD network can be sparsified into a connected market backbone	That the visual clusters are stable or economically causal
Table 1 strategy returns	Main empirical evidence	MST-based and forecast-filtered strategies outperform the benchmark in the tested period	Production profitability after costs, slippage, taxes, and capacity limits
Table 2 seed results	Robustness/sensitivity check for NNAR randomness	Neural and AllAgree variants are not dependent on one single seed	Stability across market regimes or retraining protocols
Cumulative value figure	Main evidence visualisation	Performance trajectories differ materially across strategies	Drawdown-adjusted superiority without additional risk metrics
Dynamic VaR weights figure	Diagnostic/implementation detail	Selected central stocks may have relatively stable weights	That centrality itself is persistent across all regimes

The paper’s own discussion frames central stocks as information hubs or systemic anchors. That is plausible within the method, but it should be read as a modelling interpretation, not a causal discovery result. FEVD from pairwise VAR models estimates directional contribution to forecast uncertainty under specific assumptions. It does not prove that one firm economically causes another firm’s returns. Finance has enough false prophets already; we do not need graph theory to manufacture more.

The business value is an interpretable allocation layer

For an asset manager, the most practical contribution is not the 85.65% backtest number. It is the architecture of the allocation pipeline.

A conventional quantitative workflow might start with factors, scores, or forecasts. The paper inserts a structural layer before weighting and timing. That layer can answer questions such as:

Which stocks sit at the backbone of the market’s dependency network?
Which names remain central after noisy pairwise relationships are filtered out?
How does the market’s influence structure change over rolling windows?
Can a small subset of stocks behave like a concentrated proxy for broader index exposure?
Can risk weighting and forecast filters improve that proxy without making it unintelligible?

This is useful because many investment teams face the same operational problem: they have too many assets to monitor deeply and too many signals to trust casually. A sparse network backbone gives them a disciplined reduction mechanism. It does not replace judgement, but it improves where judgement is aimed.

The method also has a governance advantage. A portfolio manager can explain the pipeline in modular terms:

We estimate short-term dependency among stocks.
We extract the market backbone.
We select the most connected stocks.
We weight those stocks by downside risk or reward-to-risk.
We use forecasts only as exposure filters.
We compare active and low-turnover variants.

That explanation is far more palatable than “the model liked Nvidia on Tuesday because of layer 17”. Interpretability is not a moral virtue in finance; it is an audit survival mechanism.

Where this could fit in an investment organisation

The paper’s framework could be used in several ways, with different levels of ambition.

Use case	Practical deployment	Confidence required
Research screen	Identify structurally central stocks for analyst review	Moderate
Risk dashboard	Monitor changes in dependency backbones and central nodes	Moderate
Portfolio candidate generator	Feed MST-selected stocks into an existing portfolio process	Moderate to high
Low-turnover index proxy	Build a fixed or periodically refreshed concentrated subset	High
Active trading strategy	Run daily forecast-filtered allocation with execution	Very high

The safest near-term use is diagnostic. A network dashboard can show which stocks are becoming more central, which sectors are clustering, and whether the market backbone is concentrating around a small number of names. This could support risk meetings, sector allocation reviews, and stress-testing.

The next step is candidate generation. The MST does not have to be the final portfolio. It can produce a watchlist of structurally important stocks that then pass through existing valuation, liquidity, risk, and compliance screens.

The most ambitious use is live trading. That requires much more evidence than the paper provides. Daily rebalancing, model retraining, forecast filtering, and full liquidation signals can all create execution drag. Gross returns are informative; net returns are where optimism goes to be professionally embarrassed.

The method is modular enough to improve

One reason the paper is worth reading is that its design is modular. Each component can be challenged, replaced, or extended.

The pairwise VAR(1) layer is scalable, but it ignores higher-order multivariate interactions. A sparse or regularised high-dimensional VAR could capture broader system effects, though at a higher modelling cost. The FEVD horizon is fixed at 10 days; other horizons could reveal different dependency structures. Degree centrality is intuitive, but alternatives such as betweenness, eigenvector centrality, or sector-adjusted centrality might select different stocks.

The top-five selection rule is also adjustable. Selecting more stocks could improve diversification while reducing interpretability. Selecting fewer could sharpen the structural bet while increasing concentration risk. The right number is not a mathematical constant handed down from Mount Efficient Frontier. It is a design parameter.

Forecasting can also evolve. The paper tests ARIMA and NNAR, but the network layer could be paired with other predictive models. The authors mention possible future extensions such as graph neural networks, dynamic Bayesian networks, mutual information-based metrics, and multi-layer networks incorporating macroeconomic signals, policy announcements, or sentiment.

For a business team, the modularity is valuable because it allows staged adoption. A firm does not need to deploy the full AllAgree trading engine on day one. It can start by reproducing the MST selection, then compare centrality stability, then test risk-weighted portfolios, then add forecast filters, then decide whether turnover is worth paying for.

The limitations are not footnotes; they define the use case

The paper reports strong results, but several boundaries shape how those results should be used.

First, the simulation period is limited. June 2022 to October 2023 is a meaningful stretch of trading, but not a regime-complete validation. A strategy that performs well over one window may be exploiting a market structure specific to that period. It needs testing across different volatility regimes, rate environments, sector rotations, and crisis periods.

Second, the reported returns are gross. The paper explicitly notes that active strategies, especially NNAR and AllAgree variants, would incur turnover and therefore transaction costs and slippage. This matters because the strongest strategies are also likely the most operationally expensive. A backtest that liquidates to cash and reallocates frequently must be judged after execution assumptions, not before.
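A back-of-envelope way to apply execution assumptions is to deduct a proportional cost from each day's gross return. The linear cost model and every input here are hypothetical; the paper does not report turnover or cost figures.

```python
def net_total_return(gross_daily, turnover, cost_bps):
    """Compound daily returns after a proportional trading cost.

    gross_daily: daily gross returns of the strategy.
    turnover:    fraction of the portfolio traded each day (0.0 to 1.0 or more).
    cost_bps:    assumed all-in cost per unit traded, in basis points.
    """
    value = 1.0
    for r, traded in zip(gross_daily, turnover):
        value *= 1.0 + r - traded * cost_bps / 10_000
    return value - 1.0
```

Running a strategy's daily return series through this with a range of `cost_bps` values shows how quickly a high-turnover variant gives back its edge relative to a fixed portfolio whose turnover is near zero after day one.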

Third, the portfolio is concentrated. Selecting five stocks creates interpretability and potential efficiency, but it also creates idiosyncratic exposure. The method may behave like a compressed index in the tested window, but that does not guarantee index-like risk under stress.

Fourth, the network is built from pairwise VAR models. This makes the method scalable, but it sacrifices higher-order dependency modelling. Pairwise relationships can miss effects that only emerge in a full multivariate system. The paper acknowledges this and points to sparse or regularised high-dimensional VARs as future work.

Fifth, the NNAR layer introduces randomness. The paper addresses this with seed testing, and the results remain strong across reported seeds. Still, production use would need a stable retraining protocol, model versioning, monitoring for forecast decay, and rules for when the neural filter should be trusted less.

Sixth, the framework needs richer risk reporting. Total return is not enough. A production evaluation would need drawdown, volatility, turnover, hit rate, exposure days, sector concentration, capacity, liquidity, tax impact, and benchmark-relative risk. The cumulative value chart is useful, but a CIO will quite reasonably ask for the rest of the risk pack before allowing the model anywhere near client capital.
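Two of those metrics are easy to compute from a cumulative value series and are sketched here as a starting point. The definitions are standard ones, assumed rather than taken from the paper: drawdown is measured from the running peak, and a hit is a day with a strictly positive return.

```python
def max_drawdown(values):
    """Largest peak-to-trough decline of a portfolio value series, as a fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, 1.0 - v / peak)
    return worst


def hit_rate(returns):
    """Share of days with a strictly positive return."""
    return sum(1 for r in returns if r > 0) / len(returns)
```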

The cleanest interpretation: structural selection did work here

The paper’s strongest defensible conclusion is narrower than the headline result and more useful than a generic caution.

In the tested sample, MST-based structural selection improved performance versus the S&P 500 benchmark, even before neural forecasts were added. VaR weighting generally worked better than Sharpe weighting. NNAR and AllAgree filters materially improved gross returns, especially when combined with VaR. The fixed portfolio result suggests that much of the value may come from selecting structurally central stocks, not necessarily from frequent trading.

That is a useful result.

It implies that market dependency structure may contain actionable information for portfolio construction. It also implies that a sparse graph representation can serve as a bridge between traditional statistical modelling and more adaptive trading systems. The paper does not settle whether this is a robust alpha source. It does show a promising way to organise the search.

For operators, that distinction is everything. A backtest is not a business. A mechanism that reduces complexity, supports diagnosis, and can be tested module by module is closer to one.

Conclusion: the tree is more interesting than the fruit

The tempting story is that the paper found a strategy returning 85.65% while the S&P 500 returned 18.12%. That story is true as far as the reported simulation goes, but it is not the best story.

The better story is that the authors built a pipeline for turning market dependency into portfolio structure. Pairwise VAR and FEVD estimate influence. Influence becomes network cost. The MST extracts a sparse backbone. Degree centrality selects structurally important stocks. VaR or Sharpe weighting allocates capital. Forecast models decide whether exposure should remain active.

That mechanism is the real contribution. It gives portfolio teams a way to compress the market without immediately surrendering interpretability. It also gives researchers a modular framework where every component can be stress-tested, replaced, or improved.

The paper’s results are strong enough to deserve attention and bounded enough to require discipline. Which is, inconveniently, how most useful finance research arrives: not as a money printer, but as a better map of where the money printer definitely is not.

Cognaptus: Automate the Present, Incubate the Future.

¹ Zihan Lin, Haojie Liu, and Randall R. Rojas, “Dependency Network-Based Portfolio Design with Forecasting and VaR Constraints,” arXiv:2507.20039, 2025.
