In the age of Transformers and neural nets that write poetry, it’s tempting to assume deep learning dominates every corner of AI. But in quantitative investing, the roots tell a different story. A recent paper, QuantBench: Benchmarking AI Methods for Quantitative Investment[^1], delivers a grounded reminder: tree-based models still outperform deep learning (DL) methods across key financial prediction tasks.
XGBoost: Still the Evergreen in the Quant Forest
Let’s start with the basics. Tree-based models like XGBoost (Extreme Gradient Boosting) work by building many decision trees and combining their outputs. Each tree is a set of yes/no questions—“Is the stock’s 7-day return > 2%?”—and each new tree learns to correct the mistakes of the previous ones.
Mathematically, this is an ensemble method that minimizes the loss by functional gradient descent over additive tree functions: each new tree is fit to correct the residual errors (roughly, the negative gradient of the loss) of the ensemble built so far.
$$ \min_{F_m} \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + f_m(x_i)) $$
Where $F_m$ is the ensemble after $m$ steps, $f_m$ is the new tree added at step $m$, and $L$ is the loss function.
In practical terms? Think of it as a wise committee where each new member only speaks up when earlier ones got it wrong. It’s efficient, robust to noisy features, and handles tabular data like financial time series superbly.
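To make this concrete, here is a minimal, hypothetical sketch that fits an XGBoost regressor on a toy panel of lagged-return features. The data, feature names, and hyperparameters are illustrative placeholders, not the QuantBench configuration.

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

# Toy panel: predict next-day return from a few lagged-return features.
# Features and hyperparameters are illustrative, not from QuantBench.
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.01, 1000))
X = pd.DataFrame({
    "ret_1d": returns.shift(1),
    "ret_5d": returns.rolling(5).sum().shift(1),
    "vol_20d": returns.rolling(20).std().shift(1),
}).dropna()
y = returns.loc[X.index]  # target return, aligned with lagged features

model = XGBRegressor(
    n_estimators=300,    # number of boosting rounds (trees)
    max_depth=4,         # shallow trees resist overfitting noisy signals
    learning_rate=0.05,  # shrinkage on each tree's contribution
    subsample=0.8,       # row subsampling adds robustness
)
model.fit(X.iloc[:800], y.iloc[:800])   # train on the first 800 observations
preds = model.predict(X.iloc[800:])     # out-of-sample predictions
```

Shallow trees plus shrinkage is exactly the “committee that corrects its own mistakes” intuition above.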

Figure 1: Overview of QuantBench architecture and pipeline
What About Deep Learning?
Deep Learning sounds flashier—and in many fields, it is. It includes:
- RNNs (Recurrent Neural Networks): These models maintain a hidden state that updates at each timestep, allowing them to remember past inputs. This makes them suited to sequential data, though they often struggle with long-term dependencies and noisy financial series. Mathematically, RNNs compute $h_t = \sigma(W h_{t-1} + U x_t + b)$, where $h_t$ is the hidden state and $x_t$ is the input at time $t$.
- GNNs (Graph Neural Networks): GNNs operate on graph-structured data by aggregating and transforming information from a node’s neighbors, enabling models to learn from relational structures like stock co-movement graphs. At each layer, node $v$’s representation is updated via $h_v' = \text{ReLU}\left(W \cdot \text{AGG}(\{h_u \mid u \in N(v)\})\right)$.
- Transformers: They rely on self-attention, which lets every input element weigh every other input element, capturing long-range dependencies without recurrence. Self-attention is defined as $\text{Attention}(Q,K,V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$, which excels at capturing global context (a minimal sketch follows this list).
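To ground these formulas, here is a minimal NumPy sketch of a vanilla RNN update and single-head scaled dot-product attention. It illustrates the equations only; it is not QuantBench code, and $\tanh$ stands in for the generic activation $\sigma$.

```python
import numpy as np

def rnn_step(h_prev, x_t, W, U, b):
    """One vanilla RNN update: h_t = sigma(W h_{t-1} + U x_t + b), with tanh as sigma."""
    return np.tanh(W @ h_prev + U @ x_t + b)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V, no masking."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])               # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                     # attention-weighted values
```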
So why do these models often underperform in finance? Because markets are noisy, low-signal environments where generalization matters more than abstraction. Tree models like XGBoost remain resilient where DL can easily overfit.

Figure 2: Data processing pipeline supported by QuantBench
What QuantBench Benchmarked
QuantBench compares over 10 models across four stock markets (US, CN, HK, UK) and multiple tasks:
- Return prediction: forecasting next-day or next-week stock movement.
- Risk-adjusted return: evaluating returns relative to the risk taken, e.g., via the Sharpe ratio (defined just below).
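For reference, the Sharpe ratio divides expected excess return by its volatility:

$$ \text{Sharpe} = \frac{\mathbb{E}[R_p - R_f]}{\sigma_p} $$

where $R_p$ is the portfolio return, $R_f$ the risk-free rate, and $\sigma_p$ the standard deviation of the excess return.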
It evaluates raw integration (just concatenating features) versus graph-based integration (e.g., building a knowledge graph from news or firm data). The latter is more structured but also harder to get right.
A knowledge graph is a structured network where entities (e.g., firms, events) are nodes and relationships (e.g., sector affiliation, supply chains, or joint ventures) are edges. Mathematically, it is represented as $G = (V, E)$, with a feature matrix $X \in \mathbb{R}^{|V| \times d}$ and an adjacency matrix $A \in \{0,1\}^{|V| \times |V|}$. Information can be propagated via layers such as $H^{(l+1)} = \sigma(AH^{(l)}W^{(l)})$. For example, if company A is a supplier of company B, their link might allow the model to infer risk contagion or supply shocks from A to B.
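As a toy illustration of that propagation rule, the sketch below applies one graph-convolution layer to a tiny supplier graph. The self-loop and degree normalization are a common convention (the text’s plain $A$ also works), and the numbers are placeholders, not the paper’s setup.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step H' = ReLU(Â H W), with Â the self-loop-augmented,
    symmetrically degree-normalized adjacency matrix."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

# Tiny example: firm 0 supplies firm 1, firm 1 supplies firm 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.default_rng(0).normal(size=(3, 4))   # 3 firms, 4 features each
W = np.random.default_rng(1).normal(size=(4, 8))   # project to 8 hidden dims
H_next = gcn_layer(A, H, W)                        # shape (3, 8)
```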
In many cases, raw methods + tree models still win.

Figure 3: Model landscape and evolution in QuantBench
Summary of Temporal and Spatial Models in QuantBench
| Category | Model Name | Reference |
|---|---|---|
| Tree-based | XGBoost, LightGBM, CatBoost | Chen & Guestrin (2016); Ke et al. (2017); Prokhorenkova et al. (2018) |
| RNN-based | LSTM, SFM, DA-RNN, Hawkes-GRU | Hochreiter & Schmidhuber (1997); Zhang et al. (2017); Qin et al. (2017); Sawhney et al. (2021a) |
| CNN/MLP | TCN, MLP-Mixer | Bai et al. (2018); Tolstikhin et al. (2021) |
| Transformer | Informer, Autoformer, FEDformer, PatchTST | Zhou et al. (2021); Wu et al. (2022); Zhou et al. (2022); Nie et al. (2022) |
| GNN | GCN, GAT | Kipf & Welling (2017); Velickovic et al. (2018) |
| Hetero-GNN | RGCN, RSR | Schlichtkrull et al. (2018); Feng et al. (2019) |
| Hypergraph | ESTIMATE, STHCN, STHAN | Huynh et al. (2022); Sawhney et al. (2020, 2021b) |
Backtesting Isn’t Enough
QuantBench critiques the overreliance on simplistic backtesting setups. You can have a model that looks great in historical returns but fails catastrophically in live trading.

Figure 4: Comparison of different rolling schemes used in evaluation
A toy backtest setup might look like:
```python
for t in range(train_end, test_end):
    prediction = model.predict(X[t - lookback:t])
    pnl[t] = prediction * returns[t]  # assumes full position with no market impact
```
This is too simplistic. A more realistic approach would include rolling retraining, transaction costs, and delayed signal execution:
```python
for window in rolling_windows:
    model.fit(X[window.train], y[window.train])   # retrain on each rolling window
    preds = model.predict(X[window.test])
    for i, t in enumerate(window.test):
        # simulate_execution / estimate_transaction_cost are placeholder helpers
        exec_price = simulate_execution(preds[i], market_data[t], delay=1)
        cost = estimate_transaction_cost(exec_price, market_conditions[t])
        pnl[t] = (exec_price - market_data[t]['open']) * position_size - cost
```
Still, this omits market impact, latency, microstructure effects, and assumes cost functions that may not scale. QuantBench urges backtests to reflect real-world conditions: signal delay, sector constraints, transaction costs, and rolling portfolio effects.
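For instance, a cost model that does scale with trade size could combine a half-spread charge with a square-root impact term. The function below is a hypothetical sketch: the name, functional form, and coefficients are illustrative assumptions, not QuantBench’s cost model.

```python
import numpy as np

def sqrt_impact_cost(trade_notional, adv, daily_vol,
                     spread_bps=5.0, impact_coeff=0.1):
    """Hypothetical cost model: half the quoted spread plus square-root impact.

    trade_notional: absolute dollar value traded; adv: average daily dollar
    volume; daily_vol: daily return volatility. Coefficients are not calibrated.
    """
    half_spread = trade_notional * spread_bps / 2e4   # spread cost in dollars
    participation = trade_notional / adv              # fraction of daily volume
    impact = impact_coeff * daily_vol * np.sqrt(participation) * trade_notional
    return half_spread + impact
```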
This aligns with what we emphasized in Agents in Formation: Finetune Meets Finestructure in Quant AI—finance is not just another benchmark; it’s a battleground where generalization is alpha.
That is, your model isn’t just solving a task—it’s competing in a dynamic, adversarial ecosystem. Like a chess engine playing against other engines, not a static puzzle. In this context, robustness beats elegance. Reusability beats sophistication.
Why This Still Matters in an Agentic World
You might wonder: if LLM-based agents are the future, why care about whether trees still beat DL?
Because even agentic AI, as covered in From GenAI to Agentic AI[^2] and Agentic Agents: A Comprehensive Survey[^3], relies on sound model choices beneath the agent’s planning and memory layers.
An AI agent that recommends trades or rebalances a portfolio still needs accurate signals at the base. And if that signal comes from an XGBoost forest instead of a 12-layer Transformer, so be it.
Tree-based and DL models are domain-specific intelligence components within a broader agentic framework. Just as a robotic arm needs a reliable gripper, an agentic system needs dependable submodels. We shouldn’t override domain-specific reliability with fashionable architectures unless the upgrade is empirically better.
The Road Ahead: Hybrid Minds, Smarter Bets
None of this is to say deep learning is useless. It shines when fusing image, text, and graph data. But tree-based methods remain the quantitative backbone—and smart agentic systems will know when to delegate.

Figure 5: Ensemble learning curve with variance bands under different rolling settings
As argued in Overqualified, Underprepared, reasoning alone won’t save your portfolio. Your model—whether a language agent or a decision tree—needs to know what matters.
References
Chen, T. and Guestrin, C., 2016. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD, pp.785–794.
Ke, G. et al., 2017. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30.
Prokhorenkova, L. et al., 2018. CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems, 31.
Hochreiter, S. and Schmidhuber, J., 1997. Long short-term memory. Neural Computation, 9(8), pp.1735–1780.
Zhou, H. et al., 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. AAAI, 35(12).
Wu, H. et al., 2022. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. arXiv preprint.
Nie, Y. et al., 2022. PatchTST: A Time Series is Worth 64 Words. arXiv preprint.
Kipf, T.N. and Welling, M., 2017. Semi-supervised classification with graph convolutional networks. arXiv preprint.
Velickovic, P. et al., 2018. Graph Attention Networks. arXiv preprint.
Schlichtkrull, M. et al., 2018. Modeling relational data with graph convolutional networks. The Semantic Web.
Feng, F. et al., 2019. Temporal relational ranking for stock prediction. ACM Transactions on Information Systems, 37(2).
Huynh, T.T. et al., 2022. Efficient integration of multi-order dynamics and internal dynamics in stock movement prediction. arXiv preprint.
Sawhney, R. et al., 2020. Spatiotemporal hypergraph convolution network for stock movement forecasting. ICDM.
Sawhney, R. et al., 2021. Stock selection via spatiotemporal hypergraph attention network. AAAI, 35(1).
[^1]: QuantBench: Benchmarking AI Methods for Quantitative Investment. https://arxiv.org/abs/2504.18600
[^2]: From GenAI to Agentic AI: Capabilities, Components, and Challenges. https://arxiv.org/abs/2504.18875
[^3]: Agentic Agents: A Comprehensive Survey of LLM-based Agents. https://arxiv.org/abs/2504.19678