Classical finance often assumes the past doesn't matter: only the present state of the market is relevant for decisions. But a new paper from researchers at Imperial College and Oxford develops a kernel-based framework for trading-strategy design that exposes how this assumption leads to suboptimal choices. Their insight: memory matters, and modern tools can finally make use of it.

From Markowitz to Memory

The Markowitz mean-variance framework still underpins much of quantitative portfolio theory. It assumes that a trader's position can be optimized from instantaneous expectations of return and risk alone.

But what if a trading signal depends not just on current prices, but on how they got there? What if volatility itself is path-dependent — say, because of intraday effects or information decay? The authors argue that trading strategies should be path-dependent too.

Instead of representing positions as functions of current state variables, they propose parameterizing them as elements of a Reproducing Kernel Hilbert Space (RKHS): a function space in which evaluation reduces to inner products with a kernel, so strategies can be defined on entire historical trajectories.

Kernel Tricks Meet Portfolio Optimization

At the heart of this framework is the kernel trick well known from machine learning: the ability to compute inner products in an infinite-dimensional feature space without ever explicitly transforming the data.

The trader’s position at time $t$ is:

$$ \xi_t = \phi(\psi(X_{0,t})) = \langle \phi, K(\psi(X_{0,t}), \cdot) \rangle_{\mathcal{H}_K} $$

Where:

  • $X_{0,t}$ is the asset path up to time $t$
  • $\psi$ is a feature embedding (e.g. concatenated price + signal history)
  • $K$ is a kernel (e.g. signature kernel)
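By the representer theorem, the optimal $\phi$ is a linear combination of kernel sections at the sample paths, so evaluating a position reduces to kernel evaluations against those paths. A minimal sketch, assuming a Gaussian kernel on fixed-length path features as a simple stand-in for the signature kernel (`gaussian_path_kernel`, `position`, and `gamma` are illustrative names, not the paper's API):

```python
import numpy as np

def gaussian_path_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel on flattened path features.

    A simple stand-in for the signature kernel; gamma plays the role
    of the path-scaling hyperparameter.
    """
    return np.exp(-gamma * np.sum((x - y) ** 2))

def position(alpha, train_paths, current_path, gamma=1.0):
    """Evaluate xi_t = sum_i alpha_i * K(psi(X^(i)), psi(X_{0,t})).

    By the representer theorem, the optimal RKHS element phi is a
    linear combination of kernel sections at the training paths, so
    the position is a weighted sum of kernel evaluations.
    """
    k = np.array([gaussian_path_kernel(p, current_path, gamma)
                  for p in train_paths])
    return float(alpha @ k)
```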

The optimization objective is the familiar mean-variance criterion:

$$ \text{maximize}\quad \mathbb{E}[V_T] - \frac{\eta}{2}\text{Var}[V_T] $$

…but with terminal wealth defined (in the frictionless case) as the integral of the path-dependent position against price increments, $V_T = \int_0^T \xi_t \, \mathrm{d}S_t$, making the optimization non-Markovian.
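To see why closed-form solutions become available (next section), consider a discrete-time sketch in our own notation, not the paper's. Expanding the position over $n$ sample paths,

$$ V_T = \sum_{t} \xi_t\,(S_{t+1} - S_t), \qquad \xi_t = \sum_{i=1}^{n} \alpha_i\, K\big(\psi(X^{(i)}), \psi(X_{0,t})\big), $$

so $V_T$ is linear in the weights $\alpha$, which makes $\mathbb{E}[V_T]$ linear and $\mathrm{Var}[V_T]$ quadratic. The objective reduces to a concave quadratic,

$$ \max_{\alpha}\ \alpha^\top b - \frac{\eta}{2}\,\alpha^\top Q\,\alpha, \qquad \alpha^* = \frac{1}{\eta}\, Q^{-1} b, $$

with $b$ and $Q$ estimated from kernel evaluations across sample paths (regularized, or pseudo-inverted, when $Q$ is singular).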

Closed-Form Solutions — No Gradient Descent Needed

A standout result is that closed-form solutions for the optimal strategy still exist. Using a spectral decomposition of the kernel Gram matrix (or its pseudo-inverse), the optimal weights $\alpha^*$ over sample paths can be computed efficiently. This avoids stochastic gradient descent and deep learning entirely, yet still captures nonlinear, temporal effects.
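A minimal sketch of that solve, under the quadratic form sketched above; `solve_alpha`, the ridge term `lam` (standing in for the regularization $\lambda$ discussed later), and the cutoff `m` are illustrative names, not the paper's implementation:

```python
import numpy as np

def solve_alpha(Q, b, eta=1.0, lam=1e-3, m=None):
    """Closed-form weights alpha* = (1/eta) * (Q + lam*I)^{-1} b,
    computed via the spectral decomposition of the symmetric PSD
    matrix Q and (optionally) truncated to the top-m eigenpairs.

    Q : (n, n) matrix built from kernel evaluations across sample paths
    b : (n,) vector of expected-return terms
    """
    vals, vecs = np.linalg.eigh(Q)            # eigenvalues in ascending order
    if m is not None:
        vals, vecs = vals[-m:], vecs[:, -m:]  # keep the m largest eigenpairs
    proj = vecs.T @ b                         # project b onto the eigenbasis
    return (vecs @ (proj / (vals + lam))) / eta
```

Truncating to the top-$m$ eigenpairs discards directions dominated by sampling noise, which is one source of the stability and regularization noted below.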

| Feature | Kernel Trader | Signature Trader [FHW23] |
| --- | --- | --- |
| Online-friendly | Slower (requires kernel evaluations against past paths) | Faster (truncated signature) |
| Multi-asset scalable | ✅ Easier via operator kernels | ❌ Slower with higher dimensions |
| Captures full history | ✅ Full infinite-order representation | ❌ Requires truncation |
| Interpretability | ❌ Lower | ✅ Higher (linear functional) |

Outperforming Markovian Models

On both synthetic and real intraday MSFT data, the kernel-based strategies beat traditional Markovian (i.e., current-signal-only) strategies across several dimensions:

  • Slow-decaying signals (e.g., power-law memory): better learned by the kernel trader (a toy generator for such a signal is sketched after this list)
  • Stochastic volatility: kernel trader adapts without re-specifying the model
  • Small-sample settings: kernel method generalizes better out-of-sample when hyperparameters are regularized
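For intuition on the first bullet, a toy generator for a power-law-memory signal (illustrative only, not the paper's data-generating process; `power_law_signal` and `beta` are assumed names):

```python
import numpy as np

def power_law_signal(n, beta=0.6, seed=0):
    """Toy signal with power-law memory: a weighted sum of past Gaussian
    shocks with weights (lag + 1)^(-beta). The slow weight decay keeps
    old shocks relevant, unlike an exponential moving average.
    """
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(n)
    weights = (np.arange(n) + 1.0) ** (-beta)
    return np.convolve(shocks, weights)[:n]  # truncate to n points
```

Because the weights decay like $(\text{lag}+1)^{-\beta}$ rather than exponentially, distant shocks retain influence, which is precisely what a fixed-window average misses.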

Perhaps most intriguingly, the kernel method performs nearly on par with the theoretically optimal strategy that has access to the true drift $\mu_t$ — even when trained only on noisy signals $I_t$.

Application to Crypto and Real-Time Strategy Design

For high-frequency or crypto traders, whose markets are dominated by non-stationary signals, decaying social sentiment, and cross-asset order flow, this framework is especially compelling.

Rather than forcing a signal into a memoryless form (e.g., a 5-minute moving average), one can embed the entire historical context directly into the trading model; a minimal sketch of such an embedding follows the list below. And unlike deep learning approaches, the method retains:

  • Interpretability (via kernel and feature map choices)
  • Closed-form updates (faster for real-time inference)
  • Stability (via spectral truncation and regularization)
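A minimal sketch of one such path embedding $\psi$, assuming a rolling window of price and signal history; `embed_path` and its normalization choices are illustrative, not the paper's feature map:

```python
import numpy as np

def embed_path(prices, signal, window=50):
    """A simple path embedding psi: time-augmented, normalized price and
    signal history over a rolling window, flattened into one vector that
    the kernel sketched earlier can consume.
    """
    p = np.asarray(prices[-window:], dtype=float)
    s = np.asarray(signal[-window:], dtype=float)
    p = (p - p[0]) / (p.std() + 1e-8)     # remove level, scale by volatility
    t = np.linspace(0.0, 1.0, len(p))     # time augmentation
    return np.concatenate([t, p, s])
```

The output can be fed directly to the kernel sketched earlier; the window length then joins $\gamma$, $\lambda$, and $m$ as a hyperparameter.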

A Trade-Off Worth Making

While the kernel-based framework introduces more hyperparameters (e.g., path scaling $\gamma$, regularization $\lambda$, eigenvector cutoff $m$), it allows for precise control over complexity and robustness.

Yes, it’s more complex than a linear Markowitz strategy — but in a world where signals decay, react, and interact over time, linear just isn’t good enough.


Cognaptus: Automate the Present, Incubate the Future.