EMAzing Trends: When One Moving Average Beats a Basket of Signals

TL;DR for operators

Most trend-following signal libraries behave like kitchen drawers: MACD, crossovers, momentum mixes, Bollinger Bands, short lookbacks, long lookbacks, “robust” blends, and a few legacy knobs nobody wants to delete because they once looked clever in 2017.

Sebastien Valeyre’s paper argues that much of this complexity may be unnecessary for medium-frequency cross-asset futures trend following.¹ The paper tests whether the theoretical Sharpe-ratio curve derived by Grebenkov and Serror for EMA trend following is visible in real data. It is. Using daily returns for 70 futures instruments across commodities, FX, stock indices, and bonds from May 1990 to December 2023, the empirical Sharpe curve for an Agnostic Risk Parity portfolio fitted with normalized EMA signals lines up closely with the theoretical curve.

The operational punchline is simple, but not simplistic: before building a basket of technical indicators, a quant team should first beat a normalized EMA signal inside a disciplined portfolio-construction framework. In the paper’s test, the fitted theoretical optimum is an EMA parameter of about 112 business days, equivalent to a half-life of about 78 business days. In the appendix result table, the discrete EMA tests peak around ARP(100), with a gross Sharpe of 1.244945 and an average holding period of 81 days; nearby horizons such as ARP(80) and ARP(120) are very close, which matters because a fragile optimum is just overfitting with nicer typography.

The paper’s deeper claim is about mechanism. If trends can be approximated by a one-timescale mean-reverting process, and if positions scale linearly with signal strength rather than just flipping long/short, then the optimal signal should look like a normalized EMA. Under that lens, MACD-style three-timescale refinements and Bollinger-band mixtures are not obviously richer. They often become elaborate ways to reconstruct the same basic exposure. Very impressive. Also, very expensive, once you count research hours.

For business use, the lesson is not “delete all indicators.” It is “make complexity earn its rent.” A signal committee, CTA platform, allocator due-diligence team, or automated strategy factory should treat the EMA-plus-ARP result as a governance baseline: clear, auditable, theoretically motivated, and difficult to embarrass with another backtest spreadsheet.

The signal zoo exists because the mechanism is usually missing

Trend following has a familiar practical recipe. Choose a few price-based indicators. Blend several horizons. Volatility-normalize the result. Apply risk controls. Backtest. Keep the variants that survive. Give the slide deck a clean name. Pretend the surviving recipe was designed from first principles.

The paper attacks this workflow at its weakest point: the jump from “many indicators” to “more robust signal.” That jump feels intuitive. Different investors have different horizons. Markets may trend over weeks, months, and years. A basket of indicators seems diversified, and diversified sounds responsible. Nobody was ever fired for adding one more smoothing parameter, apparently.

But a basket is only useful if its members bring genuinely different information. If most of the indicators are transformations of the same price history, highly correlated after normalization, and selected on the same backtest, the basket may be less like diversification and more like a committee of identical twins wearing different ties.

Valeyre’s paper reframes the question. It does not ask, “Which indicator had the best historical Sharpe?” That would be the usual beauty contest, with all the statistical dignity of a casino mirror. Instead, it asks whether an existing theoretical model predicts the shape of the Sharpe curve as the EMA parameter changes. If theory predicts the curve and the empirical curve follows it, then the simple signal is not merely a backtest winner. It becomes a plausible measurement of the underlying trend mechanism.

That is why this article needs a mechanism-first reading. The paper is not mainly about an EMA beating MACD in a table. The table is downstream. The real argument is that a one-timescale stochastic model of trend can explain why a simple EMA should be close to optimal in the first place.

The model says trend is a small autocorrelated bias, not a collection of folklore

Grebenkov and Serror’s starting point is a stylized return process.² Daily returns contain a random noise component and a smaller autocorrelated component, the latter representing the trend. That trend component follows an AR(1)-like mean-reverting process with one relaxation time scale.

In plain English: the market does not carry a permanent directional bias, but it may carry a temporary one. The bias persists for a while, then decays. If the decay follows a single exponential pattern, an EMA becomes the natural detector. Not because traders like EMAs, not because charting software puts them in the toolbar, but because the signal’s memory shape matches the assumed structure of the process.

The paper’s normalized EMA signal is an exponential moving average of returns scaled by volatility. The half-life is approximately:

$$ \text{half-life} \approx \frac{\ln(2)}{\eta} $$

where $\eta$ is the EMA smoothing parameter. A smaller $\eta$ means a slower signal with a longer memory. A larger $\eta$ means a faster signal.
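As a quick check on the arithmetic, the sketch below converts a smoothing parameter into a half-life. It assumes the common EMA convention in which the weight on a return $k$ days old is proportional to $(1-\eta)^k$; the function names are illustrative and do not come from the paper.

```python
import math

def half_life_approx(eta: float) -> float:
    """Continuous-time approximation quoted in the text: ln(2) / eta."""
    return math.log(2) / eta

def half_life_discrete(eta: float) -> float:
    """Exact half-life when lag-k weights decay as (1 - eta)**k (assumed convention)."""
    return math.log(0.5) / math.log(1.0 - eta)

eta_opt = 1 / 112  # fitted optimum reported in the article
print(round(half_life_approx(eta_opt), 1))    # 77.6
print(round(half_life_discrete(eta_opt), 1))  # 77.3
```

For parameters this slow the two versions differ by a fraction of a day, so the "about 78 business days" shorthand is safe either way.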

The crucial modelling choice is that positions depend linearly on the signal. This differs from binary trend rules where the model is either long or short based only on the sign of the signal. Linear sizing preserves information in the magnitude of the trend estimate. A weak positive signal and a strong positive signal should not be treated as the same thing unless one enjoys throwing away information for sport.
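A minimal sketch of the two ideas above, a volatility-scaled EMA of returns and linear versus binary sizing, might look like the following. The recursion, the unit-variance rescaling factor, and all function names are assumptions made for illustration; the article does not spell out the paper's exact normalization.

```python
import numpy as np

def normalized_ema_signal(returns, vol, eta):
    """EMA of volatility-scaled returns, rescaled to roughly unit variance.

    Assumptions (not from the paper): the recursion
    s_t = (1 - eta) * s_{t-1} + eta * x_t, and the factor sqrt(eta / (2 - eta)),
    which is the stationary standard deviation of that EMA when x_t is
    i.i.d. with unit variance.
    """
    x = np.asarray(returns, dtype=float) / np.asarray(vol, dtype=float)
    ema = np.empty_like(x)
    level = 0.0
    for t, value in enumerate(x):
        level = (1.0 - eta) * level + eta * value
        ema[t] = level
    return ema / np.sqrt(eta / (2.0 - eta))

def linear_position(signal):
    """Position proportional to signal strength."""
    return np.asarray(signal, dtype=float)

def binary_position(signal):
    """Position that keeps only the sign of the signal."""
    return np.sign(np.asarray(signal, dtype=float))

weak, strong = 0.2, 2.0
print(linear_position([weak, strong]).tolist())  # [0.2, 2.0]
print(binary_position([weak, strong]).tolist())  # [1.0, 1.0]
```

The binary rule maps both signals to the same position, which is exactly the information loss the paragraph above complains about.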

This is also where portfolio construction enters. Valeyre tests the signal inside Agnostic Risk Parity, a portfolio construction method that normalizes signals using the inverse square root of the correlation matrix rather than the full inverse used in classical Markowitz-style optimization.³ This is not a cosmetic allocation choice. ARP helps aggregate cross-asset evidence while reducing the noise that appears when each market is studied separately.
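To make the "inverse square root versus full inverse" distinction concrete, here is a small sketch on a toy three-asset universe. It is an illustration under stated assumptions rather than the paper's implementation: the correlation matrix is invented, volatility scaling and correlation cleaning are omitted, and the function names are hypothetical.

```python
import numpy as np

def inverse_sqrt(corr):
    """Symmetric inverse square root C^(-1/2) via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    eigvals = np.clip(eigvals, 1e-12, None)  # guard against tiny negative eigenvalues
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def arp_positions(signals, corr):
    """ARP-style weights: C^(-1/2) applied to the signal vector."""
    return inverse_sqrt(corr) @ np.asarray(signals, dtype=float)

def markowitz_positions(signals, corr):
    """Classical full-inverse weights, shown only for contrast."""
    return np.linalg.solve(corr, np.asarray(signals, dtype=float))

# Toy universe (invented): two highly correlated stock indices and one unrelated market.
corr = np.array([[1.0, 0.9, 0.0],
                 [0.9, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
signals = [1.0, 1.0, 1.0]

print(np.round(arp_positions(signals, corr), 3).tolist())        # [0.725, 0.725, 1.0]
print(np.round(markowitz_positions(signals, corr), 3).tolist())  # [0.526, 0.526, 1.0]
```

Both methods shrink the correlated pair relative to the independent market; the inverse square root simply does so less aggressively than the full inverse.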

That matters because a one-market trend-following test is noisy. A 30-year Sharpe estimate for one asset can be mostly measurement fog. A diversified portfolio can reveal the shape of the signal curve more clearly.

The empirical design tests a curve, not just a leaderboard

The paper’s empirical setup is straightforward but important. It uses daily returns from 70 futures instruments across commodities, currencies, stock indices, and bonds, over the period from 29 May 1990 to 7 December 2023. Signals are fed into an ARP portfolio, resized to target constant volatility. The paper then compares empirical gross Sharpe ratios across different EMA parameters against the theoretical Sharpe curve.
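The two measurement steps in that setup, rescaling to a constant volatility target and computing a gross Sharpe ratio, can be sketched as follows. The 252-day annualization, the 60-day trailing window, and the 10% target are assumptions chosen for illustration; the article does not report the paper's exact settings.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization convention

def annualized_sharpe(daily_pnl):
    """Gross Sharpe ratio: annualized mean over annualized standard deviation."""
    daily_pnl = np.asarray(daily_pnl, dtype=float)
    return np.sqrt(TRADING_DAYS) * daily_pnl.mean() / daily_pnl.std(ddof=1)

def scale_to_target_vol(daily_pnl, target_vol=0.10, window=60):
    """Rescale each day's P&L using trailing realized vol known at the prior close.

    The first `window` days are dropped because no trailing estimate exists yet.
    """
    daily_pnl = np.asarray(daily_pnl, dtype=float)
    scaled = []
    for t in range(window, len(daily_pnl)):
        realized = daily_pnl[t - window:t].std(ddof=1) * np.sqrt(TRADING_DAYS)
        scaled.append(daily_pnl[t] * target_vol / realized if realized > 0 else 0.0)
    return np.array(scaled)
```

Because the Sharpe ratio is scale-invariant, volatility targeting mainly matters when risk changes through time; the trailing window keeps the rescaling free of look-ahead.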

The main evidence is Figure 5: empirical Sharpe ratios versus EMA parameter, compared with the Grebenkov-Serror theoretical formula. The fit is strong. The fitted model uses $\lambda \approx 1/180.65$ and $\beta_0 = 0.12$, with an indicative confidence interval for $\lambda$ around $[1/209, 1/158]$ and a bootstrap interval around $[1/223, 1/157]$. The paper reports $R^2 = 0.98$, while correctly noting that this should be treated as indicative rather than as a clean formal goodness-of-fit test, because the data are autocorrelated and the fitting problem is nonlinear.

That distinction matters. The result is not “the EMA has the best point estimate in a backtest.” The result is “the whole empirical curve resembles the theoretically predicted curve.” That is much more interesting. A single point can be lucky. A curve is harder to fake, though not impossible. Finance, being finance, always keeps a spare trapdoor.

The paper also compares ARP against a naïve “1/N” portfolio. The ARP measurements fit the theoretical curve better; the naïve version produces a lower and noisier fit, with the paper reporting $\lambda = 1/110$ and $R^2 = 0.75$ for that case. The interpretation is that ARP does not merely increase performance. It improves measurement by reducing repeated exposure to highly similar markets, such as correlated stock indices.

Here is the useful map of the paper’s tests:

| Paper element | Likely purpose | What it supports | What it does not prove |
| --- | --- | --- | --- |
| Figure 5: empirical Sharpe curve versus EMA parameter | Main evidence | The theoretical Grebenkov-Serror Sharpe curve fits the ARP EMA backtests closely | That the same optimum holds after all costs, in all asset universes, or under future regimes |
| Table 4: gross Sharpe for EMA and MACD variants | Main evidence plus ablation | Nearby EMA horizons perform similarly; MACD variants do not clearly improve on the simple EMA | That MACD is useless in every implementation or market |
| Figure 6: Sharpe versus random universe size | Robustness and mechanism support | Diversification and ARP projection improve signal-to-noise; Sharpe scales with universe size in a way close to the proposed formula | That more instruments always increase net returns after capacity, liquidity, and execution constraints |
| Figure 7: EMA replicated by SMA and Bollinger-band mixtures | Exploratory extension and interpretability device | Complex indicator mixtures can mimic a simple EMA sensitivity profile | That all Bollinger-band strategies are mathematically identical to EMA strategies |
| Nonlinear test inspired by Schmidhuber | Challenge test / exploratory robustness | A cubic dampening of extreme trends does not improve Sharpe in this ARP setup | That nonlinear trend effects never matter |
| Figure 8: correlations among indicator strategies | Diagnostic robustness | Many indicator variants are highly correlated, especially near the relevant EMA/MACD region | That correlations will remain identical in every regime |

This is the section most practitioners should read twice. The appendix table is not decorative paperwork. It is where the anti-complexity argument becomes operational.

The 112-day optimum is not a magic number, which is good

The fitted theoretical optimum corresponds to $\eta_{\text{opt}} = 1/112$, or a half-life of roughly 78 business days. The appendix table reports gross Sharpe ratios for tested EMA settings:

| Strategy | Indicator | Gross Sharpe | Average holding period |
| --- | --- | --- | --- |
| ARP(20) | EMA | 1.079535 | 38 days |
| ARP(50) | EMA | 1.189320 | 60 days |
| ARP(80) | EMA | 1.240349 | 74 days |
| ARP(100) | EMA | 1.244945 | 81 days |
| ARP(120) | EMA | 1.235455 | 88 days |
| ARP(150) | EMA | 1.207496 | 96 days |
| ARP(180) | EMA | 1.172569 | n/a |
| ARP(400) | EMA | 0.955223 | 132 days |
| ARP(1000) | EMA | 0.633678 | 155 days |

The peak among the discrete tested values is ARP(100), but the fitted theoretical optimum is around 112 business days. That is not a contradiction; it is what happens when a continuous curve is sampled at finite parameter settings. More importantly, the curve is not razor thin. ARP(80), ARP(100), and ARP(120) are all close. That reduces the practical danger of over-tuning.
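The plateau can be read straight off the table with a few lines of code. The Sharpe values below are the ones reported above; the 1% tolerance is an arbitrary illustrative threshold, not a statistical test and not something the paper proposes.

```python
# Gross Sharpe by EMA parameter, copied from the table above.
ema_sharpe = {
    20: 1.079535, 50: 1.189320, 80: 1.240349, 100: 1.244945, 120: 1.235455,
    150: 1.207496, 180: 1.172569, 400: 0.955223, 1000: 0.633678,
}

def plateau(sharpe_by_param, tolerance=0.01):
    """Parameters whose Sharpe is within `tolerance` (relative) of the best one."""
    best = max(sharpe_by_param.values())
    return sorted(p for p, s in sharpe_by_param.items() if s >= best * (1 - tolerance))

best_param = max(ema_sharpe, key=ema_sharpe.get)
print(best_param)           # 100
print(plateau(ema_sharpe))  # [80, 100, 120]
```

A grid search that reports only `best_param` hides the more useful fact: three neighbouring settings are practically tied.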

This is a valuable result for operators. A strategy that only works at exactly 103.7 days, but collapses at 95 and 115, is not a strategy. It is an accusation waiting for a due-diligence meeting. Here the paper suggests a broad medium-frequency zone, with deterioration as the EMA becomes much too fast or much too slow.

The economic reading is also sensible. Very fast signals turn over more and are more exposed to noise. Very slow signals respond too late and may dilute the trend. The middle wins. The paper’s contribution is not discovering that middle horizons can work; trend followers already knew that. The contribution is showing that the shape of the middle can be explained by a theoretical curve rather than by a pile of parameter folklore.

MACD does not buy much extra memory in this setup

The paper then asks the obvious objection: what if the market has more than one relevant time scale?

That objection is legitimate. Investors operate at different horizons. Volatility has long memory. Macro cycles do not politely decay on one neat exponential clock. So the paper tests MACD-style indicators built from three EMA time scales. These are designed to reduce sensitivity to very recent returns and add more weight to older returns, giving the signal a fatter memory tail.

The result is underwhelming, in the useful sense. Table 4 reports the following MACD-style variants:

| Strategy | Indicator | Gross Sharpe | Average holding period |
| --- | --- | --- | --- |
| ARP(0 × 20, 120, 0 × 400) | MACD | 1.235455 | 88 days |
| ARP(20, 120, 0.2 × 400) | MACD | 1.203418 | 97 days |
| ARP(20, 120, 0.4 × 400) | MACD | 1.176466 | n/a |
| ARP(20, 90, 0.3 × 400) | MACD | 1.214172 | 89 days |
| ARP(20, 80, 0.3 × 400) | MACD | 1.218031 | 85 days |
| ARP(20, 80, 0.2 × 400) | MACD | 1.228186 | 82 days |
| ARP(20, 80, 0.4 × 400) | MACD | 1.206864 | 87 days |

None of these clearly beats the best simple EMA result. Some are close, which is exactly the point: close is not enough if the more complex model adds parameters, governance burden, testing degrees of freedom, and a larger surface area for cherry-picking.
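For readers who prefer the comparison as arithmetic rather than adjectives, the snippet below measures each MACD-style variant against the best simple EMA. It uses only the gross Sharpe figures quoted in the two tables; the dictionary keys are shorthand labels, not the paper's notation.

```python
best_ema_sharpe = 1.244945  # ARP(100), from the EMA table

# Gross Sharpe for the MACD-style variants, copied from the table above.
macd_sharpe = {
    "0x20, 120, 0x400": 1.235455,
    "20, 120, 0.2x400": 1.203418,
    "20, 120, 0.4x400": 1.176466,
    "20, 90, 0.3x400": 1.214172,
    "20, 80, 0.3x400": 1.218031,
    "20, 80, 0.2x400": 1.228186,
    "20, 80, 0.4x400": 1.206864,
}

shortfall = {name: best_ema_sharpe - s for name, s in macd_sharpe.items()}
beats_baseline = [name for name, gap in shortfall.items() if gap < 0]

print(len(beats_baseline))                # 0
print(round(min(shortfall.values()), 5))  # 0.00949
```

The smallest shortfall belongs to the first row, whose zero weights on the 20-day and 400-day components reduce it to the plain ARP(120) EMA.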

The correlation evidence pushes the same way. The paper reports that strategies using nearby EMA and MACD-style variants are highly correlated. In Figure 8, the simple EMA around ARP(120) is extremely close to the introduced three-timescale indicators, with correlations around 0.99 to 1 for several variants. When two signals are that close, calling them “diversified” is generous. Calling them “separate alpha sources” is performance art.
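A back-of-the-envelope calculation suggests why nearby horizons are so redundant. The sketch below gives the correlation between two EMA signals built on the same return stream, under the simplifying assumption that returns are i.i.d. It measures signal correlation in a toy model, not the strategy correlations reported in Figure 8, and the function name is illustrative.

```python
import math

def ema_signal_correlation(eta_1: float, eta_2: float) -> float:
    """Correlation of two EMAs of the same i.i.d. return stream.

    With lag-k weights eta * (1 - eta)**k, the covariance is
    eta_1 * eta_2 / (1 - a * b) where a = 1 - eta_1 and b = 1 - eta_2,
    and each variance is eta / (2 - eta).
    """
    a, b = 1.0 - eta_1, 1.0 - eta_2
    cov = eta_1 * eta_2 / (1.0 - a * b)
    var_1 = eta_1 / (2.0 - eta_1)
    var_2 = eta_2 / (2.0 - eta_2)
    return cov / math.sqrt(var_1 * var_2)

print(round(ema_signal_correlation(1 / 100, 1 / 120), 3))  # 0.996
print(round(ema_signal_correlation(1 / 20, 1 / 400), 3))   # 0.421
```

Two medium-term EMAs are nearly the same signal; genuine differentiation in this toy model requires horizons that are far apart.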

Bollinger bands become a very elaborate EMA costume

The Bollinger-band section is the paper’s most mischievous result. Bollinger Bands are nonlinear and path-dependent: they trigger when price moves outside a band around a moving average. In practitioner language, they look more sophisticated than an EMA. They have thresholds. They have bands. They have that charming aura of technical-analysis seriousness.

Valeyre shows that a simple EMA can be replicated by a complex mixture of simple moving averages, and that those simple moving averages can in turn be decomposed into a large collection of Bollinger-band indicators. Figure 7 illustrates how weights in such a mixture can form a bell-shaped distribution centered around roughly 200 days when replicating the optimal EMA.
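The first half of that claim, an EMA rebuilt from simple moving averages, can be demonstrated with a telescoping identity. The sketch below is one possible construction and not necessarily the paper's; it does not cover the Bollinger-band step, and the location of the weight peak depends on the construction and plotting convention, so it need not match Figure 7.

```python
import numpy as np

def ema_kernel(eta, n_lags):
    """Weight an EMA places on the return k days ago, for k = 0..n_lags-1."""
    return eta * (1.0 - eta) ** np.arange(n_lags)

def sma_mixture_weights(kernel):
    """Weights c_n on SMA(n), n = 1..N, that reproduce a decreasing kernel.

    SMA(n) puts 1/n on lags 0..n-1, so the mixture's lag-k weight is the sum
    of c_n / n over n > k. Choosing c_n = n * (w_{n-1} - w_n) telescopes back
    to w_k (with w_N treated as 0 at the truncation point).
    """
    lengths = np.arange(1, len(kernel) + 1)
    drops = kernel - np.append(kernel[1:], 0.0)
    return lengths, lengths * drops

def mixture_kernel(lengths, weights):
    """Lag-k weight implied by the SMA mixture (reverse cumulative sum)."""
    return np.cumsum((weights / lengths)[::-1])[::-1]

kernel = ema_kernel(1 / 112, n_lags=2000)
lengths, weights = sma_mixture_weights(kernel)
print(np.allclose(mixture_kernel(lengths, weights), kernel))  # True
print(bool((weights >= 0).all()))                             # True
```

Two thousand moving averages, one exposure: the mixture has many more visible parts than the EMA and exactly the same sensitivity to past returns.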

The business translation is brutal: a manager can show investors a refined-looking blend of short-term and long-term Bollinger-band components, while the aggregate exposure behaves like a simple EMA. The machinery may be real. The incremental information may not be.

This does not make Bollinger Bands fraudulent. It makes them diagnostically dangerous. If a complex nonlinear signal collapses into the sensitivity profile of a simple EMA after aggregation, then the research question should not be “Does the complex thing sound diversified?” It should be “What does it add that the simple baseline does not already contain?”

That is a much less comfortable question. Conveniently, it is also the correct one.

ARP matters because the portfolio is part of the measurement instrument

A subtle feature of the paper is that it does not treat signal design and portfolio construction as separable toys. The same signal can look different when tested market-by-market, in a naïve equal-weight portfolio, or inside a correlation-aware allocation.

The ARP framework matters because it aggregates evidence across markets while reducing repeated exposure to similar instruments. If the universe includes multiple related stock indices, naïve equal weighting can overweight the same economic exposure and distort the apparent speed of the trend. ARP attempts to normalize that structure.

The paper’s universe-size experiment in Figure 6 makes this concrete. It tests random sub-universes of 1, 3, 6, 9, 15, 20, and 27 assets, with 20 random trials for each case, using an EMA parameter of $1/120$. The proposed scaling relation says the portfolio Sharpe ratio grows in proportion to:

$$ \sqrt{\frac{N}{1 + (N - 1)\rho^2}} $$

where $N$ is the size of the universe and $\rho^2$ represents an average squared correlation term. The fitted $\rho^2$ is reported as $0.024 \pm 0.012$, lower than the empirical average squared weekly-return correlation of 0.056. The fitted curve implies a Sharpe ratio around 1.28 for $N = 70$, 1.40 for $N = 140$, and 1.60 for an infinite universe.

The exact infinite-universe number is not an investable promise. Please do not walk into an allocation meeting waving infinity. The practical point is that aggregation reduces noise and makes the trend mechanism easier to detect. At $N = 1$, the mean Sharpe is only around 0.2, similar to the estimated noise level. At portfolio scale, the curve becomes visible.
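The scaling formula is easy to evaluate directly. In the sketch below, the fitted $\rho^2$ and the infinite-universe Sharpe are the figures quoted above; backing out a single-asset Sharpe from the infinite-universe limit is an assumption made here to calibrate the curve, not a step described in the article.

```python
import math

def diversification_factor(n_assets: float, rho_sq: float) -> float:
    """sqrt(N / (1 + (N - 1) * rho^2)), the scaling relation quoted above."""
    return math.sqrt(n_assets / (1.0 + (n_assets - 1.0) * rho_sq))

RHO_SQ = 0.024          # fitted value quoted in the article
SHARPE_INFINITE = 1.60  # infinite-universe Sharpe quoted in the article

# Assumption: as N grows the factor tends to 1 / sqrt(rho^2), so the implied
# single-asset Sharpe is the infinite-universe Sharpe times sqrt(rho^2).
single_asset_sharpe = SHARPE_INFINITE * math.sqrt(RHO_SQ)

for n in (1, 70, 140):
    print(n, round(single_asset_sharpe * diversification_factor(n, RHO_SQ), 2))
# 1 0.25
# 70 1.27
# 140 1.41
```

The results land within about 0.01 of the quoted 1.28 and 1.40, a gap consistent with rounding in the published parameters, and the implied single-asset Sharpe sits near the "around 0.2" level mentioned for $N = 1$.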

That is an important lesson for strategy research platforms: the backtest protocol can either reveal the mechanism or bury it. Testing each market separately and then storytelling the survivors is a reliable way to manufacture confusion.

What the paper directly shows

The paper directly shows four things.

First, for this 70-futures universe over 1990-2023, a normalized EMA signal inside ARP produces an empirical Sharpe curve that closely fits the Grebenkov-Serror theoretical Sharpe formula. The reported fit is strong, with $R^2 = 0.98$, although the paper treats this as indicative rather than a clean formal test.

Second, the fitted optimum is in the medium-frequency zone: $\eta_{\text{opt}} \approx 1/112$, or roughly a 78-business-day half-life. The appendix table’s discrete tests peak around ARP(100), and nearby EMA settings remain close.

Third, the MACD-style three-timescale variants tested in the paper do not clearly improve on the simple EMA. They change the shape of the sensitivity to past returns, but the empirical payoff is not compelling.

Fourth, complex indicator mixtures can replicate simpler sensitivity profiles. Bollinger-band mixtures may look more sophisticated, but the paper shows how they can approximate the same EMA exposure. The costume changes; the economic exposure may not.

What Cognaptus infers for business use

The practical inference is not that every trend-following desk should standardize on exactly 112 business days and go home early. Though, spiritually, some research committees might benefit.

The inference is that a simple, normalized EMA inside a robust portfolio-construction framework should become the baseline model. Any additional signal layer should justify itself against that baseline on five questions:

| Operational question | Why it matters |
| --- | --- |
| Does the new signal improve net performance after realistic costs? | Gross Sharpe is not the same as tradable return. Research notebooks often forget this; brokers do not. |
| Does it reduce drawdown or regime sensitivity without duplicating EMA exposure? | A highly correlated signal may add comfort, not diversification. |
| Does it improve interpretability, governance, or risk control? | Complexity is acceptable when it lowers operational risk. It is not acceptable when it merely decorates the backtest. |
| Does it survive out-of-sample, cross-market, and walk-forward tests? | Parameter-rich systems need stronger evidence because they have more ways to flatter history. |
| Can the investment committee explain why the signal exists? | “It improved the 2003-2018 Sharpe in version 47 of the grid search” is not an explanation. It is a confession. |
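A research platform could encode the quantitative part of that checklist as a simple gate. Everything in the sketch below is hypothetical: the class, the field names, and both thresholds are invented for illustration and would need calibrating by whoever owns the signal library.

```python
from dataclasses import dataclass

@dataclass
class CandidateReport:
    net_sharpe: float           # candidate signal, after the desk's own cost model
    baseline_net_sharpe: float  # normalized EMA baseline, same universe and costs
    corr_with_baseline: float   # P&L correlation with the baseline
    extra_parameters: int       # tunable knobs beyond the baseline's one time scale

def passes_baseline_gate(report, min_uplift_per_param=0.02, max_corr=0.95):
    """Hypothetical rule: each extra knob must buy Sharpe, and the signal must not be a twin."""
    uplift = report.net_sharpe - report.baseline_net_sharpe
    required = min_uplift_per_param * max(report.extra_parameters, 1)
    return uplift >= required and report.corr_with_baseline <= max_corr

# Stand-in inputs: the article reports gross, not net, Sharpe, so the gross
# figures for ARP(20, 80, 0.2 x 400) and ARP(100) are used purely as placeholders.
macd_like = CandidateReport(net_sharpe=1.228186, baseline_net_sharpe=1.244945,
                            corr_with_baseline=0.99, extra_parameters=3)
print(passes_baseline_gate(macd_like))  # False
```

The qualitative rows of the table, interpretability and the committee's ability to explain the signal, do not reduce to a threshold and still need a human.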

For CTA managers, this is a signal-library audit. For allocators, it is a due-diligence prompt. Ask whether the manager’s many trend indicators are genuinely orthogonal or mostly correlated variations of a medium-term EMA. For fintech teams building automated strategy engines, it is a design rule: include simple theoretical baselines before letting users generate indicator salads at scale.

For AI-driven investment platforms, the lesson is even sharper. Agentic research systems can generate thousands of variants. That is useful only if the system also performs redundancy checks, theoretical baseline comparisons, and parameter-mining controls. Otherwise it automates the present in the worst possible way: by mass-producing overfit indicators with excellent manners.

The boundaries are practical, not decorative

The paper is useful because it is disciplined, but its boundaries matter.

The backtests report gross Sharpe ratios. The paper notes portfolio smoothing and argues that trading costs should be small at the relevant holding periods, but net performance after realistic market impact, roll costs, financing, slippage, and capacity constraints is not the core test. For a CTA business, those details are not footnotes. They are where the P&L goes to be interrogated.

The data are historical futures returns from 1990 to 2023. That is broad and relevant, but not universal. The result does not prove that the same EMA optimum applies to single-name equities, crypto, illiquid markets, intraday execution, or constrained institutional mandates.

The theoretical mechanism assumes that a one-timescale mean-reverting process is a good approximation for the trend component. The empirical fit supports that assumption in this setting. It does not prove that markets have only one time scale. It suggests that, at the medium-frequency level relevant to this cross-asset trend-following portfolio, one time scale may be enough.

The nonlinear challenge remains open. The paper discusses Schmidhuber’s finding that extreme trends may require a transformation like $\phi - c\phi^3$, with $c = 0.33$, effectively dampening very large trend signals.⁴ Valeyre tests a related nonlinear modification and finds no Sharpe improvement in the ARP setup. That is informative, not final. Nonlinear effects may still matter for risk management, crisis behaviour, crowding, or specific asset classes.
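To see what the quoted transformation does to large signals, the sketch below evaluates $\phi - c\phi^3$ with $c = 0.33$. The article does not state the units of $\phi$, so the code treats it as a dimensionless signal value, and it illustrates the quoted formula rather than the related modification that Valeyre actually tests.

```python
import math

def cubic_dampen(phi: float, c: float = 0.33) -> float:
    """Apply phi - c * phi**3, the transformation quoted in the text."""
    return phi - c * phi ** 3

peak_at = 1.0 / math.sqrt(3 * 0.33)  # where the derivative 1 - 3*c*phi**2 vanishes
zero_at = 1.0 / math.sqrt(0.33)      # where the transformed signal returns to zero

print(round(peak_at, 3))            # 1.005
print(round(zero_at, 3))            # 1.741
print(round(cubic_dampen(1.0), 2))  # 0.67
print(round(cubic_dampen(2.0), 2))  # -0.64
```

Under this form the position stops growing once the signal passes roughly 1 and reverses sign beyond roughly 1.74, which is a much stronger statement than "trim the extremes" and one reason the test is worth running rather than assuming.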

Finally, the paper’s strongest claim depends on the whole pipeline: normalized EMA, ARP construction, diversified futures universe, linear position sizing, volatility targeting, and the chosen historical period. Remove enough pieces and the conclusion may change. Mechanisms are not stickers. You cannot peel one off and paste it onto a different trading system without checking the glue.

The governance lesson: complexity needs a burden of proof

The paper’s real enemy is not MACD. It is unmanaged optionality.

Every additional indicator adds choices: window lengths, weights, thresholds, transformations, volatility estimates, smoothing speeds, rebalance rules, and exception handling. Each choice creates another place where historical data can whisper flattering nonsense. A complex basket can be robust, but it can also be a cherry-picking machine with a committee-approved interface.

A normalized EMA baseline helps because it is hard to hide behind. It has one main time-scale parameter. Its mechanism is interpretable. Its Sharpe curve can be compared with theory. Its nearby parameters can be checked for stability. It gives the research team a clean null model: if the fancy signal cannot beat this after costs and redundancy checks, the fancy signal should not ship.

That is a business advantage. Not because simplicity is morally superior. Markets do not award virtue points. Simplicity wins when it reduces estimation error, improves auditability, lowers implementation risk, and performs about as well as the baroque alternative. In this paper, that is exactly the uncomfortable possibility.

The cleanest signal may be the one that leaves fewer fingerprints on history

The fashionable view in return prediction is that complexity is necessary because markets are complex. Sometimes that is true. Sometimes it is also a convenient excuse for not knowing which part of the complexity is doing the work.

Valeyre’s paper makes a narrower and more useful claim: in medium-frequency cross-asset futures trend following, a one-timescale normalized EMA inside Agnostic Risk Parity fits both theory and empirical evidence surprisingly well. The MACD variants tested do not add much. Bollinger-band mixtures can replicate the simple signal. Correlations among indicator strategies are high. The optimal region is broad enough to be operationally plausible.

The paper does not end the debate on trend-following design. It raises the entry fee. From here, a complex signal should not be accepted because it looks diversified, uses more horizons, or sounds sophisticated in an investor letter. It should be accepted because it beats the simple EMA baseline on net performance, independent evidence, lower risk, or better governance.

That is a useful kind of austerity. Fewer knobs. Fewer stories. Fewer ways to fall in love with a backtest.

Sometimes the best signal is not the one with the most moving parts. It is the one that moves just enough.

Cognaptus: Automate the Present, Incubate the Future.

¹ Sebastien Valeyre, “Breaking the Trend: How to Avoid Cherry-Picked Signals,” arXiv:2504.10914. The article refers to the arXiv full text/PDF version available at the time of drafting, including the empirical sections, appendix tables, and figures.
² Denis Grebenkov and J. Serror, “Following a trend with an exponential moving average: Analytical results for a Gaussian model,” Physica A: Statistical Mechanics and its Applications, 394, 288-303, 2014.
³ R. Benichou, Y. Lempérière, E. Sérié, J. Kockelkoren, P. Seager, J.-P. Bouchaud, and M. Potters, “Agnostic Risk Parity: Taming Known and Unknown-Unknowns,” Journal of Investment Strategies, 6(3), 1-12, 2017.
⁴ C. Schmidhuber, “Trends, reversion, and critical phenomena in financial markets,” Physica A: Statistical Mechanics and its Applications, 566, 125642, 2021.
