TL;DR for operators
The paper behind this article is not a victory lap for AI stock prediction. It is a more useful thing: a controlled comparison of how different optimisers behave when a MambaStock model tries to forecast one-week-ahead S&P 500 returns.[1]
The operational read is simple. If your priority is lowest forecast error in this setup, the safer family is still adaptive or momentum-based optimisation: RMSProp, Adam, Nesterov, and SGD with momentum. If your priority is fast experimentation across many hyperparameter settings, Lion deserves attention because it trains quickly and tolerates a broader region of settings. If your priority is Lion-like speed without quite so much convergence thrashing, Roaree is interesting: it smooths Lion’s hard sign update and improves Lion’s test error and training stability.
The catch, because naturally there is one, is that the paper does not show a profitable trading strategy. It reports prediction metrics on a weekly S&P 500 return task using a fixed MambaStock architecture. There is no live execution, no transaction-cost model, no portfolio construction layer, no slippage, and no regime-by-regime trading economics. So the correct business takeaway is not “deploy Roaree and invoice the market for its generosity.” It is: optimiser choice can materially affect research throughput, convergence behaviour, and forecast-error baselines in financial sequence models.
The trading floor problem is not only the model
A quant team can spend months debating architectures. Transformer or state-space model. LSTM or Mamba. Technical indicators or learned embeddings. Sentiment features or no sentiment features. The discussion usually sounds sophisticated, because architecture debates come with diagrams, memory curves, and enough Greek letters to discourage weak participants.
But in day-to-day model development, a quieter decision often shapes the result: how the model is trained. The optimiser decides how parameters move after each gradient update. In stable image or language tasks, that may sound like plumbing. In financial return prediction, where the target signal is small, noisy, and frequently rude, the optimiser can become the difference between a model that gradually settles and one that ricochets around the loss surface like a caffeinated intern with Bloomberg access.
That is the useful frame for “From Rattle to Roar: Optimizer Showdown for MambaStock on S&P 500.” The authors take MambaStock, hold the architecture fixed, and vary the optimiser. This is not an architecture paper pretending to be a trading paper. It is an optimisation paper using a trading-relevant forecasting task as the stress test.
That distinction matters. When financial AI papers start with “we predict stock returns,” readers often sprint directly to alpha fantasies. This one is better read as a decision guide for model builders: when the target is weak and the sequence model is sensitive, which optimiser gives the best trade-off among error, speed, stability, and tuning tolerance?
The experiment isolates the optimiser, which is the point
The study uses weekly S&P 500 observations from 2000 to 2019. The features include current returns, ten technical indicators, three valuation ratios, and two sentiment scores. The target is the forward one-week return. The final 100 weeks are held out as the test set; the remaining data are split into training and validation portions. The authors also state that the split is causal, so the model is not allowed to peek into the future. A rare and welcome courtesy to reality.
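A causal split of this kind is easy to get subtly wrong, so a minimal sketch may help. The function name `causal_split`, the 20% validation share, and the 1,040-row stand-in series are assumptions for illustration; the article only states that the final 100 weeks are held out and that the rest is divided into training and validation.

```python
def causal_split(rows, n_test=100, val_fraction=0.2):
    """Split time-ordered rows into train / validation / test, oldest first.

    Nothing is shuffled: every training row precedes every validation row,
    which in turn precedes every test row.
    """
    if len(rows) <= n_test:
        raise ValueError("need more rows than the test window")
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    history, test = rows[:-n_test], rows[-n_test:]
    n_val = int(len(history) * val_fraction)   # assumed share, not from the paper
    cut = len(history) - n_val
    return history[:cut], history[cut:], test


# Stand-in data: integer week indices for roughly 20 years of weekly rows.
weeks = list(range(1040))
train, val, test = causal_split(weeks)
print(len(train), len(val), len(test))
```

The design choice that matters is slicing by position rather than sampling at random, which is what keeps future weeks out of the training set.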
The model is MambaStock with hidden size 64 and two layers. No architectural changes are made. That design choice makes the experiment cleaner: the optimiser is the experimental variable.
The baseline optimisers are:
| Optimiser family | Examples in the paper | Practical intuition |
|---|---|---|
| Plain gradient descent | SGD | Simple, but often fragile when gradients are noisy or badly scaled |
| Momentum methods | SGD with momentum, Nesterov | Smooth updates over time, reducing some noise in the gradient path |
| Adaptive-rate methods | RMSProp, Adam, AdamW, Adagrad | Adjust learning rates across parameters, useful when gradient magnitudes vary |
| Sign-based momentum | Lion | Fast and memory-efficient, but the hard sign update can overshoot |
| Smooth-sign Lion variants | Roaree | Replace Lion’s hard sign with smoother approximations |
All optimisers are trained for 64 epochs with identical random seeds. The baseline optimisers are searched over a broader grid of learning rates and weight decay values. Roaree uses a narrower grid so that the authors can explore six smooth surrogate functions and three curvature values.
That grid design is important. The broad baseline grid answers, “What can established optimisers do when given room to tune?” The matched smaller grid answers, “How do Roaree variants compare when the search space is constrained to the same region?” Those are related questions, not identical ones. Confusing them is how benchmarking turns into spreadsheet theatre.
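The reason Roaree gets a narrower grid is mostly arithmetic, and a small sketch shows it. Every value below is hypothetical: the article does not list the paper's learning rates or weight decays, and of the three curvature values only 10 and 1000 are mentioned, so 100 is a placeholder.

```python
from itertools import product

# Hypothetical grids; the paper's actual values are not given in this article.
broad_lrs = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2]
broad_wds = [0.0, 1e-4, 1e-3, 1e-2]
small_lrs = [1e-4, 1e-3]
small_wds = [0.0, 1e-3]

surrogates = ["tanh", "arctan", "softsign", "sigmoid", "erf", "norm"]
kappas = [10, 100, 1000]  # three curvature values; 100 is a placeholder

# One training run per tuple.
baseline_grid = list(product(broad_lrs, broad_wds))
baseline_small_grid = list(product(small_lrs, small_wds))
roaree_grid = list(product(small_lrs, small_wds, surrogates, kappas))

print(len(baseline_grid), len(baseline_small_grid), len(roaree_grid))
```

With these assumed sizes, a baseline needs 20 runs on the broad grid and 4 on the small one, while Roaree needs 72 runs even on the small grid, because the six surrogates and three curvature values multiply the count by 18.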
The main comparison: Adam-style and momentum methods still own the error column
The headline result is not that Roaree beats everything. It does not.
Across the baseline optimisers, the lowest test errors come from Nesterov, RMSProp, Adam, and SGD with momentum. This is exactly the result one would expect if the hard part of the task is not raw computation but noisy gradient navigation. Weekly returns are small. The useful signal is thin. Gradients inside a Mamba block can vary substantially across parameters and layers. Optimisers that either smooth gradients over time or adapt learning rates parameter by parameter are better equipped for that mess.
The paper’s interpretation is sensible: momentum helps damp short-term noise, while RMS-style adaptive methods handle uneven gradient scale. Adam performs well because it combines moment estimates with adaptive scaling. RMSProp also performs strongly. Nesterov and SGD with momentum do well because they impose temporal discipline on updates rather than chasing each gradient twitch.
AdamW is a more interesting case. It is widely used in modern deep learning, especially transformer training, but in this task it does not rank among the top performers. The authors attribute this to decoupled weight decay being less helpful when the signal is already very small. Extra regularisation can slow convergence or shrink useful movement too aggressively. In plain English: in a weak-signal market task, “regularise harder” is not automatically wisdom. Sometimes it is just a very elegant way to underfit.
Lion wins on speed and tolerance, not final accuracy
Lion is the paper’s second important character. It is not the best error performer, but it behaves in a way that matters for teams doing large-scale experimentation.
Lion uses a sign-based momentum update. Instead of moving in proportion to the raw gradient magnitude, it pushes parameters using the sign of a momentum-like quantity. In simplified form, the update direction depends on:
$$\operatorname{sign}(c_t)$$
where $c_t$ is built from momentum and the current gradient. The appeal is efficiency. Sign-based updates can be fast and memory-light. In the paper’s results, Lion also supports a broader range of learning-rate and weight-decay settings where validation MSE remains low. That makes it attractive for aggressive sweeps, early-stage research, and situations where the team wants to test many ideas before paying for deeper tuning.
But the same hard sign step can produce oscillatory convergence. Near an optimum, a hard sign does not naturally soften just because the model is close. It can keep taking discrete directional steps where a gentler movement would help. The paper shows Lion’s convergence oscillating more than most Roaree variants.
That creates the central trade-off:
| Choice | What it buys | What it costs |
|---|---|---|
| Adam/RMSProp-style methods | Lower forecast error in this benchmark | Less distinctive speed advantage |
| Nesterov / SGD with momentum | Strong error performance through gradient smoothing | Still requires careful tuning |
| Lion | Fast training and broad tolerance to aggressive hyperparameters | Worse final test error and oscillatory convergence |
| Roaree | More stable Lion-like training with improved Lion accuracy | Still trails the strongest adaptive baselines |
For a quant research desk, Lion is less of a final-production answer and more of a search tool. It is the optimiser you consider when the question is, “Can we explore this modelling space quickly?” not necessarily, “Is this the model we put behind capital tomorrow morning?”
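The sign-based update described in this section can be sketched in a few lines. This follows the commonly published form of Lion rather than anything specific to the paper under discussion; the function name `lion_step`, the plain-list representation, and the default values for `lr`, `beta1`, and `beta2` are assumptions for illustration.

```python
def sign(x):
    """Hard sign: -1, 0, or +1."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion-style update over plain lists of floats.

    Returns the new parameters and the new momentum buffer.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        c = beta1 * m + (1.0 - beta1) * g       # c_t: momentum blended with gradient
        new_params.append(p - lr * (sign(c) + weight_decay * p))
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


# Two parameters whose gradients differ in size by a factor of 500.
old = [1.0, -2.0]
params, momentum = lion_step(old, [0.5, -0.001], [0.0, 0.0], lr=0.1)
step_sizes = [abs(new - prev) for new, prev in zip(params, old)]
print(step_sizes)
```

Both parameters move by the same amount, `lr`, despite the large difference in gradient size. That uniform step is what makes the method cheap, and it is also why the step does not shrink as the model approaches an optimum.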
Roaree smooths the bite out of Lion
Roaree is the paper’s proposed optimiser family. The idea is direct: keep Lion’s speed-oriented structure, but replace the non-differentiable hard sign function with a smooth surrogate.
The paper tests six replacements:
| Roaree surrogate | Role in the experiment |
|---|---|
| $\tanh(\kappa x)$ | Smooth saturation toward sign-like behaviour |
| $\frac{2}{\pi}\arctan(\kappa x)$ | Smooth bounded approximation |
| $\frac{\kappa x}{1 + \lvert \kappa x \rvert}$ | Softsign-style transition |
| $2\sigma(\kappa x)-1$ | Sigmoid-based smooth sign |
| $\operatorname{erf}(\kappa x)$ | Error-function smooth sign |
| $\frac{x}{\sqrt{x^2 + 1}}$ | Norm-like smooth approximation |
The curvature parameter $\kappa$ controls how close the surrogate is to the hard sign. As $\kappa$ becomes very large, the surrogate behaves more like the original sign function. With smaller $\kappa$, the update becomes smoother and more linear near zero.
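The effect of $\kappa$ is easiest to see numerically. Below is a minimal sketch of the six surrogates as listed in the table; the function name `smooth_sign` and the string keys are assumptions for illustration, and the norm variant is written exactly as it appears in the table, without $\kappa$.

```python
import math

KINDS = ("tanh", "arctan", "softsign", "sigmoid", "erf", "norm")


def smooth_sign(x, kappa, kind):
    """Smooth stand-ins for sign(x), following the surrogate table."""
    z = kappa * x
    if kind == "tanh":
        return math.tanh(z)
    if kind == "arctan":
        return (2.0 / math.pi) * math.atan(z)
    if kind == "softsign":
        return z / (1.0 + abs(z))
    if kind == "sigmoid":
        # 2*sigmoid(z) - 1 equals tanh(z / 2); this form avoids exp overflow.
        return math.tanh(z / 2.0)
    if kind == "erf":
        return math.erf(z)
    if kind == "norm":
        # As written in the table: kappa does not appear in this surrogate.
        return x / math.sqrt(x * x + 1.0)
    raise ValueError(f"unknown surrogate: {kind}")


x = 0.01  # a small momentum-like input
for kappa in (1, 10, 1000):
    print(kappa, round(smooth_sign(x, kappa, "tanh"), 4))
```

For the same small input, a low $\kappa$ returns a value close to $\kappa x$ (a nearly linear response), while a large $\kappa$ returns a value close to 1, which is the hard-sign behaviour the surrogate is meant to soften.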
This is a small technical change with a large behavioural aim. Roaree tries to reduce Lion’s overshooting near the optimum while preserving fast updates. The paper’s convergence plots support the mechanism: most Roaree variants show smoother training behaviour than Lion. The exception is the norm surrogate, which still oscillates because its useful linear region is narrow, although even there the amplitude is somewhat reduced.
The result is not magic. It is engineering. The optimiser stops shouting “left” or “right” at full volume when the model is already near a good region. That is usually a healthy development, in markets and in meetings.
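The overshoot argument can be checked on a toy problem. The sketch below minimises $f(\theta) = \theta^2/2$ with a Lion-style update, once with the hard sign and once with a $\tanh$ surrogate. It is an illustration of the mechanism under assumed settings (`lr=0.1`, `kappa=1.0`, 3,000 steps, a one-dimensional quadratic), not a reproduction of the paper's Roaree implementation or its results.

```python
import math


def run(direction, steps=3000, lr=0.1, beta1=0.9, beta2=0.99, theta=1.0):
    """Minimise f(theta) = theta**2 / 2 with a Lion-style update.

    `direction` maps the blended momentum c_t to an update direction.
    Returns the full trajectory of theta.
    """
    m, trace = 0.0, [theta]
    for _ in range(steps):
        g = theta                               # gradient of theta**2 / 2
        c = beta1 * m + (1.0 - beta1) * g
        theta = theta - lr * direction(c)
        m = beta2 * m + (1.0 - beta2) * g
        trace.append(theta)
    return trace


def hard_sign(c):
    return (c > 0) - (c < 0)


def tanh_sign(c, kappa=1.0):
    return math.tanh(kappa * c)


hard = run(hard_sign)
smooth = run(tanh_sign)

# Largest distance from the optimum (theta = 0) over the final 100 steps.
hard_tail = max(abs(t) for t in hard[-100:])
smooth_tail = max(abs(t) for t in smooth[-100:])
print(hard_tail, smooth_tail)
```

In this toy setting the hard-sign run keeps stepping by `lr` and so never settles closer than a fraction of `lr` to the optimum, while the smooth run's steps shrink along with the gradient. A very large `kappa` would saturate the surrogate and bring back the hard-sign behaviour.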
The appendix makes the trade-off clearer than the headline
The paper’s appendix is not a second thesis. It is best read as a robustness and diagnostic layer around the main comparison.
| Evidence piece | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Figure 1: speed vs. best test MSE for baselines | Main evidence | Adaptive and momentum methods dominate final error; fastest settings are not always best | Profitability or live trading value |
| Figure 2: validation MSE heatmaps | Robustness/sensitivity test | Lion tolerates a broader hyperparameter region with low validation MSE | That Lion is best for final prediction |
| Figures 3–4: Roaree vs. baselines and convergence | Main Roaree evidence | Roaree improves on Lion and smooths convergence | That Roaree beats Adam/RMSProp overall |
| Appendix figures on MAE, RMSE, directional accuracy | Secondary diagnostics | Error patterns are not only one MSE artefact | Strategy-level returns or risk-adjusted performance |
| Appendix small-grid table | Fairer matched-grid comparison | Roaree beats Lion under the shared grid; RMSProp/Adam remain stronger | Exhaustive global optimiser ranking |
The matched-grid appendix table is especially useful because it gives actual numbers. On the smaller grid used to compare Roaree with baselines, RMSProp records the lowest test MSE at 0.0002297, followed by Adam at 0.0003021 and Nesterov at 0.0003165. Lion, by contrast, records 0.003588. Roaree variants improve materially over Lion, with the sigmoid surrogate at $\kappa=1000$ reaching 0.0008159 and the erf surrogate at $\kappa=10$ reaching 0.001508.
A small but important nuance: the paper describes the erf surrogate with $\kappa=10$ as the best surrogate choice because it preserves extremely low epoch time while decreasing test MSE relative to Lion. If the criterion is lowest Roaree test MSE alone in the appendix table, sigmoid with $\kappa=1000$ is lower. So “best” here should be read as a speed-error preference, not a universal accuracy crown.
| Configuration on matched small grid | Test MSE | Avg. epoch time | Business reading |
|---|---|---|---|
| RMSProp | 0.0002297 | 0.2492s | Best small-grid error; strong candidate when accuracy dominates |
| Adam | 0.0003021 | 0.2448s | Very strong error with fast epoch time |
| Nesterov | 0.0003165 | 0.2483s | Momentum remains highly competitive |
| SGD + momentum | 0.0004946 | 0.2468s | Solid middle ground |
| Roaree sigmoid, $\kappa=1000$ | 0.0008159 | 0.2512s | Best Roaree by matched-grid MSE |
| Roaree erf, $\kappa=10$ | 0.001508 | 0.2451s | Paper’s speed-oriented Roaree pick |
| Lion | 0.003588 | 0.2452s | Fast, but weak final error on this grid |
| SGD | 0.003926 | 0.2526s | Simple, but not persuasive here |
The times also deserve sober interpretation. These are fractions of a second per epoch on an NVIDIA T4 in Google Colab. In absolute terms, the differences are tiny. They matter only if multiplied across many assets, many windows, many model variants, many retraining cycles, or a very large research operation. At that point, quarter-second differences can become workflow economics. Below that scale, they are mostly a reminder not to confuse decimal places with destiny.
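Both points, that "best" depends on the criterion and that the time differences only matter at scale, can be checked against the table's own numbers. The sweep size of 10,000 runs is a hypothetical figure chosen for illustration; the MSE values, epoch times, and the 64-epoch budget come from the text above.

```python
# (test MSE, average epoch time in seconds) from the matched small-grid table.
results = {
    "RMSProp": (0.0002297, 0.2492),
    "Adam": (0.0003021, 0.2448),
    "Nesterov": (0.0003165, 0.2483),
    "SGD + momentum": (0.0004946, 0.2468),
    "Roaree sigmoid, kappa=1000": (0.0008159, 0.2512),
    "Roaree erf, kappa=10": (0.001508, 0.2451),
    "Lion": (0.003588, 0.2452),
    "SGD": (0.003926, 0.2526),
}

# Which Roaree variant is "best" depends on the column you sort by.
roaree = {name: row for name, row in results.items() if name.startswith("Roaree")}
best_by_mse = min(roaree, key=lambda name: roaree[name][0])
best_by_time = min(roaree, key=lambda name: roaree[name][1])

# Slowest minus fastest optimiser, scaled up to a large sweep.
times = [epoch_time for _, epoch_time in results.values()]
spread = max(times) - min(times)
epochs = 64          # training budget stated in the article
n_runs = 10_000      # hypothetical sweep size
extra_hours = spread * epochs * n_runs / 3600

print(best_by_mse, "|", best_by_time)
print(round(spread, 4), round(extra_hours, 2))
```

Under those assumptions, the gap between the slowest and fastest optimiser is 0.0078 seconds per epoch, or roughly 1.4 hours across the whole sweep. That is real but modest, which is the sober reading the numbers support.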
What this means for quant teams
The practical implication is not “choose Roaree.” It is “choose the optimiser according to the bottleneck.”
For teams optimising a mature model where each basis point of forecast error matters, the paper points back to established adaptive and momentum methods. RMSProp, Adam, Nesterov, and SGD with momentum are the stronger candidates in this specific benchmark. They should be the default comparison set before a team gets excited about a new optimiser name. New names are fun. Lower error is more fun.
For teams doing broad research sweeps, Lion has a different value proposition. Its tolerance across hyperparameter settings means it may be useful for early exploration, architecture probes, or high-volume hypothesis testing. A team can use it to move quickly through weak ideas before spending more careful tuning budget on promising ones.
Roaree sits between those modes. It is most interesting where Lion’s speed profile is attractive but its oscillatory convergence is too noisy. Smooth-sign variants offer a plausible route to more stable fast training. That makes Roaree less a replacement for Adam and more a repair kit for Lion-style optimisation.
The workflow implication looks like this:
| Research stage | Sensible optimiser posture | Reason |
|---|---|---|
| Early model exploration | Lion or selected Roaree variants | Fast iteration and broader hyperparameter tolerance |
| Candidate model refinement | Adam, RMSProp, Nesterov, SGD + momentum | Better final error in the benchmark |
| Stability diagnosis | Compare Lion against Roaree convergence curves | Tests whether hard-sign oscillation is harming convergence |
| Production validation | Re-test all candidates under walk-forward trading constraints | Prediction error alone is not deployable alpha |
This is the kind of result that belongs inside an ML research platform. The optimiser should not be hard-coded as an afterthought. It should be part of the experiment matrix, logged alongside architecture, features, split regime, costs, and downstream trading assumptions. Otherwise, a team may conclude that a model architecture failed when, in reality, the optimiser simply trained it badly.
What the paper shows, what we infer, and what remains uncertain
The cleanest way to use this paper is to separate direct evidence from business inference.
| Layer | Statement |
|---|---|
| What the paper directly shows | On a MambaStock weekly S&P 500 return forecasting task, adaptive and momentum-based optimisers achieve the lowest reported errors among the main baselines. Lion trains quickly and tolerates a broader hyperparameter region but does not achieve the best test error. Roaree variants smooth Lion-like convergence and improve over Lion. |
| What Cognaptus infers for business use | Optimiser selection is a research-throughput and stability lever for financial ML teams. Adam/RMSProp/Nesterov-style methods should remain strong default baselines when error matters. Lion/Roaree-style methods may be useful for fast sweeps, diagnostic testing, and large experiment farms. |
| What remains uncertain | Whether these findings transfer to other assets, frequencies, market regimes, proprietary feature sets, larger Mamba variants, portfolio objectives, and transaction-cost-aware trading systems. |
That last row is not a ceremonial limitation paragraph. It changes how the result should be used.
A weekly S&P 500 return benchmark is a controlled forecasting task. It is not a strategy. A model can improve MSE and still fail to make money after costs. Directional accuracy can look respectable and still produce poor risk-adjusted returns if the wrong weeks are wrong, position sizing is naive, or turnover is expensive. A smoother convergence curve is useful, but it does not pay commissions.
The business value is therefore upstream. Better optimiser choice can reduce wasted training cycles, make model comparisons fairer, and expose whether a forecasting architecture is genuinely weak or merely poorly trained. That is valuable. It is just not the same thing as trading edge. Finance, inconveniently, keeps insisting on the distinction.
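The gap between forecast error and trading outcome can be made concrete with a toy example. Everything here is synthetic and deliberately extreme: the return series, both forecasts, the sign-following position rule, and the cost of 0.001 per unit of turnover are assumptions for illustration, not figures from the paper.

```python
def mse(actual, forecast):
    """Mean squared error between realised and forecast returns."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def net_pnl(actual, forecast, cost_per_unit_turnover=0.001):
    """Hold +1 or -1 unit according to the forecast sign.

    A cost is charged on every change in position, including the first entry.
    """
    position, total = 0, 0.0
    for a, f in zip(actual, forecast):
        target = (f > 0) - (f < 0)
        total -= cost_per_unit_turnover * abs(target - position)
        total += target * a
        position = target
    return total


# Synthetic weekly returns and two synthetic forecasts (not from the paper).
actual = [0.02, -0.02, 0.02, -0.02]
timid = [-0.001, 0.001, -0.001, 0.001]   # close to zero, wrong sign every week
bold = [0.06, -0.06, 0.06, -0.06]        # right sign, badly overstated size

for name, forecast in (("timid", timid), ("bold", bold)):
    print(name, mse(actual, forecast), net_pnl(actual, forecast))
```

The timid forecast has the lower MSE and loses money; the bold forecast has the higher MSE and makes money. Real cases are less tidy, but the ordering of error metrics and the ordering of trading outcomes are not guaranteed to agree.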
The boundary: do not mistake optimiser progress for market proof
The paper is careful enough to state several limitations. Detailed historical stock data are commercially constrained. Different optimiser families need different hyperparameter regimes. A broader grid improves fairness but reduces granularity. Roaree itself likely needs deeper curvature searches and alternative schedules before its ceiling is known.
There are also practical boundaries not fully resolved by the benchmark:
- Single-market scope. The task is based on S&P 500 weekly data. Other assets may have different noise, liquidity, and regime behaviour.
- Forecast metric scope. The reported metrics include MSE, RMSE, MAE, $R^2$, and directional accuracy. They do not include transaction costs, slippage, turnover, drawdown, capacity, or live retraining stability.
- Architecture scope. MambaStock is held fixed. That is good for isolating optimiser effects, but it does not show whether Roaree behaves similarly across other state-space, recurrent, or transformer-style financial models.
- Search-budget scope. Roaree is promising but not exhaustively optimised. The reported results are enough to justify further testing, not enough to declare a new standard.
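For readers who want to reproduce the forecast metrics named in the list above, here is a minimal sketch using standard definitions. The function name `forecast_metrics` and the sample numbers are assumptions for illustration, and directional accuracy is taken here as the share of weeks where forecast and outcome agree on being positive; the paper's exact definition is not given in this article.

```python
import math


def forecast_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, R^2, and directional accuracy for two series."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # zero if actual is constant
    ss_res = sum(e * e for e in errors)
    # A week counts as a hit when both values are positive or both are not.
    hits = sum(1 for a, p in zip(actual, predicted) if (a > 0) == (p > 0))
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1.0 - ss_res / ss_tot,
        "directional_accuracy": hits / n,
    }


# Made-up weekly returns and forecasts, purely to exercise the function.
metrics = forecast_metrics([0.01, -0.02, 0.03, -0.01], [0.02, -0.01, -0.01, -0.02])
for name, value in metrics.items():
    print(name, round(value, 6))
```

In this made-up sample the forecast gets the direction right three weeks out of four and still has a negative $R^2$, meaning it does worse than predicting the mean return. None of these numbers says anything about costs or drawdowns, which is the point of the second bullet.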
These boundaries do not weaken the paper. They keep it useful. An optimiser benchmark that pretends to be an investment thesis is less valuable than one that tells model builders where to look next.
The useful conclusion is comparative, not heroic
The familiar temptation would be to frame Roaree as a dramatic breakthrough: Lion tamed, trading floor transformed, profits presumably arriving by courier. That would be tidy. It would also be wrong.
The better conclusion is more disciplined. In this benchmark, the best error performance still belongs to adaptive and momentum-based optimisers. Lion remains attractive for speed and hyperparameter tolerance but pays for that with weaker final error and oscillatory convergence. Roaree is an intelligent modification: by smoothing Lion’s sign step, it reduces some of that instability and improves Lion-like performance, while still not overtaking the strongest baseline methods.
For quant teams, the lesson is operational. Treat optimiser choice as part of the research design. Use fast methods to explore, stable adaptive methods to refine, and matched-grid comparisons to avoid fooling yourself with unfair tuning budgets. Then validate everything again under real trading constraints, because markets are not impressed by clean loss curves. They have seen prettier charts than yours.
Cognaptus: Automate the Present, Incubate the Future.
[1] Maria Garmonina and Alena Chan, “From Rattle to Roar: Optimizer Showdown for MambaStock on S&P 500,” arXiv:2508.04707, 2025.