Signals & Sentiments: How GPT-2 and FinBERT Beat Buy-and-Hold on the S&P 500

When it comes to trading the S&P 500, tradition says: trust the chart. But a new study from UCLA researchers proposes a smarter compass—one that listens not only to price momentum but also to the tone of the news. By merging language model-powered sentiment scores with technical indicators and time-series forecasting, the authors build a hybrid strategy that outperforms a buy-and-hold baseline during a volatile 3-month window.

Beyond the Chart: A New Kind of Alpha

Classic models like ARIMA and ETS capture trends from historical prices, and technical tools like MACD and SAR reflect market inertia and reversals. Yet these models struggle in environments driven by rapid shifts in investor psychology—something textual sentiment captures more nimbly. Enter GPT-2 and FinBERT, deployed here not to generate prose, but to extract daily mood from financial news across five major outlets.

The study doesn’t stop at measuring sentiment. It embeds these signals into a full trading simulator that dynamically buys or sells S&P 500 shares based on a combined indicator set. Returns are then benchmarked against the default wisdom of simply holding.

The Set-Up: Sentiment + Signals + Stats

Here’s a breakdown of the model inputs and methodologies:

Component	Tools/Models Used	Purpose
Sentiment Models	GPT-2 (fine-tuned), FinBERT	Assign +1, 0, -1 to daily news tone
Technical Indicators	MACD, VW MACD, Dual MACD, SAR	Identify momentum, trend reversals
Time-Series Models	ARIMA, ETS, Prophet	Forecast short-term price movement
Data	S&P 500 daily prices, news from WSJ, Benzinga, Dow Jones, etc.	Input for signals & sentiment
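To make the technical-indicator row more concrete, here is a minimal sketch of a plain MACD crossover signal in Python. The 12/26/9 spans are the conventional defaults and are an assumption here, since the article does not state the study's parameters. The function names are hypothetical, and the VW MACD, Dual MACD, and SAR variants are not shown.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # seed the average with the first observation
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_signal(closes, fast=12, slow=26, signal=9):
    """Return +1, -1, or 0 per day: MACD above, below, or equal to its signal line.

    The spans are the textbook defaults (an assumption, not taken from the study).
    """
    macd = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    signal_line = ema(macd, signal)
    # (m > s) - (m < s) maps the comparison to +1 / 0 / -1
    return [(m > s) - (m < s) for m, s in zip(macd, signal_line)]
```

On a steadily rising price series this sketch emits 0 on the first day (both averages start at the same value) and +1 afterwards, which matches the usual reading of MACD as a momentum gauge.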

To simulate trading, they applied a rule-based system that combines these features into a unified indicator $I_t$. If $I_t > 0$, the model buys; if $I_t < 0$, it sells; otherwise, it holds. The simulation assumes zero transaction costs and starts with $10,000 in capital.
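The rule above can be sketched as a small backtest loop. The $10,000 starting capital, the zero transaction costs, and the buy/sell/hold thresholds come from the article. The all-in/all-out position sizing and the `combine` helper (a plain sum of two signals) are assumptions made for illustration, since the article does not spell out how $I_t$ is built or how much is traded on each signal.

```python
def combine(sentiment, technical):
    """Hypothetical combination: sum two signals that each take values in {-1, 0, +1}."""
    return sentiment + technical


def simulate(prices, indicators, capital=10_000.0):
    """Return the final portfolio value under the I_t > 0 buy / I_t < 0 sell / else hold rule.

    Assumes all-in / all-out sizing and zero transaction costs.
    """
    cash, shares = capital, 0.0
    for price, i_t in zip(prices, indicators):
        if i_t > 0 and cash > 0:        # buy: move all cash into shares
            shares += cash / price
            cash = 0.0
        elif i_t < 0 and shares > 0:    # sell: liquidate the whole position
            cash += shares * price
            shares = 0.0
        # i_t == 0 (or nothing to trade): hold
    return cash + shares * prices[-1]   # mark any open position at the last price


# Usage with made-up prices and signals
prices = [100.0, 110.0, 90.0, 120.0]
indicators = [combine(1, 0), combine(-1, 0), combine(1, -1), combine(0, 1)]
final_value = simulate(prices, indicators)
```

Buy-and-hold is the special case where the indicator is positive on the first day and zero afterwards, so the same function can produce the baseline.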

The Results: Small Window, Big Differences

While the backtest only spans May to August 2024, the performance differentials are notable:

The buy-and-hold baseline loses 0.70%.
The best hybrid model (GPT-2 + VW MACD + Dow Jones news) earns +5.77%.
FinBERT shines on Benzinga with Dual MACD (+4.64%), but GPT-2 outperforms overall.
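To put these percentages in dollar terms, the short sketch below converts each reported return into an ending balance on the $10,000 starting capital. It assumes the figures are simple returns over the whole window; `ending_value` is a hypothetical helper name.

```python
START_CAPITAL = 10_000.0  # starting capital stated in the article


def ending_value(simple_return_pct, capital=START_CAPITAL):
    """Ending balance for a simple (non-annualized) percentage return."""
    return round(capital * (1 + simple_return_pct / 100), 2)


reported = {
    "buy-and-hold": -0.70,
    "GPT-2 + VW MACD + Dow Jones": 5.77,
    "FinBERT + Dual MACD + Benzinga": 4.64,
}

for name, pct in reported.items():
    print(f"{name}: ${ending_value(pct):,.2f}")
```

Under that assumption the baseline ends near $9,930 and the best hybrid near $10,577, a gap of about 6.47 percentage points.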

Interestingly, time-series models like Prophet and ETS post relatively high prediction accuracy (near 60%) but don’t deliver the best trading returns unless fused with sentiment. Technical indicators alone perform poorly (<10% accuracy), but they regain value when combined with FinBERT or GPT-2.
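A toy example shows how prediction accuracy and trading returns can diverge. It assumes "accuracy" means the share of days on which the predicted direction matches the realized one, and that a model is long on days it predicts up and flat otherwise. Both are assumptions, and every number below is made up for illustration.

```python
def directional_accuracy(predicted, actual):
    """Share of days on which the predicted direction equals the realized one."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def strategy_return(predicted, daily_returns):
    """Compound return when long on predicted-up days and flat otherwise."""
    growth = 1.0
    for p, r in zip(predicted, daily_returns):
        position = 1 if p > 0 else 0
        growth *= 1.0 + position * r
    return growth - 1.0


# Made-up data: four small gains and one large loss
daily_returns = [0.01, 0.01, 0.01, -0.05, 0.01]
actual = [1 if r > 0 else -1 for r in daily_returns]

model_a = [1, 1, 1, 1, 1]     # right on 4 of 5 days, but long through the big loss
model_b = [-1, 1, -1, -1, 1]  # right on 3 of 5 days, but sidesteps the big loss

acc_a, ret_a = directional_accuracy(model_a, actual), strategy_return(model_a, daily_returns)
acc_b, ret_b = directional_accuracy(model_b, actual), strategy_return(model_b, daily_returns)
```

Here `model_a` has the higher hit rate yet loses money, while `model_b` is right less often but ends ahead, because which days a model gets right matters more than how many.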

Why It Matters

This paper is not the first to combine textual sentiment with price data, but it does offer two distinctive takeaways:

Model Pairing Matters: FinBERT beats GPT-2 in sentiment accuracy on several news sources, but GPT-2 excels in generating profitable trades when fused with specific technical indicators.
News Source Selection Isn’t Trivial: Dow Jones and Benzinga yield stronger predictive sentiment signals than Barron’s or WSJ in this setup. Tailoring model + source pairs may be more important than previously assumed.

Limitations and the Road Ahead

While the gains are promising, the study admits several simplifications:

The test window is short (3 months) and avoids turbulent macro cycles.
No trading fees or slippage are included.
News articles are weighted equally regardless of length or impact.

Still, the authors outline clear next steps: integrating real-time social media feeds, using reinforcement learning to adapt strategies, and extending the system to other assets.

Final Thoughts

For quant shops and algo traders, this paper offers compelling evidence that large language models are not just natural language tools—they are actionable alpha sources. By transforming words into signals and blending them with classical models, traders can build portfolios that adapt, respond, and even anticipate market moves with greater agility.

Cognaptus: Automate the Present, Incubate the Future
