In financial sentiment analysis, the devil has always been in the labeling. Most datasets — even the industry-standard Financial-Phrasebank — ask human annotators to tag headlines as positive, negative, or neutral. But here’s the problem: the market often disagrees.
Take a headline reporting widening losses. Annotators marked it “negative.” Yet the stock rose the next day. Welcome to the disconnect.
Enter FinMarBa, a bold new dataset that cuts out the middleman — the human — and lets the market itself do the labeling. Developed by Lefort et al. (2025), this 61,252-item dataset uses next-day price reactions to classify financial news, creating a labeling method that is empirically grounded, scalable, and (critically) aligned with investor behavior.
Label It Like You Mean It: The FinMarBa Pipeline
Here’s how it works:
| Step | Action | Tool |
|---|---|---|
| 1 | Collect Bloomberg Market Wraps (2010–2024) | Bloomberg |
| 2 | Extract daily headlines from summaries | GPT-4 |
| 3 | Identify relevant tickers per headline | GPT-4 |
| 4 | Compute next-day % return for each ticker | Market data |
| 5 | Compare to historical quantiles (5-year rolling) | Statistical filter |
| 6 | Label as Positive (>60th %ile), Negative (<30th %ile), Neutral otherwise | Market-driven rule |
No more guessing how “bad” a loss sounds. If the market rewards it, it’s labeled positive.
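Here is a minimal sketch of that labeling rule in pandas. The percentile cutoffs follow the table above; the return series and the example returns are hypothetical stand-ins, not the paper's actual data.

```python
import numpy as np
import pandas as pd

def label_headline(next_day_ret: float, hist_rets: pd.Series) -> str:
    """Label a headline by where the ticker's next-day return falls
    in its own 5-year return distribution."""
    pos_cut = hist_rets.quantile(0.60)  # >60th percentile -> Positive
    neg_cut = hist_rets.quantile(0.30)  # <30th percentile -> Negative
    if next_day_ret > pos_cut:
        return "Positive"
    if next_day_ret < neg_cut:
        return "Negative"
    return "Neutral"

# Hypothetical 5-year rolling window of daily returns (~252 trading days/year).
rng = np.random.default_rng(0)
hist = pd.Series(rng.normal(0.0004, 0.012, 5 * 252))

print(label_headline(0.021, hist))    # big up-move  -> Positive
print(label_headline(-0.018, hist))   # big drawdown -> Negative
print(label_headline(0.0002, hist))   # middling     -> Neutral
```

Note the asymmetric cutoffs: a headline has to clear the 60th percentile to count as positive, but only fall below the 30th to count as negative.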
A Better Classifier Needs a Better Dataset
To test their theory, the authors trained two sentiment models:
- FinMarBaBERT: Trained on FinMarBa headlines and labels
- FinBERT: Trained on Financial-Phrasebank
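Neither training recipe is spelled out here, but as a rough sketch, fine-tuning a BERT-style classifier on market-labeled headlines with Hugging Face transformers looks something like this. The base model name, the label mapping, and the two example headlines are illustrative assumptions, not the paper's actual setup.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Illustrative stand-ins; real training would use all 61,252 FinMarBa pairs.
data = Dataset.from_dict({
    "text": ["Company X posts record quarterly revenue",
             "Regulator opens probe into Company Y"],
    "label": [0, 1],  # assumed mapping: 0=Positive, 1=Negative, 2=Neutral
})

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
ds = data.map(lambda b: tok(b["text"], truncation=True, max_length=64,
                            padding="max_length"), batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)

trainer = Trainer(
    model=model,
    train_dataset=ds,
    args=TrainingArguments(output_dir="finmarba-bert",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
)
trainer.train()
```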
Then they backtested both signals on the S&P 500 from 2019 to 2024. The result?
- FinMarBaBERT Sharpe Ratio: 0.30
- FinBERT Sharpe Ratio: -0.13
In financial terms, this is night and day. A positive Sharpe means the signal delivers real risk-adjusted returns. A negative one means you're being misled by noise.
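For reference, here is how the Sharpe ratio of a simple sentiment-driven strategy is computed. The `signal` and `index_ret` series below are synthetic placeholders, and the paper's actual backtest construction may differ.

```python
import numpy as np

def annualized_sharpe(daily_rets: np.ndarray, periods: int = 252) -> float:
    # Annualized Sharpe: mean daily return over its volatility, scaled.
    return np.sqrt(periods) * daily_rets.mean() / daily_rets.std(ddof=1)

rng = np.random.default_rng(1)
signal = rng.choice([-1, 0, 1], size=5 * 252)       # prior-day sentiment: short/flat/long
index_ret = rng.normal(0.0003, 0.01, size=5 * 252)  # stand-in S&P 500 daily returns

# Go long after bullish days, short after bearish ones, stay flat otherwise.
strategy_ret = signal * index_ret
print(f"Sharpe: {annualized_sharpe(strategy_ret):.2f}")
```

With random placeholders this prints a number near zero; the point is the computation, not the value.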
FinMarBa’s sentiment labels also reflect the natural optimism bias of equities:
| Label | FinMarBa (%) | Phrasebank (%) |
|---|---|---|
| Positive | 42.11 | 28.13 |
| Negative | 31.43 | 12.46 |
| Neutral | 26.45 | 59.41 |
Robustness You Can Trust
The authors went further, running forward-looking perturbation tests: headlines were shuffled within 5–15 day windows, so that a controlled fraction of them arrived "early" and leaked future information into the signal. As that fraction grew, FinMarBa's Sharpe improved, which is exactly what you would expect if its labels encode genuinely directional, market-relevant information; a signal with no real link to future returns would gain nothing from the leak. (A mechanical sketch of the perturbation follows the table below.)
The FinBERT-based model, by contrast, just wobbled.
| Window size | Sharpe gain at 50% future information |
|---|---|
| 5 days | +1.94 |
| 10 days | +0.62 |
| 15 days | +0.39 |
FinMarBa isn’t just more predictive — it’s more resilient.
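Mechanically, that leak can be simulated like this, reusing `signal`, `index_ret`, and `annualized_sharpe` from the sketch above. This only illustrates how the perturbation is constructed: with synthetic placeholders the measured gain hovers around zero, and only a genuinely predictive signal, like the one the paper reports, would show gains like those in the table.

```python
def leak_future(signal: np.ndarray, window: int = 5, frac: float = 0.5,
                seed: int = 2) -> np.ndarray:
    """With probability `frac`, replace each day's signal with one drawn
    from the next `window` days, simulating headlines that arrive early."""
    rng = np.random.default_rng(seed)
    leaked = signal.copy()
    for t in range(len(signal) - window):
        if rng.random() < frac:
            leaked[t] = signal[t + rng.integers(1, window + 1)]
    return leaked

for w in (5, 10, 15):
    gain = (annualized_sharpe(leak_future(signal, window=w) * index_ret)
            - annualized_sharpe(signal * index_ret))
    print(f"{w}-day window: Sharpe gain {gain:+.2f}")
```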
Why This Matters
Most finance LLMs are still trained on human-labeled data. But as models get stronger, the bottleneck shifts to the quality of supervision. FinMarBa’s innovation isn’t a model — it’s a new truth signal for the financial world.
By using price reaction as ground truth, it offers:
- A scalable annotation framework
- Cross-market generality (equities, bonds, commodities, crypto)
- Alignment with real investor behavior
This is a dataset not just for researchers, but for quants, asset managers, and financial AI builders who want their models to trade — not just parse.
You can grab the dataset here and the fine-tuned model here.
Cognaptus: Automate the Present, Incubate the Future