In financial sentiment analysis, the devil has always been in the labeling. Most datasets — even the industry-standard Financial-Phrasebank — ask human annotators to tag headlines as positive, negative, or neutral. But here’s the problem: the market often disagrees.

Take a headline reporting widening losses. Annotators marked it “negative.” Yet the stock rose the next day. Welcome to the disconnect.

Enter FinMarBa, a bold new dataset that cuts out the middleman — the human — and lets the market itself do the labeling. Developed by Lefort et al. (2025), this 61,252-item dataset uses next-day price reactions to classify financial news, creating a labeling method that is empirically grounded, scalable, and (critically) aligned with investor behavior.

Label It Like You Mean It: The FinMarBa Pipeline

Here’s how it works:

| Step | Action | Tool |
|------|--------|------|
| 1 | Collect Bloomberg Market Wraps (2010–2024) | Bloomberg |
| 2 | Extract daily headlines from summaries | GPT-4 |
| 3 | Identify relevant tickers per headline | GPT-4 |
| 4 | Compute next-day % return for each ticker | Market data |
| 5 | Compare to historical quantiles (5-year rolling) | Statistical filter |
| 6 | Label as Positive (>60th percentile), Negative (<30th percentile), Neutral otherwise | Market-driven rule |

No more guessing how “bad” a loss sounds. If the market rewards it, it’s labeled positive.
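To make steps 5 and 6 concrete, here is a minimal sketch of the labeling rule in plain Python. It is an illustration, not the authors' code: the function names (`percentile_rank`, `market_label`), the toy return history, and the choice to rank by "fraction of past returns strictly below the new return" are all assumptions. Only the 60th and 30th percentile cutoffs come from the pipeline above.

```python
from bisect import bisect_left

def percentile_rank(history, value):
    """Fraction of historical returns strictly below `value` (0.0 to 1.0)."""
    ordered = sorted(history)
    return bisect_left(ordered, value) / len(ordered)

def market_label(history, next_day_return, upper=0.60, lower=0.30):
    """Label a headline from its ticker's next-day % return.

    history: past daily % returns for the same ticker. The pipeline uses a
             5-year rolling window; here the caller is assumed to have
             already sliced the history to that window.
    """
    if not history:
        raise ValueError("need at least one historical return")
    rank = percentile_rank(history, next_day_return)
    if rank > upper:
        return "positive"
    if rank < lower:
        return "negative"
    return "neutral"

# Toy history (hypothetical): 100 daily returns from -5.0% to +4.9%.
history = [i / 10 for i in range(-50, 50)]

print(market_label(history, 2.0))   # rank 0.70 -> positive
print(market_label(history, -3.0))  # rank 0.20 -> negative
print(market_label(history, 0.0))   # rank 0.50 -> neutral
```

Note that the wording of the headline never enters the function. A "widening losses" story followed by a +2.0% move would come out positive under this rule.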

A Better Classifier Needs a Better Dataset

To test their approach, the authors compared two sentiment models:

  • FinMarBaBERT: Trained on FinMarBa headlines and labels
  • FinBERT: Trained on Financial-Phrasebank

Then they backtested both signals on the S&P 500 from 2019 to 2024. The result?

  • FinMarBaBERT Sharpe Ratio: 0.30
  • FinBERT Sharpe Ratio: -0.13

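In financial terms, the sign is what matters here. A positive Sharpe ratio means the strategy earned a positive excess return per unit of risk over the backtest (not necessarily alpha in the strict sense). A negative one means that following the signal lost money on a risk-adjusted basis.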
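For readers who want to see what sits behind those two numbers, here is a minimal sketch of a Sharpe ratio calculation. The article does not say whether the reported figures are annualized or which risk-free rate was used, so the 252-day annualization, the zero default risk-free rate, and the sample returns below are assumptions for illustration only.

```python
import math
from statistics import mean, stdev

def annualized_sharpe(daily_returns, daily_risk_free=0.0, periods_per_year=252):
    """Mean excess return divided by its sample standard deviation,
    scaled by sqrt(periods_per_year)."""
    excess = [r - daily_risk_free for r in daily_returns]
    volatility = stdev(excess)  # sample standard deviation, needs >= 2 points
    if volatility == 0:
        raise ValueError("returns have zero volatility")
    return mean(excess) / volatility * math.sqrt(periods_per_year)

# Hypothetical daily strategy returns (as fractions, not percentages).
strategy_returns = [0.010, -0.005, 0.002, 0.004, -0.003]

print(annualized_sharpe(strategy_returns))
# Flipping every trade flips the sign of the ratio.
print(annualized_sharpe([-r for r in strategy_returns]))
```

The second call shows why a negative Sharpe ratio is so damning for a signal: it means the opposite trade would have done better.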

FinMarBa’s sentiment labels also reflect the natural optimism bias of equities:

| Label | FinMarBa (%) | Phrasebank (%) |
|-------|--------------|----------------|
| Positive | 42.11 | 28.13 |
| Negative | 31.43 | 12.46 |
| Indecisive (neutral) | 26.45 | 59.41 |

Robustness You Can Trust

The authors went further, running forward-looking perturbation tests that shuffle headlines within 5-, 10-, and 15-day windows so that some future information leaks into the signal. As more future information leaked in, FinMarBa’s signal improved, which suggests the model is capturing real, directional, market-relevant information rather than noise.

The FinBERT-based model, by contrast, just wobbled.

| Window size | Sharpe gain (50% future info) |
|-------------|-------------------------------|
| 5 days | +1.94 |
| 10 days | +0.62 |
| 15 days | +0.39 |
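The article does not spell out exactly how headlines are shuffled, so the sketch below is one hypothetical reading of such a perturbation test, not the authors' procedure. A chosen fraction of headlines is re-dated up to `window_days` earlier, so that a signal built on a given day can "see" news that was really published later. The function name `leak_future_headlines`, its parameters, and the sample headlines are all invented for illustration.

```python
import random
from datetime import date, timedelta

def leak_future_headlines(headlines, window_days, leak_fraction, seed=0):
    """Return a date-sorted copy in which some headlines are moved earlier.

    headlines:     list of (publication_date, text) tuples.
    window_days:   maximum number of days a headline can be moved back.
    leak_fraction: probability that any given headline is moved (0.0 to 1.0).
    """
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    perturbed = []
    for published, text in headlines:
        if rng.random() < leak_fraction:
            shift = rng.randint(1, window_days)  # inclusive on both ends
            perturbed.append((published - timedelta(days=shift), text))
        else:
            perturbed.append((published, text))
    return sorted(perturbed, key=lambda item: item[0])

# Hypothetical headlines, one per day.
headlines = [(date(2024, 1, 10) + timedelta(days=i), f"headline {i}")
             for i in range(20)]

# Roughly half of the headlines leak up to 5 days into the past.
for published, text in leak_future_headlines(headlines, 5, 0.5)[:5]:
    print(published, text)
```

Under this reading, a model whose backtest improves as `leak_fraction` rises is responding to information that genuinely moves prices, while a model that stays flat is not.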

FinMarBa isn’t just more predictive — it’s more resilient.

Why This Matters

Most finance LLMs are still trained on human-labeled data. But as models get stronger, the bottleneck shifts to the quality of supervision. FinMarBa’s innovation isn’t a model — it’s a new truth signal for the financial world.

By using price reaction as ground truth, it offers:

  • A scalable annotation framework
  • Cross-market generality (equities, bonds, commodities, crypto)
  • Alignment with real investor behavior

This is a dataset not just for researchers, but for quants, asset managers, and financial AI builders who want their models to trade — not just parse.

Both the dataset and the fine-tuned model are available to download.


Cognaptus: Automate the Present, Incubate the Future