Financial NLP

When the Market Speaks: A New Dataset That Actually Listens

TL;DR for operators: FinMarBa is a useful reminder that in finance, sentiment is not what a sentence sounds like. Sentiment is what the market does after reading, absorbing, ignoring, overreacting to, or misunderstanding that sentence. Very elegant. Very inconvenient. The paper introduces a 61,252-headline financial sentiment dataset built from Bloomberg Market Wraps covering 2010 to January 2024.[1] Instead of asking human annotators whether a headline feels positive, negative, or neutral, the authors use a market-based labelling process: extract headlines, identify relevant tickers with GPT-4, observe the next-day price reaction, compare that reaction with the ticker’s rolling five-year return distribution, and assign a label from that relative move. ...
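To make the labelling idea concrete, here is a minimal sketch of the last three steps (next-day reaction, comparison with a rolling five-year distribution, label assignment). It is an illustration under stated assumptions, not the authors' implementation: the function names, the 252-trading-day year, and the 25th/75th percentile cutoffs are all hypothetical, since this summary does not state the thresholds the paper uses. Headline extraction and GPT-4 ticker identification are left out.

```python
from bisect import bisect_left, bisect_right

TRADING_DAYS_PER_YEAR = 252  # assumption: approximate trading days in a year


def percentile_rank(history, value):
    """Share of historical returns below `value`, counting ties as half."""
    ordered = sorted(history)
    below = bisect_left(ordered, value)
    below_or_equal = bisect_right(ordered, value)
    return (below + below_or_equal) / (2 * len(ordered))


def label_headline(returns, headline_idx, window_years=5, lower=0.25, upper=0.75):
    """Label a headline from the ticker's next-day return.

    returns      -- daily returns for one ticker, in date order
    headline_idx -- index of the day the headline was published
    lower, upper -- hypothetical percentile cutoffs (not taken from the paper)
    """
    next_idx = headline_idx + 1
    if next_idx >= len(returns):
        raise ValueError("no next-day return available for this headline")

    # Rolling window of past returns, ending the day before the reaction day.
    window = window_years * TRADING_DAYS_PER_YEAR
    history = returns[max(0, next_idx - window):next_idx]
    if not history:
        raise ValueError("no return history before the reaction day")

    # Where does the next-day move sit in the ticker's own distribution?
    rank = percentile_rank(history, returns[next_idx])
    if rank <= lower:
        return "negative"
    if rank >= upper:
        return "positive"
    return "neutral"
```

The point of ranking against the ticker's own history is that the same absolute move can be routine for a volatile stock and unusual for a quiet one, so the label reflects the relative size of the reaction rather than its raw magnitude.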

🚀 All Talk, No Stocks? What Reddit Sentiment Doesn't Predict

TL;DR for operators: A new paper by Mateusz Kmak and colleagues asks a very practical question: can Reddit sentiment, especially when annotated with ChatGPT and fed into a fine-tuned Financial-RoBERTa model, predict meme-stock prices?[1] The short answer is: not very well. Which is awkward, because the whole exercise starts from the obvious temptation that if Reddit can help move a stock, then Reddit sentiment should help forecast it. Markets, naturally, have declined to be that tidy. ...

Numbers Don’t Speak for Themselves: How LLMs Interpret the Soul of Financial Reports

TL;DR for operators: Financial-report analysis is one of those jobs where the output can sound competent long before it is useful. A model can summarise a 10-K fluently, mention strategy, risk, customers, and competitive position, and still fail the only test that matters: can a finance team rely on it repeatedly, under pressure, across filings? ...