In the world of quantitative investing, the line between data and story has long been clear. Numbers ruled the models; narratives belonged to the analysts. But the recent paper “Exploring the Synergy of Quantitative Factors and Newsflow Representations from Large Language Models for Stock Return Prediction” from RAM Active Investments argues that this divide is no longer useful—or profitable.

Beyond Factors: Why Text Matters

Quantitative factors—valuation, momentum, profitability—are the pillars of systematic investing. They measure what can be counted. But markets move on what’s talked about, too. Corporate press releases, analyst notes, executive reshuffles—all carry signals that often precede price action. Historically, this qualitative layer was hard to quantify. Now, LLMs can translate the market’s chatter into vectors of meaning.

The study tests whether fusing these LLM-generated news embeddings with traditional factors improves return predictions. Instead of simply appending text-derived features to a factor model, the researchers build fusion architectures—structured systems that learn unified representations across modalities.
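
To make the idea concrete, here is a minimal sketch of representation-level fusion in PyTorch. The layer sizes, factor count, and embedding dimension are illustrative assumptions, not the paper's specification:

```python
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Representation-level fusion: concatenate factor features with an LLM
    news embedding, then map the joint representation to a return forecast.
    Dimensions below are illustrative, not the paper's."""

    def __init__(self, n_factors: int = 20, news_dim: int = 768, hidden: int = 64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(n_factors + news_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # predicted next-period return
        )

    def forward(self, factors: torch.Tensor, news_emb: torch.Tensor) -> torch.Tensor:
        joint = torch.cat([factors, news_emb], dim=-1)  # fuse the two modalities
        return self.head(joint).squeeze(-1)
```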

Fusion vs. Mixture: Two Ways to Marry Numbers and Narratives

The team compares two paradigms:

Fusion Learning
  • Core idea: combine factor and news embeddings at the representation level (via concatenation, summation, or attention).
  • Strength: captures joint signals between modalities.
  • Weakness: can dilute strong factor signals when news is noisy.

Mixture Model
  • Core idea: blend the outputs of single-modality predictors, adaptively weighting their contributions.
  • Strength: flexible when one source dominates (e.g., factors in efficient markets, news in emerging ones).
  • Weakness: prone to training instability due to entangled gradients.

Their key innovation is a decoupled training approach that stabilizes mixture models. By first training each component independently and then aligning their output probabilities through KL-divergence, the system learns when to trust factors and when to trust news.
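
Here is a hedged sketch of what that two-stage recipe could look like, assuming the single-modality predictors emit probabilities over return buckets (e.g., quintiles). The gating network, loss weighting, and schedule are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical single-modality experts emitting logits over return buckets.
n_buckets = 5
factor_net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, n_buckets))
news_net = nn.Sequential(nn.Linear(768, 64), nn.ReLU(), nn.Linear(64, n_buckets))
gate = nn.Sequential(nn.Linear(20 + 768, 2), nn.Softmax(dim=-1))  # mixing weights

def decoupled_step(factors, news_emb, target_bucket):
    # Stage 1 (run beforehand): train factor_net and news_net independently,
    # each with its own cross-entropy loss, so gradients stay disentangled.
    # Stage 2 (below): hold the experts fixed, learn only the gate, and align
    # the mixture's distribution with each expert's via KL divergence.
    with torch.no_grad():
        p_factor = F.softmax(factor_net(factors), dim=-1)
        p_news = F.softmax(news_net(news_emb), dim=-1)
    w = gate(torch.cat([factors, news_emb], dim=-1))           # (batch, 2)
    p_mix = w[:, :1] * p_factor + w[:, 1:] * p_news            # mixture distribution
    ce = F.nll_loss(torch.log(p_mix + 1e-8), target_bucket)    # fit realized buckets
    kl = F.kl_div(torch.log(p_mix + 1e-8), p_factor, reduction="batchmean") \
       + F.kl_div(torch.log(p_mix + 1e-8), p_news, reduction="batchmean")
    return ce + 0.1 * kl  # 0.1 is an assumed weight on the alignment term
```

Because the experts are trained first and only the gate is optimized in stage 2, the gate effectively learns per-stock, per-regime trust weights between the two signal sources.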

The Quiet Power of Simplicity

Interestingly, the best performer isn't the most complex. Among the fusion methods, simple concatenation (representation combination) outperforms more sophisticated attention mechanisms. In noisy financial environments, intricate models tend to overfit, while straightforward architectures retain robustness. The lesson echoes decades of quantitative wisdom: elegance often beats cleverness.
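
For contrast, here is roughly what the losing alternative looks like: a cross-attention fusion in which the factor representation queries the news tokens. Dimensions and layout are assumptions for illustration:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Attention-based alternative to concatenation: the factor representation
    attends over news-token embeddings before prediction. More expressive, but
    with more parameters available to fit noise."""

    def __init__(self, n_factors: int = 20, news_dim: int = 768, d_model: int = 64):
        super().__init__()
        self.factor_proj = nn.Linear(n_factors, d_model)
        self.news_proj = nn.Linear(news_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, factors, news_tokens):
        # factors: (batch, n_factors); news_tokens: (batch, seq, news_dim)
        q = self.factor_proj(factors).unsqueeze(1)       # (batch, 1, d_model)
        kv = self.news_proj(news_tokens)                 # (batch, seq, d_model)
        fused, _ = self.attn(q, kv, kv)                  # factor query attends to news
        return self.head(fused.squeeze(1)).squeeze(-1)   # predicted return
```

The extra projections and attention weights give this model more ways to fit noise, which is exactly where concatenation's parsimony pays off.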

Numbers That Talk Back

In backtests across North American, European, and Emerging Market universes, the models revealed market-specific behavior:

  • North America: Factors remain dominant; fine-tuning the LLM hurts performance, likely because U.S. markets price information faster.
  • Europe & Emerging Markets: News embeddings add real alpha. Here, fine-tuning helps the LLM adapt to idiosyncratic narratives and underreported events.

In its best configuration, the Mixture Decoupled model achieved a 33.8% annualized return with a Sharpe ratio of 1.78, outperforming both the standalone factor model and the standalone news model.

Metrics That Mislead

The authors also expose a crucial evaluation trap: low MAPE (mean absolute percentage error) doesn’t necessarily mean better portfolios. Predictive accuracy isn’t the same as ranking power. Instead, the Information Coefficient (IC)—the correlation between predicted and realized returns—proves a more reliable indicator. Even a small positive IC (≈0.03) can translate into substantial alpha when scaled across thousands of stocks.
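
A quick sketch of the two metrics side by side, taking IC as the Spearman rank correlation of one period's cross-section (a common convention; the paper's exact definition may differ):

```python
import numpy as np
from scipy.stats import spearmanr

def information_coefficient(predicted: np.ndarray, realized: np.ndarray) -> float:
    """Cross-sectional IC: rank correlation between predicted and realized
    returns for one period. Averaged over periods, it measures ranking power."""
    return spearmanr(predicted, realized).correlation

def mape(predicted: np.ndarray, realized: np.ndarray) -> float:
    """MAPE rewards point accuracy; it is also ill-behaved near zero returns."""
    return float(np.mean(np.abs((realized - predicted) / realized)))

# Toy illustration: a biased forecast can score terribly on MAPE
# yet still rank stocks well, which is what drives portfolio alpha.
rng = np.random.default_rng(0)
realized = rng.normal(0.01, 0.05, size=1000)             # one period's returns
predicted = 0.5 * realized + rng.normal(0, 0.05, 1000)   # noisy but informative
print(f"IC: {information_coefficient(predicted, realized):.3f}, "
      f"MAPE: {mape(predicted, realized):.1f}")
```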

A Broader Shift in Quant Mindset

This research signals a philosophical shift: LLMs aren’t just for sentiment tagging—they’re economic sensors capable of encoding soft information about firms, sectors, and regimes. The challenge is not to let them drown out the hard signals but to teach models to arbitrate between the two.

In other words, the future of quant investing isn’t about choosing between fundamentals and language—it’s about context-aware synthesis. When numbers and narratives talk to each other, markets become a little less opaque—and portfolios a little more prescient.


Cognaptus: Automate the Present, Incubate the Future.