The idea of merging language models with financial algorithms isn't new, but HARLF takes it a step further by embedding them in a hierarchical reinforcement learning (HRL) framework that actually delivers. With a reported 26% annualized ROI and a Sharpe ratio of 1.2, this isn't just another LLM-meets-finance paper. It's a blueprint for how sentiment and market structure can be harnessed together.

From FinBERT to Fortune: Integrating Text with Tickers

Most financial LLM pipelines stop at score generation: classify sentiment and call it a signal. HARLF instead builds a full sentiment pipeline with FinBERT, generating monthly sentiment scores from scraped Google News articles for each of 14 assets. These scores aren't standalone inputs; for the sentiment-driven agents they feed an observation vector that includes:

  • Volatility (std. dev of returns)
  • FinBERT-derived sentiment scores
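
To make the scoring step concrete, here is a minimal sketch assuming headlines have already been scraped. The ProsusAI/finbert checkpoint and the signed-mean monthly aggregation are our assumptions, not details confirmed by the paper.

```python
# Minimal sketch: monthly FinBERT sentiment per asset (aggregation scheme assumed).
from transformers import pipeline

finbert = pipeline("text-classification", model="ProsusAI/finbert")

SIGN = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}

def monthly_sentiment(headlines_by_month):
    """Map 'YYYY-MM' -> mean signed FinBERT score over that month's headlines."""
    scores = {}
    for month, headlines in headlines_by_month.items():
        results = finbert(headlines, truncation=True)
        signed = [SIGN[r["label"]] * r["score"] for r in results]
        scores[month] = sum(signed) / len(signed) if signed else 0.0
    return scores

# Example: one asset's scraped Google News headlines, bucketed by month.
spy_sentiment = monthly_sentiment({
    "2018-01": ["Stocks rally as earnings beat estimates",
                "Fed signals gradual rate hikes ahead"],
})
```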

This NLP signal is not bolted on; it’s fused at the root with market data like Sharpe, Sortino, Calmar ratios, drawdowns, and correlation matrices to create hybrid state vectors.
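
As a sketch of that fusion, assuming standard textbook definitions of the risk metrics (the paper's exact normalization and the correlation-matrix features are omitted here):

```python
# Sketch: one hybrid state vector per asset, fusing standard risk metrics with
# the FinBERT sentiment score. Correlation features are left out for brevity.
import numpy as np

def max_drawdown(returns):
    wealth = np.cumprod(1.0 + returns)
    peaks = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / peaks))

def hybrid_state(returns, sentiment, rf=0.0):
    """returns: 1-D array of monthly returns; sentiment: monthly FinBERT score."""
    excess = returns - rf
    vol = float(returns.std())
    downside = returns[returns < 0]
    downside_vol = float(downside.std()) if downside.size else 1e-8
    mdd = max_drawdown(returns)
    sharpe = float(excess.mean()) / (vol + 1e-8)
    sortino = float(excess.mean()) / (downside_vol + 1e-8)
    calmar = float(returns.mean() * 12) / (mdd + 1e-8)   # annualized mean / MDD
    return np.array([vol, sharpe, sortino, calmar, mdd, sentiment])
```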

Why Reinforcement Learning (RL) — and Why Hierarchy?

Reinforcement Learning thrives in environments with feedback and time dynamics — ideal for trading. But HARLF elevates RL with a three-tier architecture:

Tier  Agent Type      Role
1     Base RL Agents  Trained separately on either market data or sentiment data
2     Meta-Agents     Learn to combine the outputs of the base agents
3     Super-Agent     Learns the final portfolio decision from both meta-agents' outputs

This hierarchy mirrors how real investment firms operate:

  • Analysts (base agents) specialize in data domains.
  • Portfolio managers (meta-agents) synthesize analysts’ inputs.
  • CIO (super-agent) makes the final capital allocation.

This structure promotes modular specialization and interpretability, while reducing the risk of unstable overfitting in single-shot agents.
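
A toy composition of the three tiers might look like the following. In HARLF each tier is a trained RL policy; the fixed blending rules here are placeholders standing in for what the system actually learns.

```python
# Toy sketch of the three-tier composition at inference time. Fixed rules
# stand in for HARLF's trained RL policies.
import numpy as np

N_ASSETS = 14

def normalize(w):
    w = np.clip(w, 0.0, None)                 # long-only
    s = w.sum()
    return w / s if s > 0 else np.full_like(w, 1.0 / len(w))

# Tier 1: base agents, each specialized in one data domain.
def base_agent(state):                        # placeholder for a trained policy
    return normalize(np.random.rand(N_ASSETS))

# Tier 2: meta-agents blend their base agents' proposed allocations.
def meta_agent(proposals):                    # placeholder for a trained policy
    return normalize(np.mean(proposals, axis=0))

# Tier 3: the super-agent sees both meta-agents' outputs and allocates capital.
def super_agent(meta_metrics, meta_nlp):      # placeholder for a trained policy
    return normalize(0.5 * meta_metrics + 0.5 * meta_nlp)

metrics_alloc = meta_agent([base_agent(s) for s in ["sharpe", "vol"]])
nlp_alloc = meta_agent([base_agent(s) for s in ["finbert"]])
final_weights = super_agent(metrics_alloc, nlp_alloc)
```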

Quantified Results: Beating Benchmarks with Brains

All models were trained on 2003–2017 and backtested on 2018–2024. Here’s how they stacked up:

Performance Overview (2018–2024)

Strategy               ROI (%, annualized)  Sharpe  Volatility (%)
Equal-Weight Baseline                  7.5    0.57            13.3
S&P 500                               13.2    0.63            19.7
Meta-Agent (Metrics)                  14.7    0.8             16.0
Meta-Agent (NLP)                      20.5    1.2             16.0
Super-Agent                           26.0    1.2             20.0

The standout here is not just the super-agent, but how NLP-powered meta-agents outperform purely market-driven ones, confirming that textual sentiment adds real alpha when integrated properly.

Engineering Highlights

  • Stable-Baselines3 provides the RL implementations (PPO, TD3, SAC, DDPG).
  • A custom PyTorch environment aggregates lower-tier agents' actions into the observation vectors of the tiers above.
  • Reward is ROI minus penalties for maximum drawdown (MDD) and volatility (σ): Reward = a1*ROI - a2*MDD - a3*σ
  • Fully reproducible via three Colab notebooks: sentiment extraction, simulation, and the full pipeline.
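
A minimal sketch of that setup, assuming Gymnasium-style APIs: the coefficients a1–a3, the drawdown bookkeeping, and the episode layout are illustrative choices, not the paper's exact configuration.

```python
# Minimal Gymnasium env with HARLF's reward shape, trained via Stable-Baselines3
# PPO. Coefficients and episode layout are illustrative assumptions.
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO

class PortfolioEnv(gym.Env):
    def __init__(self, returns, a1=1.0, a2=0.5, a3=0.5):
        super().__init__()
        self.returns = returns                          # (n_months, n_assets)
        self.a1, self.a2, self.a3 = a1, a2, a3
        n = returns.shape[1]
        self.action_space = gym.spaces.Box(0.0, 1.0, shape=(n,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(n,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.wealth, self.rois = 0, [1.0], []
        return self.returns[0].astype(np.float32), {}

    def step(self, action):
        w = action / (action.sum() + 1e-8)              # long-only, fully invested
        roi = float(self.returns[self.t + 1] @ w)       # next month's portfolio return
        self.rois.append(roi)
        self.wealth.append(self.wealth[-1] * (1.0 + roi))
        wealth = np.array(self.wealth)
        mdd = float(np.max(1.0 - wealth / np.maximum.accumulate(wealth)))
        sigma = float(np.std(self.rois))
        reward = self.a1 * roi - self.a2 * mdd - self.a3 * sigma
        self.t += 1
        terminated = self.t >= len(self.returns) - 1
        return self.returns[self.t].astype(np.float32), reward, terminated, False, {}

# Smoke test on synthetic monthly returns for 14 assets.
env = PortfolioEnv(np.random.default_rng(0).normal(0.01, 0.05, (180, 14)))
model = PPO("MlpPolicy", env, verbose=0).learn(total_timesteps=5_000)
```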

Constraints Keep It Real

HARLF makes realistic assumptions for its portfolios:

  • Long-only: no short positions.
  • No leverage: only the available capital is invested.
  • Monthly rebalancing: a pragmatic trading cadence.
  • Equal initial weights: a clean starting point for learning.
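
A compact backtest loop enforcing all four constraints might look like this sketch, where `policy` is a hypothetical stand-in for any trained HARLF agent:

```python
# Sketch: monthly-rebalanced, long-only, unlevered backtest with equal initial
# weights. `policy` is a stand-in for any trained agent.
import numpy as np

def backtest(monthly_returns, policy):
    """monthly_returns: (n_months, n_assets); policy(obs) -> raw weight proposal."""
    n_months, n_assets = monthly_returns.shape
    w = np.full(n_assets, 1.0 / n_assets)        # equal initial weights
    wealth = [1.0]
    for t in range(n_months):
        wealth.append(wealth[-1] * (1.0 + monthly_returns[t] @ w))
        raw = np.clip(policy(monthly_returns[t]), 0.0, None)          # long-only
        s = raw.sum()
        w = raw / s if s > 0 else np.full(n_assets, 1.0 / n_assets)   # no leverage
    return np.array(wealth)                      # rebalanced once per month
```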

This isn’t theoretical finance in a vacuum — it’s simulation grounded in reality.

Why This Matters for Financial AI

HARLF is compelling not just because it outperforms benchmarks, but because it does so with:

  • Transparent architecture (hierarchical decisions)
  • Open-source tooling (Colab ready)
  • Low-latency NLP (FinBERT vs GPT-scale models)

The paper reinforces a critical takeaway for FinLLM development: you don’t need a 70B parameter model to outperform the market — you need a well-designed system that integrates the right data with the right structure.

Beyond the Paper: Where to Go From Here

To push HARLF’s architecture further, future iterations could explore:

  • Asynchronous data inputs (news often breaks after prices move)
  • Transaction cost modeling
  • More diverse NLP signals (social media, earnings calls, SEC filings)
  • Asset expansion to options/futures

But even now, HARLF sets a new bar for what FinLLM + RL integration should look like.


Cognaptus: Automate the Present, Incubate the Future