The idea of merging language models and financial algorithms isn't new, but HARLF takes it a step further by embedding them in a hierarchical reinforcement learning (HRL) framework that delivers measurable results. With a 26% annualized ROI and a Sharpe ratio of 1.2 in backtests, this isn't just another LLM-meets-finance paper: it's a blueprint for how sentiment and market structure can work together.
From FinBERT to Fortune: Integrating Text with Tickers
Most financial LLM pipelines stop at score generation: classify sentiment and call it a signal. HARLF instead builds a full sentiment pipeline around FinBERT, generating monthly sentiment scores from scraped Google News articles for each of 14 assets. These scores aren't standalone inputs; they sit inside an observation vector that includes:
- Volatility (std. dev of returns)
- FinBERT-derived sentiment scores
The NLP signal is not bolted on; it is fused at the root with market metrics such as Sharpe, Sortino, and Calmar ratios, drawdowns, and correlation matrices to create hybrid state vectors.
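To make the fusion concrete, here is a minimal sketch of the sentiment leg, assuming the publicly available `ProsusAI/finbert` checkpoint on Hugging Face; the news scraping, the monthly aggregation, and the exact feature layout are simplified stand-ins rather than the paper's actual pipeline.

```python
# Minimal sketch: monthly FinBERT sentiment fused with basic market features.
# Assumes the public ProsusAI/finbert checkpoint; aggregation is simplified.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")

def monthly_sentiment(headlines: list[str]) -> float:
    """Average (positive - negative) probability over one month of headlines."""
    inputs = tokenizer(headlines, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)  # columns: positive, negative, neutral
    return float((probs[:, 0] - probs[:, 1]).mean())

def hybrid_state(monthly_returns: np.ndarray, sentiment: float) -> np.ndarray:
    """Fuse market features with the sentiment score into one observation vector."""
    volatility = monthly_returns.std()
    mean_return = monthly_returns.mean()
    return np.array([volatility, mean_return, sentiment], dtype=np.float32)
```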
Why Reinforcement Learning (RL) — and Why Hierarchy?
Reinforcement learning thrives in environments with feedback and time dynamics, which makes it a natural fit for trading. HARLF goes further by organizing RL into a three-tier architecture:
| Tier | Agent Type | Role |
|---|---|---|
| 1 | Base RL Agents | Trained separately on either market data or sentiment data |
| 2 | Meta-Agents | Learn to combine the outputs of the base agents |
| 3 | Super-Agent | Learns the final portfolio decision from both meta-agents' outputs |
This hierarchy mirrors how real investment firms operate:
- Analysts (base agents) specialize in data domains.
- Portfolio managers (meta-agents) synthesize analysts’ inputs.
- CIO (super-agent) makes the final capital allocation.
This structure promotes modular specialization and interpretability, while reducing the risk of the instability and overfitting that single, monolithic agents are prone to.
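As a rough illustration of the wiring (not the paper's exact interfaces), each tier can treat the allocations proposed by the tier below as its observation. The agent objects here are hypothetical and assumed to expose a Stable Baselines 3-style `predict(obs)` method.

```python
# Hypothetical three-tier flow: base agents -> meta-agents -> super-agent.
import numpy as np

def normalize(weights: np.ndarray) -> np.ndarray:
    """Project raw actions onto long-only, fully invested portfolio weights."""
    weights = np.clip(weights, 0.0, None)
    total = weights.sum()
    return weights / total if total > 0 else np.full_like(weights, 1.0 / len(weights))

def allocate(market_obs, sentiment_obs, base_market, base_nlp, meta_market, meta_nlp, super_agent):
    # Tier 1: base agents act only on their own data domain.
    w_market = [normalize(agent.predict(market_obs)[0]) for agent in base_market]
    w_nlp = [normalize(agent.predict(sentiment_obs)[0]) for agent in base_nlp]
    # Tier 2: each meta-agent observes the stacked allocations of its base agents.
    w_meta_market = normalize(meta_market.predict(np.concatenate(w_market))[0])
    w_meta_nlp = normalize(meta_nlp.predict(np.concatenate(w_nlp))[0])
    # Tier 3: the super-agent sees both meta allocations and emits the final weights.
    return normalize(super_agent.predict(np.concatenate([w_meta_market, w_meta_nlp]))[0])
```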
Quantified Results: Beating Benchmarks with Brains
All models were trained on 2003–2017 and backtested on 2018–2024. Here’s how they stacked up:
Performance Overview (2018–2024)
| Strategy | Annualized ROI (%) | Sharpe Ratio | Volatility (%) |
|---|---|---|---|
| Equal-Weight Baseline | 7.5 | 0.57 | 13.3 |
| S&P 500 | 13.2 | 0.63 | 19.7 |
| Meta-Agent (Metrics) | 14.7 | 0.80 | 16.0 |
| Meta-Agent (NLP) | 20.5 | 1.20 | 16.0 |
| Super-Agent | 26.0 | 1.20 | 20.0 |
The standout here is not just the super-agent, but how NLP-powered meta-agents outperform purely market-driven ones, confirming that textual sentiment adds real alpha when integrated properly.
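If you want to sanity-check the headline numbers against your own backtests, one common way to annualize ROI, Sharpe, and volatility from monthly returns looks like this; the paper's exact conventions (risk-free rate, compounding) may differ.

```python
import numpy as np

def annualized_metrics(monthly_returns: np.ndarray, risk_free: float = 0.0):
    """Annualized ROI, Sharpe ratio, and volatility from a monthly return series."""
    roi = (1.0 + monthly_returns).prod() ** (12 / len(monthly_returns)) - 1.0
    vol = monthly_returns.std(ddof=1) * np.sqrt(12)
    sharpe = (monthly_returns.mean() * 12 - risk_free) / vol
    return roi, sharpe, vol

# Example: annualized_metrics(np.array([0.02, -0.01, 0.03, 0.015]))
```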
Engineering Highlights
- Stable Baselines 3 was used for RL implementations (PPO, TD3, SAC, DDPG).
- A custom PyTorch-based environment aggregates lower-tier agents' actions into the observation vectors consumed by the higher tiers.
- Rewards are ROI minus penalties for drawdown and volatility: `Reward = a1*ROI - a2*MDD - a3*σ` (a runnable environment sketch follows this list).
- Fully reproducible via three Colab notebooks: sentiment extraction, simulation, and full pipeline.
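Putting the reward and the environment together, the sketch below shows one way such a setup could look with Gymnasium and Stable Baselines 3. The coefficients `a1`–`a3`, the observation layout, and the random dummy returns are placeholders, not the paper's actual configuration.

```python
# Hedged sketch: a long-only portfolio environment with reward = a1*ROI - a2*MDD - a3*sigma.
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO

class PortfolioEnv(gym.Env):
    def __init__(self, monthly_returns: np.ndarray, a1=1.0, a2=0.5, a3=0.5):
        super().__init__()
        self.returns = monthly_returns              # shape: (months, n_assets)
        self.n_assets = monthly_returns.shape[1]
        self.a1, self.a2, self.a3 = a1, a2, a3
        self.action_space = spaces.Box(0.0, 1.0, shape=(self.n_assets,), dtype=np.float32)
        # Observation here is just the latest month's asset returns; HARLF's state vectors are richer.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(self.n_assets,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.equity = [1.0]
        return self.returns[self.t].astype(np.float32), {}

    def step(self, action):
        weights = action / (action.sum() + 1e-8)          # long-only, fully invested
        roi = float(weights @ self.returns[self.t])        # portfolio return this month
        self.equity.append(self.equity[-1] * (1.0 + roi))
        curve = np.array(self.equity)
        mdd = float(np.max(1.0 - curve / np.maximum.accumulate(curve)))   # max drawdown so far
        sigma = float(np.std(np.diff(curve) / curve[:-1]))                 # realized volatility so far
        reward = self.a1 * roi - self.a2 * mdd - self.a3 * sigma
        self.t += 1
        done = self.t >= len(self.returns) - 1
        return self.returns[self.t].astype(np.float32), reward, done, False, {}

# Train on random dummy data just to show the plumbing; real runs use the 2003–2017 window.
dummy = np.random.default_rng(0).normal(0.01, 0.05, size=(120, 14))
model = PPO("MlpPolicy", PortfolioEnv(dummy), verbose=0)
model.learn(total_timesteps=10_000)
```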
Constraints Keep It Real
HARLF makes realistic assumptions for its portfolios:
- Long-only: no short positions.
- No leverage: only available capital is invested.
- Monthly rebalancing: Pragmatic trading cadence.
- Equal initial weights: Clean starting point for learning.
This is not theoretical finance in a vacuum; it is a simulation grounded in realistic trading assumptions.
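A toy backtest loop under these assumptions might look like the following; `policy` is a hypothetical callable standing in for any of the trained agents.

```python
# Toy backtest: equal initial weights, long-only, no leverage, monthly rebalancing.
import numpy as np

def backtest(monthly_returns: np.ndarray, policy) -> float:
    """Return final equity for a policy(obs) -> raw_weights callable."""
    n_assets = monthly_returns.shape[1]
    weights = np.full(n_assets, 1.0 / n_assets)          # equal initial weights
    equity = 1.0
    for month in monthly_returns:
        equity *= 1.0 + float(weights @ month)            # capital only, no leverage
        raw = np.clip(policy(month), 0.0, None)           # long-only: negative weights dropped
        weights = raw / raw.sum() if raw.sum() > 0 else np.full(n_assets, 1.0 / n_assets)
    return equity
```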
Why This Matters for Financial AI
HARLF is compelling not just because it outperforms benchmarks, but because it does so with:
- Transparent architecture (hierarchical decisions)
- Open-source tooling (Colab ready)
- Low-latency NLP (a compact FinBERT model rather than GPT-scale models)
The paper reinforces a critical takeaway for FinLLM development: you don't need a 70B-parameter model to outperform the market; you need a well-designed system that integrates the right data with the right structure.
Beyond the Paper: Where to Go From Here
To push HARLF’s architecture further, future iterations could explore:
- Asynchronous data inputs (news often breaks after prices move)
- Transaction cost modeling
- More diverse NLP signals (social media, earnings calls, SEC filings)
- Asset expansion to options/futures
But even now, HARLF sets a new bar for what FinLLM + RL integration should look like.
Cognaptus: Automate the Present, Incubate the Future