The idea of merging language models and financial algorithms isn't new, but HARLF takes it a step further by embedding them in a hierarchical reinforcement learning (HRL) framework that delivers measurable results. With a 26% annualized ROI and a Sharpe ratio of 1.2 in backtests, this isn't just another LLM-meets-finance paper: it's a blueprint for how sentiment and market structure can work together.
From FinBERT to Fortune: Integrating Text with Tickers
Most financial LLM pipelines stop at score generation: classify sentiment and call it a signal. HARLF instead builds a full sentiment pipeline around FinBERT, generating monthly sentiment scores from scraped Google News articles for each of 14 assets. These scores aren't standalone inputs; they sit inside an observation vector that includes:
- Volatility (std. dev of returns)
- FinBERT-derived sentiment scores
The NLP signal is not bolted on; it is fused at the root with market metrics such as Sharpe, Sortino, and Calmar ratios, drawdowns, and correlation matrices to create hybrid state vectors.
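To make the fusion concrete, here is a minimal sketch of the sentiment leg, assuming the publicly available `ProsusAI/finbert` checkpoint on Hugging Face; the news scraping, the monthly aggregation, and the exact feature layout are simplified stand-ins rather than the paper's actual pipeline.

```python
# Minimal sketch: monthly FinBERT sentiment fused with basic market features.
# Assumes the public ProsusAI/finbert checkpoint; aggregation is simplified.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")

def monthly_sentiment(headlines: list[str]) -> float:
    """Average (positive - negative) probability over one month of headlines."""
    inputs = tokenizer(headlines, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)  # columns: positive, negative, neutral
    return float((probs[:, 0] - probs[:, 1]).mean())

def hybrid_state(monthly_returns: np.ndarray, sentiment: float) -> np.ndarray:
    """Fuse market features with the sentiment score into one observation vector."""
    volatility = monthly_returns.std()
    mean_return = monthly_returns.mean()
    return np.array([volatility, mean_return, sentiment], dtype=np.float32)
```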
Why Reinforcement Learning (RL) — and Why Hierarchy?
Reinforcement learning thrives in environments with feedback and time dynamics, which makes it a natural fit for trading. HARLF goes further by organizing RL into a three-tier architecture:
| Tier | Agent Type | Role |
|---|---|---|
| 1 | Base RL Agents | Trained separately on either market data or sentiment data |
| 2 | Meta-Agents | Learn to combine the outputs of the base agents |
| 3 | Super-Agent | Learns the final portfolio decision from both meta-agents' outputs |
This hierarchy mirrors how real investment firms operate:
- Analysts (base agents) specialize in data domains.
- Portfolio managers (meta-agents) synthesize analysts’ inputs.
- CIO (super-agent) makes the final capital allocation.
This structure promotes modular specialization and interpretability, while reducing the risk of the instability and overfitting that single, monolithic agents are prone to.
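As a rough illustration of the wiring (not the paper's exact interfaces), each tier can treat the allocations proposed by the tier below as its observation. The agent objects here are hypothetical and assumed to expose a Stable Baselines 3-style `predict(obs)` method.

```python
# Hypothetical three-tier flow: base agents -> meta-agents -> super-agent.
import numpy as np

def normalize(weights: np.ndarray) -> np.ndarray:
    """Project raw actions onto long-only, fully invested portfolio weights."""
    weights = np.clip(weights, 0.0, None)
    total = weights.sum()
    return weights / total if total > 0 else np.full_like(weights, 1.0 / len(weights))

def allocate(market_obs, sentiment_obs, base_market, base_nlp, meta_market, meta_nlp, super_agent):
    # Tier 1: base agents act only on their own data domain.
    w_market = [normalize(agent.predict(market_obs)[0]) for agent in base_market]
    w_nlp = [normalize(agent.predict(sentiment_obs)[0]) for agent in base_nlp]
    # Tier 2: each meta-agent observes the stacked allocations of its base agents.
    w_meta_market = normalize(meta_market.predict(np.concatenate(w_market))[0])
    w_meta_nlp = normalize(meta_nlp.predict(np.concatenate(w_nlp))[0])
    # Tier 3: the super-agent sees both meta allocations and emits the final weights.
    return normalize(super_agent.predict(np.concatenate([w_meta_market, w_meta_nlp]))[0])
```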
Quantified Results: Beating Benchmarks with Brains
All models were trained on 2003–2017 and backtested on 2018–2024. Here’s how they stacked up:
Performance Overview (2018–2024)
| Strategy | Annualized ROI (%) | Sharpe Ratio | Volatility (%) |
|---|---|---|---|
| Equal-Weight Baseline | 7.5 | 0.57 | 13.3 |
| S&P 500 | 13.2 | 0.63 | 19.7 |
| Meta-Agent (Metrics) | 14.7 | 0.80 | 16.0 |
| Meta-Agent (NLP) | 20.5 | 1.20 | 16.0 |
| Super-Agent | 26.0 | 1.20 | 20.0 |
The standout here is not just the super-agent, but how NLP-powered meta-agents outperform purely market-driven ones, confirming that textual sentiment adds real alpha when integrated properly.
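If you want to sanity-check the headline numbers against your own backtests, one common way to annualize ROI, Sharpe, and volatility from monthly returns looks like this; the paper's exact conventions (risk-free rate, compounding) may differ.

```python
import numpy as np

def annualized_metrics(monthly_returns: np.ndarray, risk_free: float = 0.0):
    """Annualized ROI, Sharpe ratio, and volatility from a monthly return series."""
    roi = (1.0 + monthly_returns).prod() ** (12 / len(monthly_returns)) - 1.0
    vol = monthly_returns.std(ddof=1) * np.sqrt(12)
    sharpe = (monthly_returns.mean() * 12 - risk_free) / vol
    return roi, sharpe, vol

# Example: annualized_metrics(np.array([0.02, -0.01, 0.03, 0.015]))
```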
Engineering Highlights
- Stable Baselines 3 was used for RL implementations (PPO, TD3, SAC, DDPG).
- A custom PyTorch-based environment aggregates lower-tier agents' actions into the observation vectors consumed by the higher tiers.
- Rewards are ROI minus penalties for drawdown and volatility: `Reward = a1*ROI - a2*MDD - a3*σ` (a runnable environment sketch follows this list).
- Fully reproducible via three Colab notebooks: sentiment extraction, simulation, and full pipeline.
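Putting the reward and the environment together, the sketch below shows one way such a setup could look with Gymnasium and Stable Baselines 3. The coefficients `a1`–`a3`, the observation layout, and the random dummy returns are placeholders, not the paper's actual configuration.

```python
# Hedged sketch: a long-only portfolio environment with reward = a1*ROI - a2*MDD - a3*sigma.
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO

class PortfolioEnv(gym.Env):
    def __init__(self, monthly_returns: np.ndarray, a1=1.0, a2=0.5, a3=0.5):
        super().__init__()
        self.returns = monthly_returns              # shape: (months, n_assets)
        self.n_assets = monthly_returns.shape[1]
        self.a1, self.a2, self.a3 = a1, a2, a3
        self.action_space = spaces.Box(0.0, 1.0, shape=(self.n_assets,), dtype=np.float32)
        # Observation here is just the latest month's asset returns; HARLF's state vectors are richer.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(self.n_assets,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.equity = [1.0]
        return self.returns[self.t].astype(np.float32), {}

    def step(self, action):
        weights = action / (action.sum() + 1e-8)          # long-only, fully invested
        roi = float(weights @ self.returns[self.t])        # portfolio return this month
        self.equity.append(self.equity[-1] * (1.0 + roi))
        curve = np.array(self.equity)
        mdd = float(np.max(1.0 - curve / np.maximum.accumulate(curve)))   # max drawdown so far
        sigma = float(np.std(np.diff(curve) / curve[:-1]))                 # realized volatility so far
        reward = self.a1 * roi - self.a2 * mdd - self.a3 * sigma
        self.t += 1
        done = self.t >= len(self.returns) - 1
        return self.returns[self.t].astype(np.float32), reward, done, False, {}

# Train on random dummy data just to show the plumbing; real runs use the 2003–2017 window.
dummy = np.random.default_rng(0).normal(0.01, 0.05, size=(120, 14))
model = PPO("MlpPolicy", PortfolioEnv(dummy), verbose=0)
model.learn(total_timesteps=10_000)
```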
Constraints Keep It Real
HARLF makes realistic assumptions for its portfolios:
- Long-only: no short positions.
- No leverage: only available capital is invested.
- Monthly rebalancing: Pragmatic trading cadence.
- Equal initial weights: Clean starting point for learning.
This is not theoretical finance in a vacuum; it is a simulation grounded in realistic trading assumptions.
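A toy backtest loop under these assumptions might look like the following; `policy` is a hypothetical callable standing in for any of the trained agents.

```python
# Toy backtest: equal initial weights, long-only, no leverage, monthly rebalancing.
import numpy as np

def backtest(monthly_returns: np.ndarray, policy) -> float:
    """Return final equity for a policy(obs) -> raw_weights callable."""
    n_assets = monthly_returns.shape[1]
    weights = np.full(n_assets, 1.0 / n_assets)          # equal initial weights
    equity = 1.0
    for month in monthly_returns:
        equity *= 1.0 + float(weights @ month)            # capital only, no leverage
        raw = np.clip(policy(month), 0.0, None)           # long-only: negative weights dropped
        weights = raw / raw.sum() if raw.sum() > 0 else np.full(n_assets, 1.0 / n_assets)
    return equity
```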
Why This Matters for Financial AI
HARLF is compelling not just because it outperforms benchmarks, but because it does so with:
- Transparent architecture (hierarchical decisions)
- Open-source tooling (Colab ready)
- Low-latency NLP (a compact FinBERT model rather than GPT-scale models)
The paper reinforces a critical takeaway for FinLLM development: you don't need a 70B-parameter model to outperform the market; you need a well-designed system that integrates the right data with the right structure.
Beyond the Paper: Where to Go From Here
To push HARLF’s architecture further, future iterations could explore:
- Asynchronous data inputs (news often breaks after prices move)
- Transaction cost modeling
- More diverse NLP signals (social media, earnings calls, SEC filings)
- Asset expansion to options/futures
But even now, HARLF sets a new bar for what FinLLM + RL integration should look like.
Cognaptus: Automate the Present, Incubate the Future