In quantitative finance, sparse portfolio optimization is a famously unforgiving problem. Selecting the top m assets from a universe of n under budget and risk constraints is NP-hard, highly sensitive to hyperparameters, and often brittle in volatile markets. Traditional solutions—from greedy algorithms to convex relaxations—either crumble under market shifts or produce opaque, overfitted outputs.
But what if we reframed the problem entirely?
Enter EFS (Evolutionary Factor Search), a radical new framework that turns sparse portfolio construction into an LLM-guided ranking game. Instead of laboriously tuning machine learning models or relying on rigid heuristics, EFS lets large language models generate, evolve, and select alpha factors—and it does so in a way that is not just automated, but interpretable, adaptive, and surprisingly effective.
From Predictive Models to Ranking Machines
EFS redefines sparse optimization as a top-m ranking task, guided by a dynamic pool of LLM-generated factors. These alpha factors—mathematical functions mapping historical prices and returns to desirability scores—are evaluated not just for their predictive accuracy but for their ability to discriminate among the best-performing assets.
The process starts with a small library of interpretable, atomic factors:
| Factor Name | Description |
|---|---|
| momentum_7 | 7-day price momentum |
| volatility_14 | 14-day log return volatility |
| rsi_14 | Relative Strength Index over 14 days |
| bb_width_21 | Bollinger Band width normalized by MA |
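As a rough illustration, these atomic factors might be implemented as follows. This is a minimal sketch assuming each function receives a pandas Series of daily closing prices; the exact signatures and library EFS uses are not reproduced here.

```python
import numpy as np
import pandas as pd

def momentum_7(close: pd.Series) -> pd.Series:
    """7-day price momentum: relative price change over the last 7 days."""
    return close.pct_change(7)

def volatility_14(close: pd.Series) -> pd.Series:
    """14-day rolling standard deviation of log returns."""
    log_ret = np.log(close).diff()
    return log_ret.rolling(14).std()

def rsi_14(close: pd.Series) -> pd.Series:
    """Relative Strength Index over a 14-day window (simple-average variant)."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

def bb_width_21(close: pd.Series) -> pd.Series:
    """Bollinger Band width (2 std bands) normalized by the 21-day moving average."""
    ma = close.rolling(21).mean()
    std = close.rolling(21).std()
    return (4 * std) / ma  # upper band minus lower band equals 4 * std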
Then comes the magic: prompts are crafted with performance summaries and constraints, and LLMs generate new Python functions using structured mutations (e.g., tweak parameters) or crossovers (e.g., fuse two factors). These new factors are then evaluated, ranked, and incorporated into the evolving pool.
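The paper's exact prompt templates are not reproduced here, but a mutation prompt could plausibly be assembled along these lines. The function name `build_mutation_prompt` and all of the wording below are illustrative assumptions.

```python
def build_mutation_prompt(factor_code: str, rank_ic: float, constraints: str) -> str:
    """Assemble an LLM prompt asking for a structured mutation of an existing factor.

    `factor_code` is the Python source of a current top performer, `rank_ic` its
    recent evaluation score, and `constraints` the evolution rules (e.g., allowed
    inputs, maximum lookback). The template text is illustrative, not the paper's.
    """
    return (
        "You are designing alpha factors for a top-m stock ranking task.\n"
        f"Here is a current top performer (recent RankIC = {rank_ic:.3f}):\n"
        f"{factor_code}\n"
        f"Constraints: {constraints}\n"
        "Propose ONE mutated variant (e.g., tweak the lookback window or fuse it "
        "with a volatility term) as a complete, self-contained Python function. "
        "Return only the code."
    )
```

A crossover prompt would follow the same pattern, supplying two parent factors and asking for a single fused function.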
A Closed-Loop Evolutionary Engine
At the heart of EFS is a feedback-driven loop, sketched in code after this list:
- Evaluate: Rank factors by performance metrics like RankIC (Spearman correlation between score and future return).
- Generate: Prompt the LLM with top performers and evolution rules.
- Filter: Validate new factors, discard underperformers.
- Construct Portfolio: Aggregate factor scores to rank assets and select the top m.
- Repeat: Continue this cycle every few steps, ensuring the factor pool adapts over time.
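A minimal sketch of one iteration of this loop, assuming factors are functions of a price DataFrame that return per-asset scores, RankIC is computed as a Spearman correlation, and assets are ranked by an equal-weight average of factor score ranks (the aggregation scheme and the names `factor_pool` and `llm_generate` are assumptions, not the paper's exact design):

```python
import pandas as pd

def rank_ic(scores: pd.Series, future_returns: pd.Series) -> float:
    """RankIC: Spearman correlation between factor scores and next-period returns."""
    return scores.corr(future_returns, method="spearman")

def evolution_step(factor_pool, prices, future_returns, llm_generate, m=10, keep=20):
    """One illustrative EFS iteration: evaluate -> generate -> filter -> select top m."""
    # 1. Evaluate: rank existing factors by RankIC on the latest window.
    ics = {name: rank_ic(f(prices), future_returns) for name, f in factor_pool.items()}
    ranked = sorted(ics, key=ics.get, reverse=True)

    # 2. Generate: prompt the LLM with top performers. 3. Filter: keep only valid,
    # non-underperforming candidates (acceptance rule here is a placeholder).
    candidates = llm_generate(top_factors={n: factor_pool[n] for n in ranked[:5]})
    for name, f in candidates.items():
        try:
            if rank_ic(f(prices), future_returns) > 0:
                factor_pool[name] = f
        except Exception:
            pass  # discard candidates whose code fails to run

    # Prune the pool back to the best `keep` factors.
    ics = {name: rank_ic(f(prices), future_returns) for name, f in factor_pool.items()}
    factor_pool = {n: factor_pool[n] for n in sorted(ics, key=ics.get, reverse=True)[:keep]}

    # 4. Construct portfolio: average per-factor score ranks and pick the top m assets.
    score_ranks = pd.DataFrame({n: f(prices).rank() for n, f in factor_pool.items()})
    top_m = score_ranks.mean(axis=1).nlargest(m).index
    return factor_pool, top_m
```

In a live system, `llm_generate` would wrap the prompt construction shown earlier, the LLM API call, and a sandboxed evaluation of the returned code.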
This process doesn’t just discover good factors—it evolves them, capturing regime shifts and style rotations. The system’s ability to shift from momentum-centric signals in bull markets to stability-focused ones in bear phases shows a surprising level of macro-awareness, all without feeding it economic indicators.
Outperforming Humans and Machines
EFS was tested across:
- Five Fama-French benchmarks (e.g., FF25, FF100MEOP)
- Three real-market datasets: US50, HSI45, CSI300
Highlights (US50, m=10):

| Metric | GPT-4.1 (EFS) | 1/N Baseline |
|---|---|---|
| Cumulative Wealth | 39.7x | 4.6x |
| Sharpe Ratio | 0.154 | 0.072 |
| Max Drawdown | 25% | 34% |
Notably, these gains persisted even under transaction costs and across three LLM runs, underscoring robustness. The system’s adaptability was evident as it consistently rotated into top-performing sectors (e.g., semiconductors in 2023) without manual sector encoding.
Why This Matters
Three big takeaways for finance professionals:
- LLMs as strategy designers: Instead of fitting models to data, LLMs can now invent investment logic directly, with code.
- Ranking over regression: For sparse selection tasks, ranking-based approaches often beat continuous prediction methods in both robustness and interpretability.
- Closed-loop is key: A one-shot factor generation scheme—even with LLMs—pales in comparison to EFS’s iterative refinement and backtest-driven feedback.
Looking Ahead
EFS isn’t perfect—it still struggles with portfolio turnover and may over-concentrate under aggressive weighting. But its architecture offers a glimpse into what language-native quantitative finance could look like: explainable, modular, and continuously learning.
Expect future work to push this further with multimodal inputs (e.g., news sentiment + price), turnover-aware prompts, and offline distillation for faster inference.
Cognaptus: Automate the Present, Incubate the Future.