What happens when your AI co-pilot thinks it’s the pilot?
In financial decision-making, autonomy isn’t always a virtue. A striking new study titled “Your AI, Not Your View” reveals that even the most advanced Large Language Models (LLMs) may quietly sabotage your investment strategy — not by hallucinating facts, but by overriding your intent with stubborn preferences baked into their training.
Hidden Hands Behind the Recommendations
The paper introduces a systematic framework to identify and measure confirmation bias in LLMs used for investment analysis. Instead of just summarizing news or spitting out buy/sell signals, the study asks: what if the model already has a favorite? More specifically:
- Does your model prefer large-cap over small-cap stocks, regardless of fundamentals?
- Will it consistently recommend contrarian strategies, even when momentum indicators dominate the evidence?
- And most crucially: if presented with contradictory information, will it change its mind — or dig in its heels?
Spoiler: many models dig in their heels.
The Setup: Exposing Latent Preferences
The authors ran over 42,000 tests on six leading models (GPT-4.1, Llama4-Scout, DeepSeek-V3, Qwen3-235B, Gemini-2.5, and Mistral-24B) using a controlled three-stage procedure:
| Stage | Goal | Method |
|---|---|---|
| 1. Evidence Generation | Create balanced buy/sell rationales | Neutral model (Gemini-2.5-Pro) generates arguments for 427 S&P 500 stocks |
| 2. Preference Elicitation | Force the model to choose under balanced evidence | Repeated prompts with symmetric inputs |
| 3. Bias Verification | Increase counter-evidence to test model flexibility | Vary volume and intensity to see if the model flips its decision |
The test design mimics real-world investing: decisions must be made under conflicting, ambiguous information. That’s where bias shows up.
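To make Stage 2 concrete, here is a minimal sketch of preference elicitation, assuming an OpenAI-compatible chat API. The prompt wording, function names, and trial counts are ours for illustration, not the paper's actual harness.

```python
# Minimal sketch of Stage 2 (preference elicitation).
# Assumes an OpenAI-compatible chat API; prompts and names are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(model: str, prompt: str) -> str:
    """One sampled BUY/SELL decision from the model under test."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # keep sampling noise: it is part of the measurement
    )
    return resp.choices[0].message.content.strip()

def elicit_preference(model: str, ticker: str, bull_case: str,
                      bear_case: str, n_trials: int = 20) -> float:
    """Pose symmetric evidence repeatedly; return the BUY rate.
    Under truly balanced evidence, an unbiased model should sit near 0.5."""
    prompt = (
        f"Stock: {ticker}\n"
        f"Bull case: {bull_case}\n"
        f"Bear case: {bear_case}\n"
        "Reply with exactly one word: BUY or SELL."
    )
    buys = sum(ask(model, prompt).upper().startswith("BUY")
               for _ in range(n_trials))
    return buys / n_trials
```

A buy rate far from 0.5 on evidence that was generated to be balanced is exactly the latent preference the study is hunting for.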
Sector, Size, and Style — AI’s Investment Taste
Results were both intuitive and alarming:
- Sector Bias: Preferences were model-specific, not market-driven. For example, Llama4-Scout favors Energy, DeepSeek-V3 leans into Tech, and GPT-4.1 shows no strong sector tilt.
- Size Bias: Most models favor large-cap stocks, likely due to richer training data. GPT-4.1 is again the outlier, showing no significant size preference.
- Style Bias: Across the board, models showed a contrarian tilt — favoring underperformers in anticipation of mean-reversion, rather than riding momentum.
This matters because even if your investment thesis favors, say, a small-cap momentum play in Consumer Discretionary, your AI might subtly twist the analysis to favor a large-cap contrarian Energy stock instead.
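One way to surface such tilts is to aggregate the Stage-2 choices by sector. The sketch below is schematic: the column names and the signed bias score are our construction, not the paper's exact metric.

```python
# Schematic aggregation of Stage-2 outputs into per-sector preference scores.
# Column names and the bias-score definition are ours, not the paper's.
import pandas as pd

# One row per trial: which model, which stock's sector, and the decision.
trials = pd.DataFrame([
    {"model": "Llama4-Scout", "sector": "Energy", "decision": "BUY"},
    {"model": "Llama4-Scout", "sector": "Tech",   "decision": "SELL"},
    # ... the actual study has on the order of 42,000 such rows
])

# Buy rate per (model, sector); under balanced evidence an unbiased
# model should hover near 0.5 everywhere.
buy_rate = (trials.assign(is_buy=trials["decision"].eq("BUY"))
                  .groupby(["model", "sector"])["is_buy"].mean())

# Signed score: positive = systematic tilt toward buying that sector.
bias = (buy_rate - 0.5).rename("bias_score")
print(bias.sort_values(ascending=False))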
When Bias Becomes Stubbornness
What elevates this paper beyond mere observation is its quantification of bias resilience:
- When presented only with counter-evidence, most models flipped their stance — showing they can be corrected.
- But with mixed evidence, flip rates plummeted. Even when counter-evidence outweighed supporting evidence, models often stuck with their initial bias.
- Even worse: boosting the intensity (e.g., higher projected returns) of counter-evidence had limited effect. Some models remained unmoved.
This is confirmation bias in its purest form — and it’s embedded in AI.
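A hedged sketch of how that resilience could be measured: hold the supporting arguments fixed, grow the counter-evidence one item at a time, and record how often the model abandons its Stage-2 stance. This reuses the `ask()` helper from the earlier sketch; the function name and prompt format are again ours.

```python
def flip_rate(model: str, ticker: str, initial_stance: str,
              support: list[str], counter: list[str],
              n_trials: int = 10) -> dict[int, float]:
    """Stage 3 under mixed evidence: returns {num_counter_items: flip rate}.
    A flip rate stuck near 0 even when counter-evidence outnumbers
    support is the stubbornness the paper describes."""
    rates = {}
    for k in range(1, len(counter) + 1):
        evidence = support + counter[:k]  # counter-evidence volume grows
        prompt = (
            f"Stock: {ticker}\nEvidence:\n- " + "\n- ".join(evidence) +
            "\nReply with exactly one word: BUY or SELL."
        )
        flips = sum(
            not ask(model, prompt).upper().startswith(initial_stance.upper())
            for _ in range(n_trials))
        rates[k] = flips / n_trials
    return rates
```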
A Final Twist: Confidence vs. Conflict
Using entropy analysis, the authors show that more biased models (like DeepSeek-V3) are confident in balanced cases but hesitant when challenged. Meanwhile, more neutral models (like GPT-4.1) appear uncertain at first — but become decisive when the evidence tilts clearly.
This suggests that the illusion of confidence in an LLM may be a sign of bias, not accuracy. In financial AI, that’s a dangerous mix.
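For the binary BUY/SELL case, the entropy diagnostic reduces to one formula. Here is a self-contained version (our formulation of the standard Shannon entropy, not the authors' exact code):

```python
# Entropy of repeated BUY/SELL decisions: low entropy on balanced evidence
# plus high entropy under challenge is the "confident when it shouldn't be"
# signature described above.
import math

def decision_entropy(buy_rate: float) -> float:
    """Shannon entropy (bits) of a binary BUY/SELL distribution.
    0.0 = fully decisive, 1.0 = maximally uncertain."""
    if buy_rate in (0.0, 1.0):
        return 0.0
    p = buy_rate
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Hypothetical numbers for a biased model:
print(decision_entropy(0.95))  # ~0.29 bits: decisive on balanced evidence
print(decision_entropy(0.60))  # ~0.97 bits: wavering once challenged
```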
Why This Matters for Financial AI Builders
If you’re deploying LLMs in investment services, here’s the uncomfortable truth:
You’re not just building a tool — you’re importing a worldview.
That worldview might include a deep preference for megacaps, a reflexive contrarianism, or even a silent aversion to certain sectors. And unless you audit and counter-train those biases, your AI may routinely diverge from your firm’s strategy — or your client’s intent.
In that sense, model selection becomes not just a performance decision, but a governance decision. You’re choosing whose financial instincts get embedded in your stack.
Toward Trustworthy Investment AI
The paper ends with a call to build transparent, evidence-aligned financial agents. One could imagine a future where:
- Model preferences are declared, not hidden.
- Bias audit scores become part of standard evaluation (a toy illustration follows this list).
- LLMs are trained to calibrate and defer when facing counter-evidence.
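What might such an audit score report? One hedged illustration, combining the metrics sketched earlier (entirely our construction, with hypothetical numbers):

```python
# A toy "bias audit" record combining the earlier metrics.
# Field definitions and numbers are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class BiasAudit:
    model: str
    sector_tilt: float   # max |buy_rate - 0.5| across sectors (Stage 2)
    size_tilt: float     # large-cap buy rate minus small-cap buy rate
    stubbornness: float  # 1 - flip rate under majority counter-evidence

    def summary(self) -> str:
        return (f"{self.model}: sector={self.sector_tilt:.2f}, "
                f"size={self.size_tilt:+.2f}, "
                f"stubbornness={self.stubbornness:.2f}")

print(BiasAudit("model-x", sector_tilt=0.31, size_tilt=+0.18,
                stubbornness=0.74).summary())
```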
Until then, financial professionals must stay vigilant. Because in AI-powered finance, the biggest risk isn’t just hallucination — it’s when your model disagrees with your thesis, and you don’t even know it.
Cognaptus: Automate the Present, Incubate the Future