
When Your AI Disagrees with Your Portfolio

What happens when your AI co-pilot thinks it’s the pilot? In financial decision-making, autonomy isn’t always a virtue. A striking new study titled “Your AI, Not Your View” reveals that even the most advanced Large Language Models (LLMs) may quietly sabotage your investment strategy — not by hallucinating facts, but by overriding your intent with stubborn preferences baked into their training.

Hidden Hands Behind the Recommendations

The paper introduces a systematic framework to identify and measure confirmation bias in LLMs used for investment analysis. Instead of just summarizing news or spitting out buy/sell signals, the study asks: what if the model already has a favorite? More specifically: ...

July 29, 2025 · 4 min · Zelina

Graft and Go: How Knowledge Grafting Shrinks AI Without Shrinking Its Brain

If you’ve ever tried to run a powerful AI model on a modest device—say, a drone, a farm robot, or even a Raspberry Pi—you’ve likely hit the wall of hardware limitations. Today’s most accurate models are big, bloated, and brittle when it comes to efficiency. Enter knowledge grafting, a refreshingly biological metaphor for a novel compression technique that doesn’t just trim the fat—it transfers the muscle.

Rethinking Compression: Not What to Cut, But What to Keep

Traditional model optimization methods—quantization, pruning, and distillation—all try to make the best of a difficult trade-off: shrinking the model while limiting the damage to performance. These methods often fall short, especially when you push compression past 5–6x. ...
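As a reference point for those baselines (and not the paper’s grafting method), magnitude pruning simply zeroes the smallest weights in a layer. A minimal numpy sketch, where the layer size and the 80% sparsity target are purely illustrative:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of weights to drop
    threshold = np.partition(flat, k)[k]   # k-th smallest magnitude
    return np.where(np.abs(weights) < threshold, 0.0, weights)

rng = np.random.default_rng(0)
layer = rng.normal(size=(256, 256))        # a toy dense layer
pruned = magnitude_prune(layer, sparsity=0.8)

kept = np.count_nonzero(pruned) / pruned.size
print(f"non-zero fraction after pruning: {kept:.2f}")  # ≈ 0.20
```

At 80% sparsity the layer keeps roughly one fifth of its parameters, i.e. about a 5x reduction; that is exactly the compression regime where, per the article, traditional methods start to degrade.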

July 28, 2025 · 3 min · Zelina

Mind the Earnings Gap: Why LLMs Still Flunk Financial Decision-Making

In the race to make language models financial analysts, a new benchmark is calling the hype’s bluff. FinanceBench, introduced by a team of researchers from Amazon and academia, aims to test LLMs not just on text summarization or sentiment analysis, but on their ability to think like Wall Street professionals. The results? Let’s just say GPT-4 may ace the chatroom, but it still struggles in the boardroom.

The Benchmark We Actually Needed

FinanceBench isn’t your typical leaderboard filler. Unlike prior datasets, which mostly rely on news headlines or synthetic financial prompts, this one uses real earnings call transcripts from over 130 public companies. It frames the task like a genuine investment analyst workflow: ...

July 28, 2025 · 3 min · Zelina

Rollout Renaissance: How Pareto-NRPA Revives Monte Carlo for Multi-Objective Optimization

Monte Carlo search algorithms rarely make the shortlist in multi-objective optimization (MOO). Traditionally, the field has belonged to evolutionary algorithms like NSGA-II and SMS-EMOA. But a paper from Paris Dauphine-PSL and Thales upends that hierarchy with an audacious twist: what if we generalized NRPA — a niche but powerful single-objective method — to handle multiple objectives, constraints, and diversity, all in one elegant framework? ...
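The NRPA generalization itself is in the paper; the primitive any multi-objective method must handle is Pareto dominance. A minimal sketch under a minimization convention, with toy objective vectors of my own (not the paper’s code):

```python
def dominates(a, b):
    """a Pareto-dominates b (minimizing): no worse on every objective, better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy bi-objective scores, e.g. (cost, makespan)
points = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 2)]
print(pareto_front(points))  # → [(1, 5), (4, 1), (2, 2)]
```

Methods like NSGA-II, SMS-EMOA, and Pareto-NRPA all sit on top of this filter; they differ mainly in how they rank, diversify, and explore beyond it.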

July 28, 2025 · 3 min · Zelina

The Sims Get Smart? Why LLM-Driven Social Simulations Need a Reality Check

Social simulations are entering their uncanny valley. Fueled by generative agents powered by Large Language Models (LLMs), recent frameworks like Smallville, AgentSociety, and SocioVerse simulate thousands of lifelike agents forming friendships, spreading rumors, and planning parties. But do these simulations reflect real social processes — or merely replay the statistical shadows of the internet?

When Simulacra Speak Fluently

LLMs have demonstrated striking abilities to mimic human behaviors. GPT-4 has passed Theory-of-Mind (ToM) tests at levels comparable to 6- to 7-year-olds. In narrative contexts, it can detect sarcasm, understand indirect requests, and generate empathetic replies. But all of this arises not from embodied cognition or real-world goals — it’s just next-token prediction trained on massive corpora. ...

July 28, 2025 · 4 min · Zelina

Tool Up or Tap Out: How Multi-TAG Elevates Math Reasoning with Smarter LLM Workflows

Most tool-augmented LLMs approach math reasoning like they’re wielding a hammer—good for hitting one nail at a time, but ill-equipped when the problem requires a wrench, a compass, and a soldering iron all at once. Enter Multi-TAG, a clever, finetuning-free framework that aggregates the strengths of multiple tools per reasoning step. Think of it as an LLM with a toolbox, not just a single tool. And it doesn’t just work—it wins, posting 6.0% to 7.5% accuracy gains across MATH500, AIME, AMC, and OlympiadBench against top baselines, using both open and closed LLMs. ...

July 28, 2025 · 4 min · Zelina

All Eggs, One Basket: When Diversification Backfires in Risk Modeling

“Don’t put all your eggs in one basket” has long been gospel in finance and risk management. But what if sometimes, the basket is the safer place? In a surprising twist on conventional wisdom, Léonard Vincent’s latest paper presents the one-basket theorem: a theoretical framework that proves diversification can increase risk under certain extreme but relevant conditions. Specifically, when dealing with heavy-tailed risks that have infinite mean — such as those found in insurance, operational risk, and even crypto markets — putting all your eggs in one basket may be the rational choice. ...
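The infinite-mean setting is easy to see numerically. Below is a toy simulation of my own (not the paper’s model): for Pareto losses with tail index below 1, the 99% VaR of a 50/50 pool of two independent risks exceeds the VaR of a single, undiversified risk.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n = 0.8, 1_000_000          # tail index < 1  =>  infinite mean

# Standard Pareto(alpha) losses via inverse transform: P(X > x) = x^(-alpha)
x1 = rng.uniform(size=n) ** (-1 / alpha)
x2 = rng.uniform(size=n) ** (-1 / alpha)

var_single = np.quantile(x1, 0.99)             # all eggs in one basket
var_pooled = np.quantile((x1 + x2) / 2, 0.99)  # diversified 50/50

print(var_pooled > var_single)  # True: diversification raises tail risk here
```

Asymptotically the ratio of the two VaRs tends to 2^(1/alpha − 1), about 1.19 for alpha = 0.8, so pooling makes the far tail roughly 19% worse rather than better.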

July 27, 2025 · 3 min · Zelina

Boxed In, Cashed Out: Deep Gradient Flows for Fast American Option Pricing

Pricing American options has long been the Achilles’ heel of quantitative finance, particularly in high dimensions. Unlike European options, American-style derivatives introduce a free-boundary problem due to their early exercise feature, making analytical solutions elusive and most numerical methods inefficient beyond two or three assets. But a recent paper by Jasper Rou introduces a promising technique — the Time Deep Gradient Flow (TDGF) — that sidesteps several of these barriers with a fresh take on deep learning design, optimization, and sampling. ...

July 27, 2025 · 4 min · Zelina

Divide, Route, and Conquer: DriftMoE's Smart Take on Concept Drift

Concept drift is the curse of the real world. Models trained on yesterday’s data go stale in hours, sometimes minutes. Traditional remedies like Adaptive Random Forests (ARF) respond reactively, detecting change and resetting trees. But what if the system could instead continuously learn where to look, dynamically routing each new sample to the right expert — no drift detector required? That’s exactly the ambition behind DriftMoE, a Mixture-of-Experts framework purpose-built for online learning in non-stationary environments. Co-developed by researchers at Ireland’s CeADAR, this architecture marries lightweight neural routing with classic Hoeffding trees, achieving expert specialization as a byproduct of learning — not as a bolted-on correction. ...

July 27, 2025 · 3 min · Zelina

Factor Factory: How LLMs Are Reinventing Sparse Portfolio Optimization

In quantitative finance, sparse portfolio optimization is a famously unforgiving problem. Selecting the top m assets from a universe of n under budget and risk constraints is NP-hard, highly sensitive to hyperparameters, and often brittle in volatile markets. Traditional solutions—from greedy algorithms to convex relaxations—either crumble under market shifts or produce opaque, overfitted outputs. But what if we reframed the problem entirely? Enter EFS (Evolutionary Factor Search), a radical new framework that turns sparse portfolio construction into an LLM-guided ranking game. Instead of laboriously tuning machine learning models or relying on rigid heuristics, EFS lets large language models generate, evolve, and select alpha factors—and it does so in a way that is not just automated, but interpretable, adaptive, and surprisingly effective. ...
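How the factors are generated and evolved is the paper’s contribution; the selection step they feed is simpler to picture. A hedged sketch with made-up scores (not the paper’s pipeline): rank assets by a composite factor score and equal-weight the top m, which is sparse by construction.

```python
import numpy as np

def top_m_portfolio(scores: np.ndarray, m: int) -> np.ndarray:
    """Equal-weight the m highest-scoring assets; zero weight elsewhere."""
    weights = np.zeros_like(scores, dtype=float)
    chosen = np.argsort(scores)[-m:]   # indices of the top-m scores
    weights[chosen] = 1.0 / m
    return weights

# Illustrative factor scores for a 10-asset universe
scores = np.array([0.3, -0.1, 0.8, 0.05, 0.6, -0.4, 0.2, 0.9, 0.0, 0.5])
w = top_m_portfolio(scores, m=3)
print(np.flatnonzero(w))   # assets 2, 4, 7 are selected
```

The hard part, of course, is producing scores that survive regime shifts; that is the ranking game EFS hands to the LLM.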

July 27, 2025 · 3 min · Zelina