LLMs | Cognaptus

Homo Silicus Goes to Wall Street

As AI systems step into the boardroom and the brokerage app, a new question arises: how do they think about money? As large language models (LLMs) move from answering questions to making decisions, it matters not only whether an AI is accurate but also what kind of financial reasoner it is. A recent study by Orhan Erdem and Ragavi Pobbathi Ashok tackles this question head-on by comparing the decision-making profiles of seven LLMs, including GPT-4, DeepSeek R1, and Gemini 2.0, with those of humans across 53 countries. The result? The LLMs consistently exhibit a style of reasoning distinct from that of human respondents; among the national samples, the closest match is the Tanzanian one. Not American, not German. Tanzanian. That finding, while seemingly odd, opens a portal into deeper truths about how these models internalize financial logic. ...

Thoughts, Exposed: Why Chain-of-Thought Monitoring Might Be AI Safety’s Best Fragile Hope

Imagine debugging a black box. Now imagine that black box occasionally narrates its thoughts aloud. That’s the opportunity—and the fragility—presented by Chain-of-Thought (CoT) monitoring, a newly emergent safety paradigm for large language models (LLMs). In their recent landmark paper, Korbak et al. argue that reasoning traces generated by LLMs—especially those trained for explicit multi-step planning—offer a fleeting yet powerful handle on model alignment. But this visibility, they warn, is contingent, brittle, and already under threat. ...

Reasoning at Scale: How DeepSeek Redefines the LLM Playbook

If GPT-4 was the apex of pretraining, DeepSeek might be the blueprint for what comes next. Released in two families—DeepSeek-V3 and DeepSeek-R1—this Chinese open-source model series isn’t just catching up to frontier LLMs. It’s reshaping the paradigm entirely. By sidestepping traditional supervised fine-tuning in favor of reinforcement learning (RL), and coupling it with memory-efficient innovations like Multi-head Latent Attention (MLA) and cost-efficient training techniques like FP8 mixed precision and fine-grained MoE, DeepSeek models demonstrate how strategic architectural bets can outpace brute-force scale. ...

Chunks, Units, Entities: RAG Rewired by CUE-RAG

Retrieval-Augmented Generation (RAG) has become the go-to technique for grounding large language models (LLMs) in external data. But as anyone building real-world RAG pipelines knows, there’s a growing tension between accuracy and cost. Existing graph-based RAG solutions promise richer semantics than vanilla vector stores, but suffer from two persistent issues: incomplete graphs and retrieval misalignment. The paper “CUE-RAG: Towards Accurate and Cost-Efficient Graph-Based RAG” proposes a structural rethinking. By integrating a multi-partite graph, hybrid extraction, and a query-driven iterative retriever, CUE-RAG achieves state-of-the-art accuracy while cutting indexing costs by up to 72.58%. Even when it uses no LLM tokens at all, it still outperforms other methods. ...

Cognitive Gridlock: Is Consciousness a Jamming Phase?

In the world of physics, when particles in a system become so densely packed or cooled that they lock into place, we call this phenomenon jamming. Sand becoming rigid under pressure, traffic freezing on a highway, or even glass transitioning from fluid to solid—all are governed by this principle. What if the same laws applied to intelligence? A provocative new paper, Consciousness as a Jamming Phase by Kaichen Ouyang, suggests just that: large language models (LLMs) exhibit consciousness-like properties not as a software quirk but as a physical phase transition, mirroring the jamming of particles in disordered systems. ...

Inner Critics, Better Agents: The Rise of Introspective AI

When AI agents begin to talk to themselves—really talk to themselves—we might just witness a shift in how machine reasoning is conceived. A new paper, “Introspection of Thought Helps AI Agents”, proposes a reasoning framework (INoT) that takes inspiration not from more advanced outputs or faster APIs, but from an old philosophical skill: inner reflection. Rather than chaining external prompts or simulating collaborative agents outside the model, INoT introduces PromptCode—a code-integrated prompt system that embeds a virtual multi-agent debate directly inside the LLM. The result? A substantial increase in reasoning quality (average +7.95%) and a dramatic reduction in token cost (–58.3%) compared to state-of-the-art baselines. Let’s unpack how this works, and why it could redefine our mental model of what it means for an LLM to “think.” ...

Bias, Baked In: Why Pretraining, Not Fine-Tuning, Shapes LLM Behavior

What makes a large language model (LLM) biased? Is it the instruction tuning data, the randomness of training, or something more deeply embedded? A new paper from Itzhak, Belinkov, and Stanovsky, presented at COLM 2025, delivers a clear verdict: pretraining is the primary source of cognitive biases in LLMs. The implications of this are far-reaching, and perhaps more uncomfortable than many developers would like to admit.

The Setup: Two Steps, One Core Question

The authors dissected the origins of 32 cognitive biases in LLMs using a controlled two-step causal framework: ...

What LLMs Remember—and Why: Unpacking the Entropy-Memorization Law

The best kind of privacy leak is the one you can measure. A recent paper by Huang et al. introduces a deceptively simple but powerful principle—the Entropy-Memorization Law—that allows us to do just that. It claims that the entropy of a text sequence is strongly correlated with how easily it’s memorized by a large language model (LLM). But don’t mistake this for just another alignment paper. This law has concrete implications for how we audit models, design prompts, and build privacy-aware systems. Here’s why it matters. ...
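
To make "the entropy of a text sequence" concrete, here is a minimal sketch in Python. The excerpt above does not say how Huang et al. estimate entropy, so everything here is an assumption for illustration: the function name `empirical_entropy`, the whitespace tokenization, and the choice of plain Shannon entropy over the sequence's own token frequencies. It is not the paper's method, and it says nothing about which direction the correlation with memorization runs.

```python
import math
from collections import Counter


def empirical_entropy(tokens):
    """Shannon entropy, in bits per token, of the empirical token distribution.

    Illustrative assumption only: the estimator used in the paper may differ.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    # H = sum over distinct tokens of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical inputs: one fully repetitive sequence, one with no repeats.
repetitive = "the the the the the the the the".split()
varied = "a9 kx 7q zb m2 pw 4t yu".split()

print(empirical_entropy(repetitive))  # 0.0 bits: a single token, no uncertainty
print(empirical_entropy(varied))      # 3.0 bits: eight equally likely tokens
```

A single scalar like this is what makes an audit measurable: it can be computed for any candidate string before asking how readily a model reproduces it.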

Humans in the Loop, Not Just the Dataset

When Meta and other tech giants scale back content moderation, the gap isn’t just technical—it’s societal. Civil society organizations (CSOs), not corporations, are increasingly on the frontlines of monitoring online extremism. But they’re often armed with clunky tools, academic prototypes, or opaque black-box models. A new initiative—highlighted in Civil Society in the Loop—challenges this status quo by co-designing a Telegram monitoring tool that embeds human feedback directly into its LLM-assisted classification system. The twist? It invites civil society into the machine learning loop, not just the results screen. ...

The Invisible Hand in the Machine: Rethinking AI Through a Collectivist Lens

The most radical idea in Michael I. Jordan’s latest manifesto isn’t a new model, a benchmark, or even a novel training scheme. It’s a reorientation. He argues that we’ve misdiagnosed the nature of intelligence—and in doing so, we’ve built AI systems that are cognitively brilliant yet socially blind. The cure? Embrace a collectivist, economic lens. This is not techno-utopianism. Jordan—a towering figure in machine learning—offers a pointed critique of both the AGI hype and the narrow symbolic legacy of classical AI. The goal shouldn’t be to build machines that imitate lone geniuses. It should be to construct intelligent collectives—systems that are social, uncertain, decentralized, and deeply intertwined with human incentives. In short: AI needs an economic imagination. ...