LLM | Cognaptus

The Retrieval-Reasoning Tango: Charting the Rise of Agentic RAG

In the AI race to make large language models both factual and reasoned, two camps have emerged: one focused on retrieval-augmented generation (RAG) to fight hallucination, the other on long-chain reasoning to mimic logic. But neither wins alone. This week’s survey by Li et al. (2025), Towards Agentic RAG with Deep Reasoning, delivers the most comprehensive synthesis yet of the field’s convergence point: synergized RAG–Reasoning. It’s no longer a question of whether retrieval helps generation or reasoning helps retrieval—but how tightly the two can co-evolve, often under the coordination of autonomous agents. ...

Plug Me In: Why LLMs with Tools Beat LLMs with Size

The latest research out of Heriot-Watt University doesn’t just challenge the notion that bigger is better — it quietly dismantles it. In their newly released Athena framework, Nripesh Niketan and Hadj Batatia demonstrate how integrating external APIs into LLM pipelines can outperform even the likes of GPT-4o and LLaMA-Large on real tasks like math and science. And they didn’t just beat them — they lapped them. Why GPT-4 Still Fumbles Math Ask GPT-4o to solve a college-level math problem, and it might hallucinate steps or miss basic arithmetic. The reason? LLMs, even at trillion-parameter scale, are not calculators. They’re probabilistic machines trained on patterns, not deterministic reasoners. ...

The Rise of the Self-Evolving Scientist: STELLA and the Future of Biomedical AI

When was the last time a machine truly surprised you—not with a quirky ChatGPT poem or a clever image generation, but with scientific reasoning that evolved on its own? Meet STELLA, an AI agent for biomedical research that doesn’t just solve problems—it gets better at solving them while solving them. The Static Curse of Smart Agents Modern AI agents have shown promise in navigating the labyrinth of biomedical research, where each inquiry might require cross-referencing papers, running custom bioinformatics analyses, or interrogating molecular databases. But the vast majority of these agents suffer from a fatal limitation: they rely on static, pre-installed toolkits and hard-coded logic trees. Like a PhD student who memorized a textbook but never updated it, they can’t adapt to new tasks or new knowledge without human intervention. ...

LLMs Meet Logic: SymbolicThought Turns AI Relationship Guesswork into Graphs

If AI is going to understand people, it first has to understand relationships. But when it comes to parsing character connections from narrative texts — whether news articles, biographies, or novels — even state-of-the-art language models stumble. They hallucinate links, miss cross-sentence cues, and often forget what they’ve just read. Enter SymbolicThought, a hybrid framework that gives LLMs a logic-boosted sidekick: symbolic reasoning. Developed by researchers at King’s College London and CUHK, the system doesn’t just extract character relationships from text; it builds editable graphs, detects logical contradictions, and guides users through verification with a smart, interactive interface. ...

The Meek Shall Compute It

The Meek Shall Compute It For the past five years, discussions about AI progress have centered on a simple formula: more data + more compute = better models. This scaling paradigm has produced marvels like GPT-4 and Gemini—but also entrenched a new aristocracy of compute-rich players. Is this inequality here to stay? According to a provocative new paper from MIT CSAIL, the answer may be: not for long. The authors argue that due to the laws of diminishing returns, the performance gap between state-of-the-art (SOTA) models and smaller, cheaper “meek” models will shrink over time. If true, this reframes the future of AI as one not of centralized supremacy, but of widespread, affordable competence. ...

Echo Chamber in a Prompt: How Survey Bias Creeps into LLMs

Large Language Models (LLMs) are increasingly deployed as synthetic survey respondents in social science and policy research. But a new paper by Rupprecht, Ahnert, and Strohmaier raises a sobering question: are these AI “participants” reliable, or are we just recreating human bias in silicon form? By subjecting nine LLMs—including Gemini, Llama-3 variants, Phi-3.5, and Qwen—to over 167,000 simulated interviews from the World Values Survey, the authors expose a striking vulnerability: even state-of-the-art LLMs consistently fall for classic survey biases—especially recency bias. ...

Beyond the Pareto Frontier: Pricing LLM Mistakes in the Real World

For all the hype about model accuracy, inference cost, and latency, most organizations are still squinting at scatter plots to decide which large language model (LLM) to use. But what if we could cut through the tradeoff fog with a single number that tells you exactly which model is worth deploying—for your use case, under your constraints? That’s the bold proposal in a recent paper by Zellinger and Thomson from Caltech: treat LLM selection as an economic decision. Rather than searching for models on the accuracy-cost “Pareto frontier,” they suggest an approach grounded in price-tagging errors, delays, and abstentions in dollar terms. Think of it as a model selection framework that answers: How much is a mistake worth to you? ...

The Phantom Menace in Your Knowledge Base

Retrieval-Augmented Generation (RAG) may seem like a fortress of AI reliability—until you realize the breach happens at the front door, not in the model. Large Language Models (LLMs) have become the backbone of enterprise AI assistants. Yet as more systems integrate RAG pipelines to improve their factuality and domain alignment, a gaping blindspot has emerged—the document ingestion layer. A new paper titled “The Hidden Threat in Plain Text” by Castagnaro et al. warns that attackers don’t need to jailbreak your model or infiltrate your vector store. Instead, they just need to hand you a poisoned DOCX, PDF, or HTML file. And odds are, your RAG system will ingest it—invisibly. ...

Talk is Flight: How RALLY Bridges Language and Learning in UAV Swarms

When language models take flight, consensus becomes not just possible, but programmable. Modern UAV swarms face the daunting task of coordinating across partial observability, adversarial threats, and shifting missions. Traditional Multi-Agent Reinforcement Learning (MARL) offers adaptability, but falters when role differentiation or semantic reasoning is required. Large Language Models (LLMs), meanwhile, understand tasks and intent—but lack grounded, online learning. RALLY (Role-Adaptive LLM-Driven Yoked Navigation) is the first framework to successfully integrate these two paradigms, enabling real-time, role-aware collaboration in UAV swarms. ...

Brains with Gradients: Why Energy-Based Transformers Might Be the Future of Thinking Machines

Brains with Gradients: Why Energy-Based Transformers Might Be the Future of Thinking Machines AI models are getting better at mimicking human intuition (System 1), but what about deliberate reasoning—slow, careful System 2 Thinking? Until now, most methods required supervision (e.g., reward models, verifiers, or chain-of-thought engineering). A new architecture, Energy-Based Transformers (EBTs), changes that. It offers a radically unsupervised, architecture-level path toward models that “think,” not just react. The implications for robust generalization, dynamic reasoning, and agent-based autonomy are profound. ...