
Brains with Gradients: Why Energy-Based Transformers Might Be the Future of Thinking Machines

AI models are getting better at mimicking human intuition (System 1), but what about deliberate reasoning—the slow, careful System 2 thinking? Until now, most methods required supervision (e.g., reward models, verifiers, or chain-of-thought engineering). A new architecture, Energy-Based Transformers (EBTs), changes that. It offers a radically unsupervised, architecture-level path toward models that “think,” not just react. The implications for robust generalization, dynamic reasoning, and agent-based autonomy are profound. ...
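The teaser doesn't spell out the mechanism, so here is a minimal, hypothetical sketch of the core idea: the model learns a scalar energy over (context, prediction) pairs, and "thinking" becomes inference-time gradient descent on that energy. The module names, dimensions, and step counts below are illustrative, not the authors' code.

```python
# Minimal sketch (not the paper's implementation): "thinking" as
# inference-time gradient descent on a learned energy function.
import torch
import torch.nn as nn

class EnergyHead(nn.Module):
    """Scores a (context, candidate prediction) pair with a scalar energy."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, 128), nn.GELU(), nn.Linear(128, 1)
        )

    def forward(self, context: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([context, candidate], dim=-1)).squeeze(-1)

def think(energy: EnergyHead, context: torch.Tensor,
          steps: int = 16, lr: float = 0.1) -> torch.Tensor:
    """Refine a candidate prediction by descending the energy landscape."""
    candidate = torch.randn_like(context, requires_grad=True)
    for _ in range(steps):  # more steps = more deliberate "System 2" refinement
        e = energy(context, candidate).sum()
        grad, = torch.autograd.grad(e, candidate)
        candidate = (candidate - lr * grad).detach().requires_grad_(True)
    return candidate.detach()

context = torch.randn(4, 64)              # a toy batch of 4 contexts
prediction = think(EnergyHead(), context)  # untrained here; shown for shape/flow only
```

More descent steps mean more deliberation, which is the sense in which such a model can trade extra inference compute for more careful answers.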

July 4, 2025 · 3 min · Zelina

Memory Over Matter: How MemAgent Redefines Long-Context Reasoning with Reinforcement Learning

Handling long documents has always been a source of frustration for large language models (LLMs). From brittle extrapolation hacks to obscure compression tricks, the field has often settled for awkward compromises. But the paper MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent boldly reframes the problem: what if LLMs could read like humans—absorbing information chunk by chunk, jotting down useful notes, and focusing on what really matters? At the heart of MemAgent is a surprisingly elegant idea: treat memory not as an architectural afterthought but as an agent policy to be trained. Instead of trying to scale attention across millions of tokens, MemAgent introduces a reinforcement-learning-shaped overwriteable memory that allows an LLM to iteratively read arbitrarily long documents in segments. It learns—through reward signals—what to keep and what to discard. ...
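As a rough illustration of the reading loop described above (inference only; the memory policy itself is trained with reinforcement learning on the final answer), here is a hedged sketch. `call_llm` is a placeholder for any chat-completion API, and the prompts and chunk size are assumptions, not MemAgent's actual setup.

```python
# Hedged sketch of chunk-by-chunk reading with an overwritable memory.
from typing import Callable, List

def chunk(text: str, size: int = 4000) -> List[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def read_with_memory(document: str, question: str,
                     call_llm: Callable[[str], str]) -> str:
    memory = "(empty)"
    for segment in chunk(document):
        # The memory is overwritten each step: the model decides what to keep
        # from its old notes and what to add from the new segment.
        memory = call_llm(
            f"Question: {question}\n"
            f"Current notes: {memory}\n"
            f"New segment: {segment}\n"
            "Rewrite the notes, keeping only what helps answer the question."
        )
    return call_llm(f"Question: {question}\nNotes: {memory}\nAnswer:")
```

Because the memory stays a fixed size, the context window never grows with document length; only the notes do the remembering.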

July 4, 2025 · 4 min · Zelina

Hive Minds and Hallucinations: A Smarter Way to Trust LLMs

When it comes to automating customer service, generative AI walks a tightrope: it can understand free-form text better than any tool before it—but with a dangerous twist. Sometimes, it just makes things up. These hallucinations, already infamous in legal and healthcare settings, can turn minor misunderstandings into costly liabilities. But what if instead of trusting one all-powerful AI model, we take a lesson from bees? A recent paper by Amer & Amer proposes just that: a multi-agent system inspired by collective intelligence in nature, combining LLMs, regex parsing, fuzzy logic, and tool-based validators to build a hallucination-resilient automation pipeline. Their case study—processing prescription renewal SMS requests—may seem narrow, but its implications are profound for any business relying on LLMs for critical operations. ...
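To make the idea concrete, here is an illustrative sketch of such a pipeline: the LLM's free-text extraction is cross-checked by deterministic validators (regex for structured fields, fuzzy matching against a known catalog), and anything that fails is escalated to a human. Field names, thresholds, and the medication list are assumptions, not the paper's implementation.

```python
# Illustrative sketch: an LLM's extraction is only trusted if independent,
# deterministic checks agree.
import re
from difflib import SequenceMatcher

KNOWN_MEDICATIONS = ["metformin", "lisinopril", "atorvastatin"]

def fuzzy_match(name: str, candidates, threshold: float = 0.8):
    best = max(candidates, key=lambda c: SequenceMatcher(None, name.lower(), c).ratio())
    return best if SequenceMatcher(None, name.lower(), best).ratio() >= threshold else None

def validate_request(llm_fields: dict) -> dict:
    """Accept the LLM's extraction only if the validators agree."""
    checks = {
        "phone_ok": bool(re.fullmatch(r"\+?\d{10,15}", llm_fields.get("phone", ""))),
        "medication": fuzzy_match(llm_fields.get("medication", ""), KNOWN_MEDICATIONS),
    }
    checks["escalate_to_human"] = not (checks["phone_ok"] and checks["medication"])
    return checks

# A slightly misspelled drug name gets snapped to the catalog entry;
# an invented one would miss the threshold and be escalated.
print(validate_request({"phone": "+15551234567", "medication": "metfornin"}))
```

The point is not the specific checks but the division of labor: the LLM handles free-form language, while cheap deterministic validators decide whether its output can be trusted.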

July 3, 2025 · 4 min · Zelina

Beyond the Pull Request: What ChatGPT Teaches Us About Productivity

In April 2023, Italy temporarily banned ChatGPT. To most, it was a regulatory hiccup. But to 88,000 open-source developers on GitHub, it became a natural experiment in how large language models (LLMs) alter not just code—but collaboration, learning, and even the pace of onboarding. A new study by researchers from UC Irvine and Chapman University used this four-week ban to investigate what happens when developers suddenly lose access to LLMs. The findings are clear: ChatGPT’s influence goes far beyond code completion. It subtly rewires how developers learn, collaborate, and grow. ...

July 1, 2025 · 3 min · Zelina

The Outlier Is a Lie: Quantization Breakthroughs with OSP

When it comes to deploying large language models (LLMs) efficiently, few challenges are as stubborn—and misunderstood—as activation outliers. For years, engineers have treated them like a natural disaster: unpredictable but inevitable. But what if they’re more like bad habits—learned and fixable? That’s the provocative premise behind a new framework called Outlier-Safe Pre-Training (OSP). Developed by researchers at Korea University and AIGEN Sciences, OSP proposes a simple but radical shift: instead of patching over outliers post hoc with quantization tricks, why not train the model to never form outliers in the first place? ...
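For readers wondering why outliers matter so much, here is a small numeric illustration (not OSP itself) of the failure mode OSP is designed to prevent: with symmetric int8 quantization, a single extreme activation inflates the scale and wipes out precision for everything else.

```python
# Toy demonstration of the quantization damage caused by one activation outlier.
import numpy as np

def int8_roundtrip_error(x: np.ndarray) -> float:
    scale = np.abs(x).max() / 127.0            # symmetric per-tensor scale
    q = np.clip(np.round(x / scale), -127, 127)
    return float(np.mean(np.abs(q * scale - x)))

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, 4096)              # well-behaved activations
acts_with_outlier = np.append(acts, 120.0)     # one rogue channel

print(int8_roundtrip_error(acts))               # small round-trip error
print(int8_roundtrip_error(acts_with_outlier))  # much larger error for the same data
```

Post-hoc quantization tricks work around this by isolating the outlier channels; OSP's bet is that a model trained never to form them doesn't need the workaround at all.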

June 25, 2025 · 3 min · Zelina

Divide and Conquer: How LLMs Learn to Teach

Designing effective lessons for training online tutors is no small feat. It demands pedagogical nuance, clarity, scenario realism, and learner empathy. A recent paper by Lin et al., presented at ECTEL 2025, offers a compelling answer to this challenge: use LLMs, but don’t ask too much at once. Their research reveals that breaking the task of lesson generation into smaller, well-defined parts significantly improves quality, suggesting a new collaborative model for scalable education design. ...
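As a rough illustration of that decomposition idea (the part names and prompts below are assumptions, not the authors' materials): generate each lesson component with its own narrow prompt rather than one monolithic request, then assemble the pieces.

```python
# Illustrative decomposition of lesson generation into focused sub-prompts.
from typing import Callable, Dict

LESSON_PARTS = {
    "learning_objectives": "List 3 measurable learning objectives for tutors on: {topic}",
    "scenario": "Write a short, realistic tutoring scenario illustrating: {topic}",
    "practice_questions": "Write 2 practice questions with model answers on: {topic}",
}

def generate_lesson(topic: str, call_llm: Callable[[str], str]) -> Dict[str, str]:
    # Each part gets a narrow, well-defined prompt; the pieces are assembled afterwards.
    return {part: call_llm(template.format(topic=topic))
            for part, template in LESSON_PARTS.items()}
```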

June 24, 2025 · 3 min · Zelina

Proofs and Consequences: How Math Reveals What AI Still Doesn’t Know

What happens when we ask the smartest AI models to do something truly difficult—like solve a real math problem and prove their answer is correct? That’s the question tackled by a group of researchers in their paper “Mathematical Proof as a Litmus Test.” Instead of testing AI with casual tasks like summarizing news or answering trivia, they asked it to write formal mathematical proofs—the kind that leave no room for error. And the results? Surprisingly poor. ...

June 23, 2025 · 4 min · Zelina

From Sparse to Smart: How PROGRM Elevates GUI Agent Training

The GUI Agent Bottleneck: Stuck in Sparse Feedback

Training LLM-based GUI agents to complete digital tasks—such as navigating mobile apps or automating workflows—faces a fundamental limitation: reward sparsity. Traditional reward formulations (Outcome Reward Models, or ORMs) provide feedback only at the end of a trajectory. If the task fails, the agent receives zero signal, regardless of how many useful intermediate steps it took. This severely limits credit assignment and slows learning, especially in environments with long action horizons. ...
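To see why this matters, here is a hedged sketch contrasting the two reward shapes. The progress estimator below is a stand-in (fraction of reference milestones reached), not PROGRM's learned reward model.

```python
# Sparse outcome reward vs. dense per-step progress reward (illustrative only).
from typing import List

def outcome_reward(trajectory: List[str], goal_reached: bool) -> List[float]:
    """ORM-style: a single signal at the end, zeros everywhere else."""
    return [0.0] * (len(trajectory) - 1) + [1.0 if goal_reached else 0.0]

def progress_reward(trajectory: List[str], milestones: List[str]) -> List[float]:
    """Dense per-step credit: reward the fraction of milestones reached so far."""
    rewards, hit = [], set()
    for step in trajectory:
        if step in milestones:
            hit.add(step)
        rewards.append(len(hit) / len(milestones))
    return rewards

traj = ["open_app", "tap_search", "type_query", "wrong_tap"]
print(outcome_reward(traj, goal_reached=False))   # [0.0, 0.0, 0.0, 0.0]
print(progress_reward(traj, ["open_app", "tap_search", "type_query", "tap_result"]))
# [0.25, 0.5, 0.75, 0.75]: partial credit even though the task ultimately failed
```

The dense variant gives the agent something to learn from on every step of a long trajectory, which is exactly the credit-assignment gap the excerpt describes.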

May 26, 2025 · 3 min

Divide and Model: How Multi-Agent LLMs Are Rethinking Real-World Problem Solving

When it comes to real-world problem solving, today’s LLMs face a critical dilemma: they can solve textbook problems well, but stumble when confronted with messy, open-ended challenges—like optimizing traffic in a growing city or managing fisheries under uncertain climate shifts. Enter ModelingAgent, an ambitious new framework that turns this complexity into opportunity.

What Makes Real-World Modeling So Challenging?

Unlike standard math problems, real-world tasks involve ambiguity, multiple valid solutions, noisy data, and cross-domain reasoning. They often require: ...

May 23, 2025 · 3 min

From Trees to Truths: Making MCTS Talk with Logic-Backed LLMs

In the quest to make AI more trustworthy, few challenges loom larger than explaining sequential decision-making algorithms like Monte Carlo Tree Search (MCTS). Despite its success in domains from transit scheduling to game playing, MCTS remains a black box to most practitioners, generating decisions from expansive trees of sampled possibilities without accessible rationale. A new framework proposes to change that by fusing LLMs with formal logic to bring transparency and dialogue to this crucial planning tool. ...

May 4, 2025 · 6 min