RAG | Cognaptus

Memory With Intent: Why LLMs Need a Cognitive Workspace, Not Just a Bigger Window

TL;DR for operators: Most enterprise LLM failures do not come from the model “not knowing enough”. They come from the system forgetting what it was doing five minutes ago, rediscovering the same facts, and treating every user turn as a fresh episode in a soap opera nobody asked to watch. The paper behind this article proposes Cognitive Workspace: an active memory architecture for LLMs that deliberately curates, reuses, consolidates, and forgets information rather than merely retrieving chunks or stretching the context window.1 Its core claim is simple but consequential: useful long-context behaviour is not the same as having a long context window. It is the ability to maintain a working state across a task. ...
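To make “curate, reuse, consolidate, forget” concrete, here is a minimal sketch of a task-scoped workspace. The class names, the capacity budget, and the least-recently-used forgetting rule are all assumptions for illustration. This is not the paper’s implementation, just a picture of memory that is managed on purpose rather than merely appended to.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    content: str
    hits: int = 0       # how often the item was reused instead of re-derived
    last_used: int = 0  # logical timestamp of the last access


@dataclass
class Workspace:
    """Hypothetical task-scoped working memory; an illustration, not the paper's design."""
    capacity: int = 3
    clock: int = 0
    items: dict[str, MemoryItem] = field(default_factory=dict)

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def curate(self, key: str, content: str) -> None:
        """Deliberately store a fact, then enforce the capacity budget."""
        self.items[key] = MemoryItem(content, last_used=self._tick())
        self.forget()

    def reuse(self, key: str) -> str | None:
        """Return a stored fact instead of rediscovering it."""
        item = self.items.get(key)
        if item is None:
            return None
        item.hits += 1
        item.last_used = self._tick()
        return item.content

    def consolidate(self, keys: list[str], new_key: str) -> None:
        """Replace several related items with one merged entry."""
        merged = "; ".join(self.items.pop(k).content for k in keys if k in self.items)
        self.curate(new_key, merged)

    def forget(self) -> None:
        """Evict least-recently-used items until the workspace fits its capacity."""
        while len(self.items) > self.capacity:
            stalest = min(self.items, key=lambda k: self.items[k].last_used)
            del self.items[stalest]


ws = Workspace(capacity=2)
ws.curate("goal", "summarise Q3 contracts")
ws.curate("fact1", "vendor A renews in May")
ws.reuse("goal")  # refreshes "goal"
ws.curate("fact2", "vendor B has a penalty clause")  # over budget: evicts "fact1", the stalest item
ws.consolidate(["goal", "fact2"], "summary")
print(ws.reuse("summary"))  # summarise Q3 contracts; vendor B has a penalty clause
```

Reusing a fact refreshes it, so the item nobody has touched is the first to go when the budget is exceeded. That is the opposite of a context window, which keeps everything until it silently drops the oldest tokens.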

Atom by Atom, Better Research: How Fine-Grained Rewards Make Agentic Search Smarter

TL;DR for operators: Research agents fail in a very familiar way: they do several useful things, then make one bad final move, and the training signal treats the whole journey as garbage. Delightful. Efficient. Totally not a credit-assignment problem wearing a lab coat. Atom-Searcher attacks that problem by splitting an agent’s reasoning trace into Atomic Thoughts: small, functional reasoning units such as planning, verification, hypothesis testing, observation, action selection, or risk analysis. A Reasoning Reward Model then scores those units, producing an Atomic Thought Reward that is blended with the final-answer reward during reinforcement learning.1 ...
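The blending step can be sketched in a few lines. The mean aggregation over atomic-thought scores, the starting weight `w0`, and its linear decay across training steps are hypothetical choices made for illustration, not the paper’s published schedule.

```python
def atomic_thought_reward(unit_scores: list[float]) -> float:
    """Aggregate per-unit scores from a reasoning reward model (the mean is an assumption)."""
    return sum(unit_scores) / len(unit_scores) if unit_scores else 0.0


def blended_reward(unit_scores: list[float], answer_reward: float,
                   step: int, total_steps: int, w0: float = 0.5) -> float:
    """Mix process and outcome rewards; the linear decay of the weight is a hypothetical schedule."""
    w = w0 * max(0.0, 1.0 - step / total_steps)
    return w * atomic_thought_reward(unit_scores) + (1.0 - w) * answer_reward


# Good intermediate reasoning, wrong final answer, early in training.
print(blended_reward([1.0, 0.5], answer_reward=0.0, step=0, total_steps=100))  # 0.375
```

The trajectory earns 0.375 instead of zero, so the useful steps are no longer thrown out with the bad ending. Decaying the weight hands control back to the outcome reward as training matures, which is one plausible way to stop the model optimising for pretty reasoning over correct answers.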

Keys to the Kingdom: How LLMs Can Audit Crypto Logic Before It Breaks

TL;DR for operators: CryptoScope is not “ChatGPT, please audit my cryptography”. That would be a splendid way to generate confident nonsense with Greek letters. The paper’s useful idea is more disciplined: make the model behave less like a wandering code reviewer and more like a junior cryptographic analyst with a library card, a checklist, and a supervisor. CryptoScope does this by combining three components: a curated cryptographic knowledge base of more than 12,000 entries, a pre-detection step that summarises code and checks algorithm compliance, and a retrieval-augmented final analysis that grounds the model’s reasoning in known failure patterns and implementation guidance.1 ...
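A crude version of the pre-detection-plus-retrieval idea, assuming a tiny hypothetical knowledge base in place of the real 12,000-entry one: tokenise the code, flag known-risky algorithm names, and pull the matching guidance into the context handed to the final LLM analysis. Keyword matching is a stand-in. The paper’s pre-detection also summarises the code, which this sketch does not attempt.

```python
import re

# Hypothetical mini knowledge base standing in for the paper's 12,000+ entries.
KNOWLEDGE_BASE = {
    "md5": "MD5 is collision-prone; do not use it for signatures or integrity checks.",
    "sha1": "SHA-1 has practical collisions; prefer SHA-256 or stronger.",
    "des": "DES has a 56-bit key and is brute-forceable.",
    "ecb": "ECB mode leaks plaintext patterns; use an authenticated mode such as GCM.",
}


def pre_detect(source: str) -> list[str]:
    """Flag algorithm names by splitting identifiers on non-alphanumerics (MODE_ECB -> mode, ecb)."""
    tokens = set(re.split(r"[^a-z0-9]+", source.lower()))
    return sorted(tokens.intersection(KNOWLEDGE_BASE))


def build_analysis_context(source: str) -> str:
    """Collect retrieved guidance to ground the later LLM analysis step."""
    notes = [f"- {name}: {KNOWLEDGE_BASE[name]}" for name in pre_detect(source)]
    return "Known issues to check:\n" + "\n".join(notes) if notes else "No flagged algorithms."


snippet = "h = hashlib.md5(data)\ncipher = AES.new(key, AES.MODE_ECB)"
print(build_analysis_context(snippet))
```

The design choice worth copying is the order of operations: cheap, deterministic checks narrow the search first, and the model only reasons after it has been handed the relevant guidance.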

RAGulating Compliance: When Triplets Trump Chunks

TL;DR for operators: Compliance teams do not mainly need a chatbot that sounds more confident. They already have enough people sounding confident in meetings. They need answers that can be traced back to the rule text, checked against related provisions, and updated when the regulatory corpus changes. The paper behind this article proposes a multi-agent system that turns regulatory documents into subject–predicate–object triplets, embeds those triplets alongside their source sections, retrieves triplets for question answering, and shows users the relevant subgraph behind the answer.1 That matters because regulatory work is not just “find me a paragraph.” It is “show me the applicable rule, the linked requirement, the exception, the deadline, and the neighbouring clause that will embarrass us later.” ...
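A minimal sketch of triplet retrieval with traceable sources. The `Triplet` type, the sample rules and section numbers, and the word-overlap scorer (a stand-in for the embeddings the paper uses) are all hypothetical. The point is that every retrieved fact carries its section, and the displayed subgraph pulls in neighbouring clauses that share an entity with the hits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    obj: str
    section: str  # source section the triplet was extracted from, kept for traceability


def _tokens(text: str) -> set[str]:
    return set(text.lower().split())


def retrieve(question: str, kb: list[Triplet], k: int = 2) -> list[Triplet]:
    """Rank triplets by word overlap with the question (a stand-in for embedding similarity)."""
    q = _tokens(question)

    def score(t: Triplet) -> int:
        return len(q & _tokens(f"{t.subject} {t.predicate} {t.obj}"))

    ranked = sorted(kb, key=score, reverse=True)
    return [t for t in ranked[:k] if score(t) > 0]


def subgraph(hits: list[Triplet], kb: list[Triplet]) -> list[Triplet]:
    """Expand to every triplet touching an entity in the hits, for display with its source."""
    entities = {t.subject for t in hits} | {t.obj for t in hits}
    return [t for t in kb if t.subject in entities or t.obj in entities]


rules = [  # hypothetical rulebook
    Triplet("firm", "must file", "annual report", "Sec. 4.1"),
    Triplet("annual report", "due within", "90 days", "Sec. 4.2"),
    Triplet("small firm", "exempt from", "audit", "Sec. 7.3"),
    Triplet("annual report", "must include", "audited accounts", "Sec. 4.5"),
]
hits = retrieve("when must the firm file the annual report", rules)
print([t.section for t in subgraph(hits, rules)])  # ['Sec. 4.1', 'Sec. 4.2', 'Sec. 4.5']
```

Retrieval alone returns Sec. 4.1 and Sec. 4.5; the subgraph expansion is what surfaces the 90-day deadline in Sec. 4.2, which is precisely the neighbouring clause that would otherwise embarrass someone later.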

Confounder Hunters: How LLM Agents are Rewriting the Rules of Causal Inference

TL;DR for operators: Clinical analytics teams already know the unpleasant truth: observational data is cheap, rich, and biased in ways that do not politely announce themselves. The paper behind this article proposes a way to make that bias-hunting process less artisanal. Instead of asking experts to manually inspect every causal-tree rule, the framework lets causal trees segment patients, asks medical LLM agents to suggest plausible confounders using decomposed prompting plus retrieval, sends those suggestions through expert validation, then recursively focuses on samples whose treatment-effect estimates still have wide confidence intervals.1 ...
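The “recursively focus on wide intervals” step can be sketched as a filter over patient segments. A plain difference in means with a normal-approximation 95% interval stands in for the causal-tree estimator, and `max_width` is a hypothetical tolerance. In the framework, the flagged segments would go back for another round of LLM confounder suggestions and expert validation.

```python
import math
from statistics import mean, variance


def effect_ci(treated: list[float], control: list[float], z: float = 1.96) -> tuple[float, float]:
    """Difference in means with a normal-approximation 95% interval (a simple stand-in estimator)."""
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff - z * se, diff + z * se


def segments_to_revisit(segments: dict[str, tuple[list[float], list[float]]],
                        max_width: float) -> list[str]:
    """Return segments whose interval is still too wide to trust: the next round's focus."""
    revisit = []
    for name, (treated, control) in segments.items():
        low, high = effect_ci(treated, control)
        if high - low > max_width:
            revisit.append(name)
    return revisit


# Hypothetical outcomes per segment: (treated, control).
segments = {
    "A": ([5, 6, 5, 6], [3, 4, 3, 4]),   # consistent outcomes, tight interval
    "B": ([1, 9, 2, 8], [5, 5, 0, 10]),  # noisy outcomes, wide interval
}
print(segments_to_revisit(segments, max_width=3.0))  # ['B']
```

Segment A’s estimate is already stable, so expert attention goes to B, where an unmeasured confounder is a more plausible explanation for the noise.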

Breaking the Question Apart: How Compositional Retrieval Reshapes RAG Performance

TL;DR for operators: A standard RAG system often retrieves the most individually relevant chunks. That is useful until the question needs several different pieces of evidence that must work together. Then the system may return five near-duplicates of the most obvious fact and miss the less obvious fact that actually completes the answer. Excellent. We have reinvented the meeting where everyone brings the same slide. ...
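One standard way to stop a retriever bringing the same slide five times is maximal marginal relevance (MMR): select greedily, rewarding relevance to the question and penalising similarity to what has already been picked. MMR is swapped in here as a familiar illustration, not as the paper’s compositional retriever, and word-set Jaccard similarity stands in for embeddings.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap, standing in for embedding similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def mmr(query: str, docs: list[str], k: int = 2, lam: float = 0.5) -> list[str]:
    """Greedy MMR: reward relevance to the query, penalise similarity to picks so far."""
    selected: list[str] = []
    candidates = list(docs)
    while candidates and len(selected) < k:
        def score(d: str) -> float:
            redundancy = max((jaccard(d, s) for s in selected), default=0.0)
            return lam * jaccard(query, d) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = "founder and headquarters of acme"
docs = ["acme founder is ada", "the founder of acme is ada", "headquarters in oslo"]
print(mmr(query, docs))  # ['the founder of acme is ada', 'headquarters in oslo']
```

Plain top-2 by relevance would return the two founder sentences. MMR trades the near-duplicate for the headquarters fact that completes the answer; `lam` controls how much relevance you are willing to give up for coverage.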

Search When It Hurts: How UR² Teaches Models to Retrieve Only When Needed

TL;DR for operators: UR² is a useful paper because it attacks the part of RAG that most demos politely ignore: search can make a model worse when it is used badly.1 The framework trains smaller language models to coordinate retrieval and reasoning, rather than bolting a search box onto a chatbot and hoping the context window will behave itself. Hope, regrettably, is not a retrieval strategy. ...
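The behaviour being trained can be pictured as a gate in front of the retriever. UR² trains the model to make this call; the fixed confidence threshold and the `confidence` and `search` callables below are hypothetical stand-ins, useful only to show where the decision sits in the pipeline.

```python
from typing import Callable


def answer_with_optional_search(
    question: str,
    confidence: Callable[[str], float],  # hypothetical: model's self-estimated confidence
    search: Callable[[str], list[str]],  # hypothetical: retriever returning passages
    threshold: float = 0.7,
) -> dict:
    """Call the retriever only when the model is unsure; otherwise answer from parametric knowledge."""
    if confidence(question) >= threshold:
        return {"searched": False, "context": []}
    return {"searched": True, "context": search(question)}


calls: list[str] = []


def fake_search(q: str) -> list[str]:
    calls.append(q)  # record that retrieval actually happened
    return ["retrieved passage"]


print(answer_with_optional_search("capital of France?", lambda q: 0.95, fake_search))
print(answer_with_optional_search("Q3 revenue of a private firm?", lambda q: 0.2, fake_search))
```

Only the second question reaches the retriever. A learned policy replaces the hand-tuned threshold, but the operational benefit is the same: fewer irrelevant passages polluting answers the model already knew.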

From Stage to Script: How AMADEUS Keeps AI Characters in Character

TL;DR for operators: Characters are easy when they stay on script. They become expensive when users ask the wrong question, which is, naturally, what users do. The AMADEUS paper addresses a specific failure mode in retrieval-augmented role-playing agents: ordinary RAG can retrieve facts, but persona consistency often depends on inferred traits, values, habits, and narrative context rather than direct answers. A user asks, “Are you confident everything will work out?” The persona document may not contain that sentence. Naive RAG may grab a superficially similar chunk and improvise badly. AMADEUS instead tries to retrieve evidence from which a character’s attributes can be inferred, then feeds those attributes into generation.1 ...

Graphs, Gains, and Guile: How FinKario Outruns Financial LLMs

TL;DR for operators: FinKario is useful because it attacks a dull but expensive problem: financial research is rich, long, inconsistent, and usually trapped inside documents that models can quote more easily than they can use. The paper’s answer is not “ask a better LLM.” It is “turn research reports into a dynamic financial knowledge graph, then retrieve graph context before asking the LLM to reason.” Small difference. Large operational consequences. ...
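“Retrieve graph context before asking the LLM to reason” can be sketched as a bounded breadth-first walk around the entity in question. The edges, company names, and two-hop limit are hypothetical, and the extraction of edges from research reports (the hard part) is assumed to have already happened.

```python
from collections import defaultdict, deque


def build_graph(edges: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Adjacency list from (head, relation, tail) facts extracted from research reports."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in edges:
        graph[head].append((relation, tail))
    return graph


def graph_context(graph: dict[str, list[tuple[str, str]]], entity: str, hops: int = 2) -> str:
    """Breadth-first walk up to `hops` edges out, rendered as lines an LLM prompt can include."""
    lines, seen, queue = [], {entity}, deque([(entity, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue
        for relation, neighbour in graph.get(node, []):
            lines.append(f"{node} --{relation}--> {neighbour}")
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return "\n".join(lines)


edges = [  # hypothetical facts
    ("CompanyX", "supplies", "CompanyY"),
    ("CompanyY", "exposed_to", "chip shortage"),
    ("chip shortage", "affects", "auto sector"),
]
context = graph_context(build_graph(edges), "CompanyX", hops=2)
prompt = f"Context:\n{context}\n\nQuestion: What risks does CompanyX face?"
print(prompt)
```

The third edge sits three hops from CompanyX and is left out, which is how the hop limit keeps the prompt from swallowing the whole graph while still surfacing the indirect exposure a document search would likely miss.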

Seeing is Retraining: How VizGenie Turns Visualization into a Self-Improving AI Loop

TL;DR for operators: VizGenie is not another “type a prompt, get a chart” system. It is a research prototype for scientific visualization where the hard problem is not drawing a bar chart, but helping users explore complex volumetric datasets without manually tuning every slice, isovalue, opacity map, colour map, and feature query like it is a sacred ritual. ...