RAG | Cognaptus

The Retriever Found Similar Things. The Evidence Was Elsewhere.

TL;DR for operators: The current enterprise RAG conversation still has a charmingly stubborn misconception: if the model hallucinates, buy better embeddings, increase the context window, add an agent, and hope the PowerPoint becomes true. The two papers here point in a less theatrical direction. One paper, Non-negative Elastic Net Decoding for Information Retrieval, argues that dense retrieval has a structural weakness: it scores each candidate independently, so it can retrieve several similar items instead of the complementary set actually needed to answer the query.[1] The other, Agentic Hybrid RAG for Evidence-Grounded Muon Collider Analysis, shows what happens when retrieval is treated as a full evidence workflow: sparse and dense retrieval are fused, queries are decomposed under constraints, evidence is deduplicated and budgeted, and answers are judged for coverage, hallucination, and abstention.[2] ...
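To make the "independent scoring" weakness concrete, here is a toy sketch of the contrast, not the paper's decoder. It assumes the decoding step can be framed as a non-negative elastic net fit of the query embedding over candidate embeddings, solved here with plain cyclic coordinate descent. The 3-dimensional vectors, the document names, and the `l1`/`l2` values are all invented for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine_top_k(query, docs, k):
    """Dense-retrieval style: score each document on its own, then take the top k."""
    q_norm = math.sqrt(dot(query, query))
    scores = {
        name: dot(query, vec) / (q_norm * math.sqrt(dot(vec, vec)))
        for name, vec in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def nonneg_elastic_net(query, docs, l1=0.01, l2=0.1, iters=200):
    """Fit weights w >= 0 minimising
        0.5 * ||query - sum_i w_i * doc_i||^2 + l1 * sum_i w_i + 0.5 * l2 * sum_i w_i^2
    by cyclic coordinate descent, so documents compete to explain the query jointly."""
    names = list(docs)
    w = {name: 0.0 for name in names}
    for _ in range(iters):
        for j in names:
            # Residual of the query after removing every other document's contribution.
            residual = [
                q_k - sum(w[i] * docs[i][k] for i in names if i != j)
                for k, q_k in enumerate(query)
            ]
            d = docs[j]
            # Closed-form coordinate update, clipped at zero for non-negativity.
            w[j] = max(0.0, (dot(d, residual) - l1) / (dot(d, d) + l2))
    return w


# Hypothetical embeddings: two near-duplicates and one complementary document.
docs = {
    "policy_v1": [1.0, 0.10, 0.0],
    "policy_v1_copy": [1.0, 0.12, 0.0],  # near-duplicate of policy_v1
    "exception_memo": [0.0, 1.0, 0.0],   # covers the other half of the query
}
query = [1.0, 1.0, 0.0]

print(cosine_top_k(query, docs, 2))  # ['policy_v1_copy', 'policy_v1']
weights = nonneg_elastic_net(query, docs)
print(sorted(weights, key=weights.get, reverse=True)[:2])  # exception_memo ranks first
```

Cosine top-2 returns the two near-duplicates. In the joint fit, the duplicates split the weight for the part of the query they both explain, so the complementary memo rises to the top.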

The Grid Agent Saw the Pole. Then the Workflow Fell Over.

TL;DR for operators: Power inspection is not a vision problem with some administrative paperwork attached. It is a chain. An image must become an equipment label, then a defect description, then a severity judgment, then a maintenance decision, then a correctly executed workflow. Break one link early enough and the rest of the chain becomes very confident clerical fiction. ...
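Why the chain framing matters can be shown with one line of arithmetic. Under the simplifying assumption that stage errors are independent and stage $i$ is correct with probability $p_i$, end-to-end reliability is the product of the stage reliabilities. The 95% figure below is hypothetical, applied to the five stages listed above:

```latex
P(\text{workflow correct}) = \prod_{i=1}^{n} p_i,
\qquad p_i = 0.95,\; n = 5 \;\Rightarrow\; 0.95^{5} \approx 0.774
```

Five individually respectable stages already lose roughly one case in four end to end, before accounting for errors that propagate from earlier links.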

Graph Work, Not Graph Worship: RAGA Turns RAG Into an Auditable Knowledge Operation

TL;DR for operators: RAGA is not another “add a graph and accuracy goes up” paper. That would be too convenient, and therefore suspicious. The useful idea is more operational: treat retrieval-augmented generation as a knowledge management process, not a pile of embeddings with a polite chatbot on top. The paper proposes RAGA, short for Reading-And-Graph-building-Agent, an autonomous system that reads documents, searches existing graph knowledge, verifies whether new entities or relations should be added, and then constructs or updates a knowledge graph with source-linked provenance.[1] Its core loop is Read–Search–Verify–Construct, implemented as a ReAct-style tool-calling agent rather than a one-shot extraction pipeline. ...
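A minimal sketch of the Read–Search–Verify–Construct loop, assuming a drastically simplified setting: documents arrive with candidate triples already extracted, verification is a literal "do both entities appear in the source text" check, and the graph is a dictionary from triples to supporting source ids. None of these names come from the paper, and there is no LLM or ReAct tool-calling here; the point is only the shape of the loop and how provenance accumulates.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    # Maps (head, relation, tail) -> set of source ids that support the edge.
    edges: dict = field(default_factory=dict)

    def search(self, triple):
        return self.edges.get(triple)

    def construct(self, triple, source_id):
        self.edges.setdefault(triple, set()).add(source_id)


def read(doc):
    # Hypothetical reader: documents arrive with candidate triples pre-extracted.
    return doc["triples"]


def verify(triple, doc):
    # Toy check: both entities must literally appear in the source text.
    head, _, tail = triple
    return head in doc["text"] and tail in doc["text"]


def ingest(graph, doc):
    """One Read-Search-Verify-Construct pass over a single document."""
    log = {"added": 0, "corroborated": 0, "rejected": 0}
    for triple in read(doc):
        known = graph.search(triple)
        if not verify(triple, doc):
            log["rejected"] += 1
            continue
        log["corroborated" if known else "added"] += 1
        graph.construct(triple, doc["id"])
    return log


doc1 = {"id": "doc1", "text": "Acme supplies Bolt Co. with fasteners.",
        "triples": [("Acme", "supplies", "Bolt Co."), ("Acme", "owns", "Zeta Ltd")]}
doc2 = {"id": "doc2", "text": "Bolt Co. buys fasteners from Acme.",
        "triples": [("Acme", "supplies", "Bolt Co.")]}

graph = ProvenanceGraph()
print(ingest(graph, doc1))  # {'added': 1, 'corroborated': 0, 'rejected': 1}
print(ingest(graph, doc2))  # {'added': 0, 'corroborated': 1, 'rejected': 0}
print(sorted(graph.edges[("Acme", "supplies", "Bolt Co.")]))  # ['doc1', 'doc2']
```

The unsupported "owns" edge never enters the graph, and the repeated fact gains a second source instead of a duplicate edge, which is what makes the result auditable.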

Fine-Tuned, Fine Print: Why Post-Training Teaches Models What to Trust

Enterprise AI has entered its “sure, but can it use the evidence?” phase. That is progress, technically. It is also where many deployment stories begin to get expensive. The first generation of business LLM adoption was satisfied if a model could produce a fluent answer. The next generation asks something more demanding: can the model use retrieved documents, compliance policies, tool outputs, customer records, analyst notes, and human feedback in the right way? ...

Roll the Tape, Call the Tools: ReTool-Video and the Evidence-Routing Problem

Video is where AI demos go to become expensive. A model can describe a short clip. It can answer a question about a few sampled frames. It can even sound confident while doing so, which is apparently a product feature now. But business video work is rarely “what is happening in this five-second clip?” It is usually messier: find the exact moment in a two-hour training recording, count repeated actions without double-counting adjacent clips, verify whether an event appears in audio, subtitles, and frames, or decide whether a safety incident is real rather than just visually similar to one. ...
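One of the chores named above, counting repeated actions without double-counting adjacent clips, reduces to interval bookkeeping once detections carry timestamps. The sketch below is a generic illustration, not ReTool-Video's method: it assumes per-clip detections arrive as `(start, end)` seconds and that detections closer than a hypothetical `gap` tolerance belong to the same event.

```python
def merge_detections(detections, gap=0.0):
    """Merge (start, end) detections, in seconds, that overlap or sit within `gap`
    seconds of each other, so one action split across adjacent clips counts once."""
    merged = []
    for start, end in sorted(detections):
        if merged and start <= merged[-1][1] + gap:
            # Same event continued into the next clip: extend the current span.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(span) for span in merged]


# Hypothetical detections emitted by overlapping clip windows.
detections = [(8.5, 11.0), (9.0, 12.5), (30.0, 33.0), (47.0, 50.0), (50.5, 52.0)]
events = merge_detections(detections, gap=1.0)
print(len(detections), len(events))  # 5 3
print(events)  # [(8.5, 12.5), (30.0, 33.0), (47.0, 52.0)]
```

The `gap` value is the design trade-off: too small and one action split by a clip boundary is counted twice; too large and two genuinely separate repetitions collapse into one.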

Search, Critique, Repeat: Critic-R Turns RAG Complaints into Retriever Training

Search failure is boring until it becomes expensive. A research agent asks for evidence. The retriever returns documents. The reasoning model reads them, continues writing, and eventually produces a confident answer. Somewhere in the middle, the evidence was slightly wrong: not irrelevant enough to trigger an obvious failure, not useful enough to support the next reasoning step. The agent proceeds anyway, because that is what agents do when we dress up uncertainty as workflow automation. ...

Curved Space, Straighter Retrieval: Why Graph RAG Needs Geometry

Retrieval looks simple until the wrong thing keeps showing up. A company builds a graph model over products, papers, suppliers, users, or transactions. The model performs reasonably well inside familiar territory. Then the data shifts. New products appear. A new research domain enters the citation graph. A social platform changes user behavior. The model’s internal knowledge, frozen inside parameters, starts behaving like yesterday’s org chart: technically structured, operationally stale. ...

The Gate Before the Graph: Why Technical RAG Needs Evidence Control

Search is easy until it becomes responsible. A product engineer asks, “What methods exist for real-time tire friction estimation?” A normal search tool returns papers. A normal RAG system retrieves chunks. A confident LLM then writes a neat answer, preferably with enough bullet points to look managerial. The problem is not that this answer is always wrong. That would be mercifully simple. The problem is that it may be locally plausible but evidentially thin: two relevant chunks, one outdated method, no coverage of adjacent terminology, and a citation that looks reassuring mostly because it exists. ...
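The failure described above (too few relevant chunks, an outdated method, missing adjacent terminology) suggests what a pre-generation gate might check. This is a minimal sketch under stated assumptions, not the paper's gate: the `Chunk` fields, the relevance and recency thresholds, and the list of required terms are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    relevance: float  # retriever score, assumed to lie in [0, 1]
    year: int


def evidence_gate(chunks, required_terms, min_relevant=3, min_score=0.5,
                  oldest_year=2018, min_term_coverage=0.6):
    """Decide whether retrieved evidence is strong enough to hand to the generator.
    Returns (passed, reasons) so a failed gate can trigger more retrieval, not an answer."""
    relevant = [c for c in chunks if c.relevance >= min_score]
    current = [c for c in relevant if c.year >= oldest_year]
    corpus = " ".join(c.text.lower() for c in current)
    covered = [t for t in required_terms if t.lower() in corpus]
    coverage = len(covered) / len(required_terms) if required_terms else 1.0

    reasons = []
    if len(current) < min_relevant:
        reasons.append(f"only {len(current)} relevant, current chunks (need {min_relevant})")
    if coverage < min_term_coverage:
        missing = sorted(set(required_terms) - set(covered))
        reasons.append(f"term coverage {coverage:.0%}, missing {missing}")
    return (not reasons, reasons)


chunks = [
    Chunk("Kalman filter estimation of tire-road friction", 0.82, 2021),
    Chunk("Slip-based friction estimation in real time", 0.77, 2022),
    Chunk("Lookup-table friction model for braking", 0.71, 2009),  # outdated method
]
terms = ["tire-road friction", "slip", "friction coefficient", "road surface"]
passed, reasons = evidence_gate(chunks, terms)
print(passed)  # False
for reason in reasons:
    print(reason)
```

Returning reasons rather than a bare boolean keeps the gate auditable: the caller can see that the answer was blocked for thin, partly stale evidence and missing terminology, and can widen the search accordingly.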

Memory Lane Has Potholes: MemFail and the Business of Testing Agent Recall

Memory is where enterprise AI demos go to become operationally embarrassing. In the demo, the assistant remembers that a client prefers concise weekly updates, that a trader avoids high-leverage positions after volatility spikes, or that a procurement manager only approves a supplier when compliance documents are current. In production, the same assistant may remember the attractive half of the fact and quietly lose the condition. It recalls “approves supplier” but forgets “only when compliance documents are current.” Congratulations: the agent has not forgotten. It has remembered dangerously. ...
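The "remembered dangerously" case can be made testable by storing a memory's condition separately from its fact and checking both on recall. A minimal sketch, assuming exact substring matching as a crude stand-in for a semantic judge; the `MemoryItem` structure and the three labels are hypothetical and are not MemFail's protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    fact: str
    condition: str  # empty string when the fact is unconditional


def recall_failure(item, recalled):
    """Classify a recalled memory string against the stored item.
    'dropped_condition' is the dangerous case: the fact survived, its constraint did not."""
    text = recalled.lower()
    has_fact = item.fact.lower() in text
    has_condition = not item.condition or item.condition.lower() in text
    if not has_fact:
        return "forgotten"
    if not has_condition:
        return "dropped_condition"
    return "ok"


item = MemoryItem(fact="approves supplier",
                  condition="only when compliance documents are current")
print(recall_failure(item, "The manager approves supplier requests."))
# dropped_condition
print(recall_failure(item, "She approves supplier onboarding only when compliance documents are current."))
# ok
print(recall_failure(item, "Prefers concise weekly updates."))
# forgotten
```

Separating "forgotten" from "dropped_condition" matters because the two fail differently in production: the first produces a gap, the second produces a confident wrong action.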

Uncertain Terms: Hallucination Scores Are Triage Signals, Not Lie Detectors

A support ticket lands on the AI team’s desk: the enterprise chatbot answered confidently, cited the wrong policy, and somehow made the compliance team nostalgic for search boxes. The obvious next idea is to add an uncertainty score. When the model is unsure, route the answer to a verifier. When the score is high, reject the output. When the score is low, let it pass. Elegant. Cheap. Measurable. Also, as usual, a little too clean. ...
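The routing rule described above is easy to write down, which is part of its appeal. The sketch below implements it and then audits what leaks through each band. The thresholds and the labeled audit sample are invented for illustration; the point is that a score assigns a route, and only an audit against ground truth shows how often that route is wrong.

```python
from collections import Counter


def triage(score, accept_below=0.3, reject_above=0.8):
    """Map an uncertainty score (higher = less certain) to a routing decision.
    The score decides where an answer goes next; it does not certify the answer."""
    if score < accept_below:
        return "pass"
    if score > reject_above:
        return "reject"
    return "verify"


# Hypothetical audit sample: (uncertainty score, answer was actually correct?)
audit = [(0.05, True), (0.12, True), (0.18, False), (0.25, True),
         (0.40, True), (0.55, False), (0.70, True), (0.85, False), (0.92, True)]

routes = Counter(triage(s) for s, _ in audit)
passed_wrong = sum(1 for s, ok in audit if triage(s) == "pass" and not ok)
rejected_right = sum(1 for s, ok in audit if triage(s) == "reject" and ok)
print(dict(routes))  # {'pass': 4, 'verify': 3, 'reject': 2}
print(passed_wrong, rejected_right)  # 1 1
```

Even in this tiny sample, one wrong answer sails through the "pass" band and one correct answer is rejected, which is exactly why the score works better as a triage signal than as a verdict.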