Retrieval-Augmented Generation

MARCH Orders: When AI Holds a CT Case Conference

The useful meeting, unfortunately, exists. Meetings are usually where productivity goes to file a complaint. But there is one kind of meeting that high-stakes work still needs: the review session where a first draft is challenged, evidence is checked, and a senior decision-maker signs off. Radiology has long understood this. A resident may draft the report. A fellow may question the interpretation. An attending radiologist resolves the remaining uncertainty. The point is not ceremony. The point is controlled disagreement. ...

The Search That Remembers: Training AI Without Answers

Search looks cheap until you try to train it. A business can usually collect plenty of questions. Employees ask support bots why a policy changed. Analysts ask internal search systems for comparable transactions. Legal teams ask where a contract clause first appears. Researchers ask agents to chase a multi-step trail across documents, web pages, and databases. ...

Entropy Over Relevance: Why Your RAG System Is Asking the Wrong Questions

Evidence is not context. That is the small, expensive misunderstanding behind many enterprise RAG systems. A user asks a question, the system retrieves semantically similar chunks, the model reads them, and the answer arrives with a tone that suggests the matter has been settled. Very reassuring. Sometimes even correct. But in the situations where RAG is supposed to be most useful — compliance reviews, financial analysis, legal memos, medical evidence summaries, internal strategy briefings — the problem is often not that the system has too little relevant material. The problem is that the relevant material disagrees, overlaps, dates badly, or supports several competing interpretations at once. ...

Seeing Is Believing: Why Visual RAG Might Be the Missing Layer in Clinical AI

Guidelines are not novels. That sounds obvious until we remember how most retrieval-augmented generation systems treat them. A clinical guideline becomes text. The text becomes chunks. The chunks become embeddings. The embeddings become “context.” Somewhere in that mechanical conversion, a dosing table, a referral pathway, or a threshold hidden inside a flowchart quietly loses its shape. Then everyone acts surprised when the answer is fluent but clinically thin. Very mysterious. ...

CompactRAG: When Multi-Hop Reasoning Stops Burning Tokens

Ask a normal enterprise RAG system a simple factual question, and it behaves politely enough. Retrieve a few passages. Hand them to the model. Generate an answer. Fine. Ask it a question that requires two or three steps, and the machine starts developing expensive habits. It retrieves, reasons, retrieves again, expands the prompt, reasons again, rewrites a query, retrieves more evidence, and then asks the LLM to stitch the mess together. The architecture looks intellectually serious. The invoice looks even more serious. ...

When Retrieval Learns to Breathe: Teaching LLMs to Go Wide and Deep

Retrieval has a breathing problem. Most enterprise RAG systems inhale once, grab the nearest chunks, and then hope the model can make the answer sound less fragile than the evidence actually is. That works tolerably well when the user asks for something sitting neatly inside a document paragraph. It works less well when the answer lives across entities, relations, aliases, product categories, authors, diseases, suppliers, regulations, or customer records. In other words, it works less well in the part of business where knowledge is not a pile of text but a network. ...

Deep GraphRAG: Teaching Retrieval to Think in Layers

Retrieval has a management problem. Not the motivational-poster kind of management problem. The operational kind. A company asks its AI system a question about a contract, a customer dispute, a policy exception, or a technical incident. The answer is not sitting in one paragraph. It is distributed across definitions, transactions, policies, exceptions, and historical context. A flat vector search grabs a few semantically similar chunks and hopes the model can stitch them together. A global summarizer reads widely, compresses aggressively, and occasionally smooths away the exact fact that mattered. A local graph search follows nearby entities and may become very confident inside the wrong neighborhood. ...

LeanCat-astrophe: Why Category Theory Is Where LLM Provers Go to Struggle

A developer can understand what a software function should do, write something that looks reasonable, and still fail because the surrounding codebase expects a particular interface, naming convention, object hierarchy, or sequence of calls. Giving the developer four independent attempts may eventually fix a misplaced bracket. It does little when the real problem is that they do not know which internal abstraction the system expects. ...

MIRAGE-VC: Teaching LLMs to Think Like VCs (Without Drowning in Graphs)

Deal flow is rarely scarce. Attention is. A venture-capital team may receive hundreds of startup introductions, each surrounded by founder biographies, investor histories, comparable companies, co-investment relationships, sector narratives, and enthusiastic claims about an inevitable Series A. The practical problem is not obtaining more evidence. It is deciding which fragments deserve serious attention before the partnership meeting begins. ...

Replace, Don’t Expand: When RAG Learns to Throw Things Away

The inbox problem hiding inside RAG. Inbox. That is the easiest way to understand what goes wrong in many retrieval-augmented generation systems. A query arrives. The system retrieves a few documents. The answer is not obvious. So the system retrieves more. Then more. Then perhaps a web search result. Then a rewritten query. Then another bundle of passages. ...