RAG | Cognaptus

Update or Revise? Turns Out It’s the Same Argument in a Better Suit

Memory is where many AI systems quietly lose their dignity. A user corrects an agent. A compliance rule changes. A contract clause is clarified. A retrieval system finds a newer document that contradicts an older one. The system must decide what to do with the new information. Should it update because the world has changed, or revise because its earlier belief was wrong? ...

Don’t Walk to the Car Wash: Why Prompt Architecture Beats More Context

Car wash. That is not usually where enterprise AI strategy goes to become interesting. Yet a small question about whether one should walk or drive to a nearby car wash exposes a very real failure mode in LLM systems: the model optimizes the visible variable and misses the actual task. The question is simple: ...

Flow, Don’t Hallucinate: Turning Agent Workflows into Reusable Enterprise Assets

Workflow reuse sounds like a housekeeping problem. It is not. In many companies, workflow automation has already escaped the tidy diagram on the transformation slide. One team builds an n8n flow to process invoices. Another builds a Dify workflow to triage support tickets. A third writes an internal tool chain for compliance checks. Each workflow contains useful logic: API calls, branching rules, exception handling, data validation, reporting steps, and the small ugly details that make automation survive contact with real operations. ...

Mind the Gap: When Clinical LLMs Learn from Their Own Mistakes

Mistakes are usually treated as waste. In clinical AI, they are treated even more nervously: logged, redacted, escalated, converted into a slide deck, and then politely buried under the next benchmark table. Understandable. Nobody wants a medical agent whose product roadmap reads like “learning through patient-adjacent embarrassment.” But the paper “Closing Reasoning Gaps in Clinical Agents with Differential Reasoning Learning” makes a useful move: it treats mistakes not as isolated failures, but as structured raw material for improving future reasoning. The core idea is not that a clinical LLM should “reflect” harder, nor that we should throw more guidelines into the prompt until the context window starts whimpering. The idea is more surgical: compare the model’s reasoning with a better reference reasoning trace, locate the precise gap, convert that gap into a reusable instruction, and retrieve that instruction when a similar case appears later. ...

Ultra‑Sparse Embeddings Without Apology

Search gets expensive quietly. At small scale, an embedding is just a vector. At product scale, it becomes rent: storage rent, memory rent, GPU rent, latency rent, and the recurring emotional tax of explaining why a semantic search feature needs yet another infrastructure budget. Dense embeddings made this bargain feel natural. More dimensions, more semantic capacity. More semantic capacity, better retrieval. Better retrieval, more invoices. Elegant, if one enjoys expensive inevitability. ...

Beyond Cosine: When Order Beats Angle in Embedding Similarity

Search has a small ritual. Take two embeddings, compute cosine similarity, rank the results, and move on. The ritual is fast, familiar, and usually good enough. It is also so deeply embedded in AI infrastructure that many teams treat it less like a modeling choice and more like plumbing. That is convenient. It is not always innocent. ...
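For readers who want the ritual spelled out, here is a minimal sketch in plain Python. The function names, document IDs, and toy two-dimensional vectors are all illustrative assumptions, not anything taken from the article or from a particular library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption for this sketch: a zero vector has no direction, so score it 0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query, documents):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: real embeddings have hundreds of dimensions.
query = [1.0, 0.0]
documents = {
    "doc_a": [2.0, 0.0],  # same direction as the query, twice the length
    "doc_b": [1.0, 1.0],  # 45 degrees away from the query
    "doc_c": [0.0, 3.0],  # orthogonal to the query
}

ranking = rank_by_cosine(query, documents)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.4f}")
```

In this toy data, doc_a ranks first even though it is twice as long as the query, because cosine similarity compares direction only and discards magnitude. That is the modeling choice hiding inside the plumbing.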

When RAG Needs Provenance, Not Just Recall: Traceable Answers Across Fragmented Knowledge

RAG has a public-relations problem. It promises grounded answers, then quietly assumes that “grounded” means “retrieved from somewhere nearby.” That assumption is convenient. It is also the kind of convenience that creates compliance incidents, medical confusion, and internal knowledge assistants that cite the wrong document with absolute confidence. A retrieval-augmented system can answer from evidence and still choose the wrong evidence. It can cite something real and still fail provenance. ...

Simulate This: When LLMs Stop Talking and Start Modeling

A simulation model is not a chatbot with a spreadsheet attached. That sounds obvious until a project team starts treating the LLM as if it were the entire modeling stack: the analyst, the programmer, the validator, the documentation clerk, the statistical package, and occasionally the intern blamed when the result changes on Tuesday. The convenient story is that better prompting will tame the system. Add more examples. Add a RAG. Set temperature to zero. Smile at the demo. ...

Search-R2: When Retrieval Learns to Admit It Was Wrong

Search is supposed to make language models safer. The model does not know something, so it searches. It finds evidence, reasons over that evidence, and gives a better answer. Very civilized. Very responsible. Then the first search query goes slightly wrong. The model retrieves a relevant-looking but misleading paragraph. It builds the next reasoning step around the wrong entity. The next query becomes narrower, but in the wrong direction. The final answer may still sound fluent, because fluency is the one department where language models rarely file sick leave. The actual reasoning chain, however, has already drifted. ...

FadeMem: When AI Learns to Forget on Purpose

Memory is easy to sell. Give an AI agent a bigger context window. Add a vector database. Store every user preference, meeting note, support ticket, and half-correct instruction that ever passed through the system. Then call it “persistent memory,” because apparently a drawer full of old receipts is now intelligence. The problem is that agents do not fail only because they forget. They also fail because they remember too much, too flatly, and too obediently. Old facts compete with new ones. Repeated but trivial details crowd out rare but important constraints. Retrieval brings back something semantically similar but temporally wrong. The agent sounds confident because the database found something. Very helpful. Very dangerous. ...