Don’t Average the Needle: Spectral Retrieval and the RAG Evidence Problem

Enterprise search has a very old habit wearing a very modern jacket: it averages. A policy document becomes one vector. A runbook becomes one vector. A postmortem full of operational detail becomes one vector. Then a RAG system asks that one vector whether the document is relevant. This is convenient, fast, and usually defensible — until the relevant answer is a narrow paragraph hiding inside a large document. At that point, the retrieval system is no longer searching for evidence. It is asking a crowd to speak for the witness. ...

May 30, 2026 · 16 min · Zelina

Query the Receipt, Not the Vibe: DualGraph and the RAG Catalog Problem

A product catalog is not a paragraph with a search box. Catalogs look deceptively friendly to RAG systems. A product page has descriptions, feature bullets, specification tables, prices, variants, categories, and marketing copy. Feed those pages into a vector database, ask an LLM a question, and the system should answer. This is the comforting story. It is also where many enterprise RAG demos begin their quiet decline into customer-support theater. ...

May 30, 2026 · 17 min · Zelina

RAG’s Receipt Problem: When Correct Answers Don’t Prove Retrieval

Retrieval-augmented generation has become the respectable outfit enterprise AI wears when it wants to look grounded. Add a document store, retrieve a few passages, attach citations, and the answer suddenly appears more disciplined than a free-floating chatbot. That appearance is useful. It is not proof. ...

May 30, 2026 · 16 min · Zelina

Read the Receipt: Why RAG Should Highlight Before It Answers

Search looks easy until someone asks where the answer actually came from. A researcher types a rough query into a literature assistant. The system retrieves several papers, writes a fluent answer, and appends citations. Everyone relaxes a little. The citation tag has done its small administrative magic. The answer now looks grounded. ...

May 30, 2026 · 15 min · Zelina

The KV Cache Is Not a Detail: Why LLM Compression Needs a Control Plane

Bandwidth is one of those infrastructure costs that looks boring until it becomes the product bottleneck. A retrieval-augmented assistant gets a long document. An agentic workflow accumulates tool traces. A support chatbot reuses a large system prompt and a customer-history prefix. The model may be fast enough, the GPUs may be expensive enough, and yet the user still waits. Not because the model is thinking harder. Because the system is moving state. ...

May 27, 2026 · 15 min · Zelina

Receipts, Please: RAG’s New Evidence Stack

The original business pitch for retrieval-augmented generation was wonderfully simple: connect the model to your documents, ask questions, get grounded answers. No need to retrain the model. No need to wait for the next foundation-model release. Just give the chatbot some files and let productivity bloom. ...

May 7, 2026 · 17 min · Zelina

Graph Expectations: Why Context Compression Needs Structure, Not Just Similarity

The AI industry has developed a charmingly expensive habit: when models struggle with long documents, we buy them larger windows and pretend the problem has been solved. It has not. Long-context LLMs are useful, but longer context is not the same as better context. A model can accept a very large input and still miss the crucial paragraph buried in the middle, over-attend to duplicated evidence, or lose the argumentative spine of a document. The result is familiar to anyone building AI tools for legal review, finance research, policy analysis, procurement, consulting, compliance, or enterprise knowledge work: the model has “read” everything, yet somehow understands the wrong thing. Very modern. Very expensive. ...

May 1, 2026 · 12 min · Zelina

When AI Can Solve But Can't Search: The MathNet Equation

Search. That is the unglamorous part of AI work. The demo asks a model to solve a clean problem. The enterprise system asks a model to find the right prior case, retrieve the relevant precedent, avoid the misleading near-match, and then adapt the answer without making a confident mess of it. MathNet is interesting because it puts that distinction under pressure. The paper introduces a large multilingual, multimodal Olympiad mathematics benchmark, but the more useful business lesson is not merely that frontier models can solve hard math. We already have enough leaderboards wearing medals. The sharper finding is that models and embedding systems can still fail at recognizing when two problems are mathematically the same, or when one problem is structurally useful for another.1 ...

April 23, 2026 · 13 min · Zelina

WorldDB Memory Wars — Why Agent Memory Needs Structure, Not More Tokens

Memory is cheap until it has to remember correctly. A chatbot can remember a paragraph for a few minutes. An enterprise agent is asked to remember a customer’s old address, current address, account owner, exception approval, product issue, refund promise, and the reason the promise changed last month. Then it must answer without mixing the past with the present. This is where “just add more context” begins to look less like strategy and more like buying a bigger drawer for unsorted receipts. ...

April 23, 2026 · 16 min · Zelina

The Memory Isn’t Broken — It’s Flat: Why LLMs Need to ‘Draw’ to Remember

Memory is usually sold as a storage problem. Give the agent a vector database. Add a recall layer. Save summaries. Search harder. Expand the context window until the budget department starts making eye contact. Then ask the agent a simple question: what changed after the earlier conversation? That is where the polite demo often turns into a fog machine. ...

April 15, 2026 · 15 min · Zelina