
Mirror, Mirror on the LLM: Teaching Models to Think About Their Thinking

Evidence is not the same as judgment. Anyone who has watched an AI assistant work through a multi-document question has seen the strange version of this failure. The model finds the relevant fact. It even says something that looks like the right answer. Then, a few paragraphs later, it invents an extra condition, follows that condition with great confidence, and lands somewhere else. ...

February 28, 2026 · 15 min · Zelina

Update or Revise? Turns Out It’s the Same Argument in a Better Suit

Memory is where many AI systems quietly lose their dignity. A user corrects an agent. A compliance rule changes. A contract clause is clarified. A retrieval system finds a newer document that contradicts an older one. The system must decide what to do with the new information. Should it update because the world has changed, or revise because its earlier belief was wrong? ...

February 27, 2026 · 17 min · Zelina

Don’t Walk to the Car Wash: Why Prompt Architecture Beats More Context

Car wash. That is not usually where enterprise AI strategy goes to become interesting. Yet a small question about whether one should walk or drive to a nearby car wash exposes a very real failure mode in LLM systems: the model optimizes the visible variable and misses the actual task. The question is simple: ...

February 26, 2026 · 14 min · Zelina

Flow, Don’t Hallucinate: Turning Agent Workflows into Reusable Enterprise Assets

Workflow reuse sounds like a housekeeping problem. It is not. In many companies, workflow automation has already escaped the tidy diagram on the transformation slide. One team builds an n8n flow to process invoices. Another builds a Dify workflow to triage support tickets. A third writes an internal tool chain for compliance checks. Each workflow contains useful logic: API calls, branching rules, exception handling, data validation, reporting steps, and the small ugly details that make automation survive contact with real operations. ...

February 17, 2026 · 15 min · Zelina

Mind the Gap: When Clinical LLMs Learn from Their Own Mistakes

Mistakes are usually treated as waste. In clinical AI, they are treated even more nervously: logged, redacted, escalated, converted into a slide deck, and then politely buried under the next benchmark table. Understandable. Nobody wants a medical agent whose product roadmap reads like “learning through patient-adjacent embarrassment.” But the paper “Closing Reasoning Gaps in Clinical Agents with Differential Reasoning Learning” makes a useful move: it treats mistakes not as isolated failures, but as structured raw material for improving future reasoning.[1] The core idea is not that a clinical LLM should “reflect” harder, nor that we should throw more guidelines into the prompt until the context window starts whimpering. The idea is more surgical: compare the model’s reasoning with a better reference reasoning trace, locate the precise gap, convert that gap into a reusable instruction, and retrieve that instruction when a similar case appears later. ...

February 11, 2026 · 17 min · Zelina
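
The loop described in the excerpt (compare against a reference trace, locate the gap, turn it into an instruction, retrieve it for similar cases) can be illustrated with a toy sketch. It is not the paper's implementation: `InstructionMemory`, `add_from_gap`, the exact-match step comparison, and the Jaccard keyword retrieval are all hypothetical stand-ins chosen for brevity, and a real clinical system would need far richer trace comparison than string matching.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionMemory:
    """Hypothetical store of gap-derived instructions, keyed by case keywords."""

    entries: list[tuple[frozenset, str]] = field(default_factory=list)

    def add_from_gap(self, case_keywords, model_steps, reference_steps):
        # Toy gap detection: reference steps that the model's trace never contains.
        missing = [step for step in reference_steps if step not in model_steps]
        for step in missing:
            # Turn each gap into a reusable instruction tied to the case keywords.
            self.entries.append(
                (frozenset(case_keywords), f"Before concluding, check: {step}")
            )
        return missing

    def retrieve(self, case_keywords, threshold=0.5):
        # Return instructions whose stored case overlaps enough (Jaccard) with the new case.
        query = set(case_keywords)
        hits = []
        for keys, instruction in self.entries:
            union = query | keys
            if not union:
                continue  # both keyword sets empty: nothing to compare
            if len(query & keys) / len(union) >= threshold:
                hits.append(instruction)
        return hits


if __name__ == "__main__":
    memory = InstructionMemory()
    memory.add_from_gap(
        {"chest pain", "diabetes"},
        model_steps=["order ECG"],
        reference_steps=["order ECG", "rule out silent ischemia"],
    )
    print(memory.retrieve({"chest pain", "diabetes", "smoker"}))
```

Here the later case shares two of three keywords with the stored one (Jaccard about 0.67), so the instruction learned from the earlier gap is retrieved; an unrelated case retrieves nothing.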

Ultra‑Sparse Embeddings Without Apology

Search gets expensive quietly. At small scale, an embedding is just a vector. At product scale, it becomes rent: storage rent, memory rent, GPU rent, latency rent, and the recurring emotional tax of explaining why a semantic search feature needs yet another infrastructure budget. Dense embeddings made this bargain feel natural. More dimensions, more semantic capacity. More semantic capacity, better retrieval. Better retrieval, more invoices. Elegant, if one enjoys expensive inevitability. ...

February 8, 2026 · 19 min · Zelina
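
The “storage rent” in the excerpt is easy to put numbers on with back-of-envelope arithmetic. The figures below (10 million vectors, 1,024 float32 dimensions, 32 non-zero entries stored as int32 index plus float32 value) are hypothetical and not taken from the post; the sketch also ignores index structures, compression, and quantization.

```python
def dense_bytes(n_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    # Dense float32 storage: every dimension is stored for every vector.
    return n_vectors * dims * bytes_per_value


def sparse_bytes(n_vectors: int, nonzeros: int,
                 bytes_per_index: int = 4, bytes_per_value: int = 4) -> int:
    # Sparse (index, value) pairs: only the non-zero dimensions are stored.
    return n_vectors * nonzeros * (bytes_per_index + bytes_per_value)


if __name__ == "__main__":
    n = 10_000_000  # hypothetical corpus size
    dense = dense_bytes(n, dims=1024)
    sparse = sparse_bytes(n, nonzeros=32)
    print(f"dense:  {dense / 1e9:.2f} GB")   # 40.96 GB
    print(f"sparse: {sparse / 1e9:.2f} GB")  # 2.56 GB
    print(f"ratio:  {dense / sparse:.0f}x")  # 16x
```

Under these assumptions the sparse layout pays twice per stored entry (index and value), yet still comes out 16 times smaller, because it stores 32 entries instead of 1,024.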

Beyond Cosine: When Order Beats Angle in Embedding Similarity

Search has a small ritual. Take two embeddings, compute cosine similarity, rank the results, and move on. The ritual is fast, familiar, and usually good enough. It is also so deeply embedded in AI infrastructure that many teams treat it less like a modeling choice and more like plumbing. That is convenient. It is not always innocent. ...

February 7, 2026 · 14 min · Zelina
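
The ritual in the excerpt (compute cosine similarity, rank, move on) fits in a few lines of plain Python. This sketch shows only the cosine baseline the post questions, not any order-based alternative; the function names and toy vectors are illustrative.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar


def rank_by_cosine(query: list[float],
                   docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    # Score every document against the query, then sort best-first.
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for doc_id, score in rank_by_cosine([1.0, 0.0], docs):
        print(doc_id, round(score, 3))
```

With the query `[1.0, 0.0]`, the ranking is `a` (1.0), then `c` (about 0.707), then `b` (0.0). Note that only the angle matters: scaling `c` to `[5.0, 5.0]` would not change its score.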

When RAG Needs Provenance, Not Just Recall: Traceable Answers Across Fragmented Knowledge

RAG has a public-relations problem. It promises grounded answers, then quietly assumes that “grounded” means “retrieved from somewhere nearby.” That assumption is convenient. It is also the kind of convenience that creates compliance incidents, medical confusion, and internal knowledge assistants that cite the wrong document with absolute confidence. A retrieval-augmented system can answer from evidence and still choose the wrong evidence. It can cite something real and still fail provenance. ...

February 7, 2026 · 11 min · Zelina

Simulate This: When LLMs Stop Talking and Start Modeling

A simulation model is not a chatbot with a spreadsheet attached. That sounds obvious until a project team starts treating the LLM as if it were the entire modeling stack: the analyst, the programmer, the validator, the documentation clerk, the statistical package, and occasionally the intern blamed when the result changes on Tuesday. The convenient story is that better prompting will tame the system. Add more examples. Add a RAG pipeline. Set temperature to zero. Smile at the demo. ...

February 6, 2026 · 18 min · Zelina

Search-R2: When Retrieval Learns to Admit It Was Wrong

Search is supposed to make language models safer. The model does not know something, so it searches. It finds evidence, reasons over that evidence, and gives a better answer. Very civilized. Very responsible. Then the first search query goes slightly wrong. The model retrieves a relevant-looking but misleading paragraph. It builds the next reasoning step around the wrong entity. The next query becomes narrower, but in the wrong direction. The final answer may still sound fluent, because fluency is the one department where language models rarely call in sick. The actual reasoning chain, however, has already drifted. ...

February 4, 2026 · 16 min · Zelina