Cover image

Reasonable Doubt: Why LLM Reasoning Needs Process Control

Why this matters now The business case for LLMs has quietly moved from chatbot answers to agentic work: legal review, compliance checking, market research, document synthesis, internal analytics, coding support, and decision preparation. That shift changes the risk profile. A wrong chatbot answer is annoying. A wrong agent that looks coherent, cites documents, calls tools, updates files, and confidently stops too early is a workflow liability wearing a productivity costume. ...

May 31, 2026 · 12 min · Zelina
Cover image

Don’t Average the Needle: Spectral Retrieval and the RAG Evidence Problem

Enterprise search has a very old habit wearing a very modern jacket: it averages. A policy document becomes one vector. A runbook becomes one vector. A postmortem full of operational detail becomes one vector. Then a RAG system asks that one vector whether the document is relevant. This is convenient, fast, and usually defensible — until the relevant answer is a narrow paragraph hiding inside a large document. At that point, the retrieval system is no longer searching for evidence. It is asking a crowd to speak for the witness. ...

May 30, 2026 · 16 min · Zelina
Cover image

Jailbreak Risk Needs a Stopwatch, Not Just a Scorecard

Jailbreak Risk Needs a Stopwatch, Not Just a Scorecard For many organizations, LLM safety is still treated like a checkpoint: run a benchmark, report an attack success rate, add a few guardrails, and move on. The resulting dashboard looks reassuringly official. It may even have decimals. Unfortunately, adversarial users do not attack dashboards. They attack systems. ...

May 30, 2026 · 17 min · Zelina
Cover image

Query the Receipt, Not the Vibe: DualGraph and the RAG Catalog Problem

A product catalog is not a paragraph with a search box Catalogs look deceptively friendly to RAG systems. A product page has descriptions, feature bullets, specification tables, prices, variants, categories, and marketing copy. Feed those pages into a vector database, ask an LLM a question, and the system should answer. This is the comforting story. It is also where many enterprise RAG demos begin their quiet decline into customer-support theater. ...

May 30, 2026 · 17 min · Zelina
Cover image

RAG’s Receipt Problem: When Correct Answers Don’t Prove Retrieval

RAG’s Receipt Problem: When Correct Answers Don’t Prove Retrieval Retrieval-augmented generation has become the respectable outfit enterprise AI wears when it wants to look grounded. Add a document store, retrieve a few passages, attach citations, and the answer suddenly appears more disciplined than a free-floating chatbot. That appearance is useful. It is not proof. ...

May 30, 2026 · 16 min · Zelina
Cover image

Experience Is Not Memory: Why Learning Agents Need a Better Feedback Loop

A support ticket goes wrong. A workflow agent chooses the wrong tool. A finance assistant misses a procedural step. The usual response is familiar: add the failure to memory, rewrite a prompt, perhaps ask the agent to “reflect” before trying again. This is useful, in the same way that putting a sticky note on a broken machine is useful. It may prevent the same mistake next time. It does not prove the machine has learned how to improve. ...

May 29, 2026 · 18 min · Zelina
Cover image

Context Is the New Attack Surface

A benchmark score is easy to quote. It is harder to know what broke. In Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models, Pavlos Ntais reports an 81.0% attack success rate against GPT-OSS-20B on a held-out 200-item test set.1 That number is attention-grabbing. It is also not the main lesson. ...

May 16, 2026 · 13 min · Zelina
Cover image

LoRA and Order: The Strange Case for One Well-Placed Adapter

Opening — Why this matters now Enterprise AI is entering its less glamorous, more useful phase: not “Can we connect an LLM to everything?” but “Can we adapt it without making the GPU bill look like a small infrastructure project?” Fine-tuning still matters. Retrieval helps with knowledge access, prompt engineering helps with behavior shaping, and agent frameworks help with workflow orchestration. But many businesses eventually hit the same wall: the base model is close, yet not close enough. It needs domain style, task format, compliance habits, tool-use discipline, or workflow-specific judgment. That usually means some form of supervised fine-tuning. ...

May 9, 2026 · 15 min · Zelina
Cover image

Provenance, Not Providence: Why AI Answers Need Receipts

Opening — Why this matters now The current AI market has become very good at producing fluent answers and very bad at explaining where those answers came from. This is not a minor inconvenience. It is the difference between an assistant that can be trusted in an operational workflow and an assistant that merely performs confidence with attractive typography. ...

May 9, 2026 · 14 min · Zelina
Cover image

Rank and File: BoostLoRA’s Case for Smarter Fine-Tuning

Opening — Why this matters now Enterprise AI is entering its less glamorous phase: not the demo, not the keynote, not the charming chatbot that answers three curated questions correctly, but the operational grind of making models behave reliably inside messy workflows. That grind usually runs into a familiar triangle. Full fine-tuning is powerful but expensive, operationally heavy, and often risky when the training set is narrow. Parameter-efficient fine-tuning, especially LoRA-style adaptation, is cheaper and easier to deploy, but the smallest adapters can hit a ceiling. Meanwhile, the business user does not care whether the adapter was elegant. They care whether the model stops making the same costly mistakes in invoicing, compliance review, customer support, code generation, or scientific triage. ...

May 4, 2026 · 13 min · Zelina