Paperwork Intelligence: Why AI Still Struggles With Real Enterprise Documents

Paperwork is where enterprise AI demos go to lose their charm. In a product demo, an AI agent usually receives a clean PDF, a friendly question, and a document that has the decency to behave like a document. It summarizes, retrieves, answers, maybe even produces a small spreadsheet. Everyone nods. Someone says “workflow automation.” Someone else says “agentic.” The meeting ends before anyone asks whether the same system can handle 89,000 pages of historical reports, nested tables, revised statistics, scanned pages, ambiguous row headers, and a calculation that must be correct to the last digit. ...

March 12, 2026 · 19 min · Zelina

Ultra‑Sparse Embeddings Without Apology

Search gets expensive quietly. At small scale, an embedding is just a vector. At product scale, it becomes rent: storage rent, memory rent, GPU rent, latency rent, and the recurring emotional tax of explaining why a semantic search feature needs yet another infrastructure budget. Dense embeddings made this bargain feel natural. More dimensions, more semantic capacity. More semantic capacity, better retrieval. Better retrieval, more invoices. Elegant, if one enjoys expensive inevitability. ...

February 8, 2026 · 19 min · Zelina

Search-R2: When Retrieval Learns to Admit It Was Wrong

Search is supposed to make language models safer. The model does not know something, so it searches. It finds evidence, reasons over that evidence, and gives a better answer. Very civilized. Very responsible. Then the first search query goes slightly wrong. The model retrieves a relevant-looking but misleading paragraph. It builds the next reasoning step around the wrong entity. The next query becomes narrower, but in the wrong direction. The final answer may still sound fluent, because fluency is the one department where language models rarely file sick leave. The actual reasoning chain, however, has already drifted. ...

February 4, 2026 · 16 min · Zelina

Seeing Is Misleading: When Climate Images Need Receipts

A picture lies differently from a sentence. A sentence can be checked against a source. A picture can be old, cropped, staged, reused, mislabeled, emotionally loaded, or paired with a claim it never supported. This is why climate disinformation is annoying in the precise technical sense: it often does not need to fabricate a new fact. It can simply attach a real-looking image to a slippery claim and let the audience do the rest. Very efficient. Very human. Very platform-native. ...

January 23, 2026 · 15 min · Zelina

Bubble Trouble: Why Top‑K Retrieval Keeps Letting LLMs Down

The problem is not finding documents. It is spending the prompt budget badly. Ask an enterprise RAG system for “scope of work,” and the system may look confident for exactly the wrong reason. The query sounds simple. Somewhere in the document set, there is probably a sheet, paragraph, or clause literally called “Scope of Works.” A flat top-k retriever will happily grab the highest-scoring chunks from that section, stack them into the model context, and call the job done. Very tidy. Very wrong. ...

January 16, 2026 · 18 min · Zelina

Making Noise Make Sense: How FANoise Sharpens Multimodal Representations

Search systems fail in boring ways before they fail in spectacular ones. A customer uploads a product photo and receives visually similar items that miss the actual intent. A compliance analyst searches a scanned document and gets pages that look close but answer the wrong question. A visual QA system finds the right region but ranks the wrong evidence first. Nobody in the meeting says, “Ah yes, our embedding space has poor spectral noise allocation.” They say the search feels unreliable. Much more executive-friendly. Much less useful. ...

November 30, 2025 · 13 min · Zelina

One Pass to Rule Them All: YOFO and the Rise of Compositional Judging

Search is where nuance goes to die. A customer asks for a long evening dress, preferably not pink. A retrieval model sees “dress,” “evening,” perhaps “pink,” and returns something short, bright, and entirely wrong with the confidence of a clerk who has technically read the sentence but not understood the assignment. The business consequence is familiar: fewer conversions, more irrelevant recommendations, and yet another dashboard where “semantic relevance” looks respectable while customers quietly leave. ...

November 22, 2025 · 17 min · Zelina

Memory With a Pulse: Real-Time Feedback Loops for RAG Systems

Ask an enterprise chatbot the wrong question on the wrong day and the problem is rarely that the language model has forgotten how to write English. The problem is that it has been handed the wrong pile of evidence. That is the expensive little defect inside many retrieval-augmented generation systems. The model may be fluent. The corpus may be current. The vector database may be humming along like a well-funded filing cabinet. Yet the answer still disappoints because the system chose the wrong snippets, placed a useful document too low, missed a newly relevant runbook, or treated yesterday’s user intent as if it were carved into basalt. ...

November 10, 2025 · 15 min · Zelina

Beyond Answers: Measuring How Deep Research Agents Really Think

A research report is not an answer with extra paragraphs. That sounds obvious until an enterprise team tries to evaluate a deep research agent by asking whether its final conclusion looks plausible, whether it included citations, and whether the prose sounded confident enough to survive a board deck. Congratulations: the machine has produced something that resembles diligence. Whether it actually performed diligence is the inconvenient question. ...

October 9, 2025 · 15 min · Zelina

Backtrack to Breakthrough: Why Great AI Agents Revisit

Search is easy. Knowing when to go back is harder. That is the useful irritation inside GSM-Agent, a new benchmark for studying agentic reasoning under controlled conditions.1 The paper takes grade-school maths problems from GSM8K, removes the premises from the prompt, hides those premises in a searchable document database, and asks an LLM agent to recover the facts before solving the problem. The arithmetic is not supposed to be impressive. That is the point. If a model fails here, we cannot calmly blame differential geometry, PhD-level law, or some mysteriously adversarial enterprise workflow. The agent simply did not find and use the facts. ...

October 3, 2025 · 15 min · Zelina