Enterprise AI

Tunnel Vision: Why Vision-Language Models Still Miss the Bigger Picture

TL;DR for operators: A vision-language model can describe an image, answer a chart question, and still fail at the kind of seeing that a bored intern would perform before lunch. That is the operational lesson from Shmuel Berman and Jia Deng’s paper, VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs.[1] The paper tests whether leading VLMs can do three basic things: compare two visual objects across an image, follow a sequence of visual clues, and trace a continuous line to its endpoint. Humans find these tasks trivial. Current VLMs do not. ...

Beyond Search: RAG’s Awakening to Enterprise Spreadsheets

TL;DR for operators: Most enterprise RAG failures do not begin at the chatbot. They begin earlier, when the retrieval system slices policy manuals into arbitrary chunks, flattens tables into textual porridge, ignores metadata, retrieves semantically similar but operationally wrong passages, and then asks an LLM to look confident. Naturally, the LLM obliges. It has excellent manners. ...

Reasoning at Scale: How DeepSeek Redefines the LLM Playbook

TL;DR for operators: DeepSeek-R1 is not a story about one model suddenly becoming clever because someone found the secret lever labelled “reason harder”. It is a systems story: take a strong base model, reward it on problems where correctness can be checked, let longer reasoning traces emerge, repair the ugly parts with cold-start data and alignment, then distil the resulting behaviour into smaller models where deployment economics actually matter.[1] ...

Tables Turned: Why LLM-Based Table Agents Are the Next Big Leap in Business AI

TL;DR for operators: Most business data does not live in pristine chatbot-friendly prose. It lives in spreadsheets, ledgers, CSV exports, relational databases, dashboards, compliance reports, and those heroic Excel files with merged cells, colour-coded warnings, unexplained abbreviations, and one column called misc. The paper behind this article, Toward Real-World Table Agents, argues that LLM-based table agents should not be judged as smarter versions of Text-to-SQL alone.[1] Real-world table work requires an end-to-end workflow: reading table structure, cleaning noisy semantics, retrieving only the relevant parts, executing traceable reasoning steps, and adapting to domains such as finance, healthcare, public administration, and industrial operations. ...

The Retrieval-Reasoning Tango: Charting the Rise of Agentic RAG

TL;DR for operators: Static RAG is still useful. It is also no longer the whole game. The paper behind this article argues that retrieval and reasoning are converging into a more tightly coupled architecture: reasoning can improve retrieval, retrieval can improve reasoning, and agentic systems can interleave both over multiple steps.[1] That sounds like a neat academic symmetry until you put it inside an enterprise workflow, where every extra retrieval call means latency, cost, permissions, ranking risk, and one more place for the machine to confidently ingest rubbish. ...

Chunks, Units, Entities: RAG Rewired by CUE-RAG

TL;DR for operators: Enterprise RAG teams often treat retrieval quality as a graph-construction problem: extract more entities, more relationships, more summaries, and hope the answer appears somewhere in the resulting machinery. Clue-RAG suggests a more useful diagnosis: the failure is often not that the graph is too small, but that the system has chosen the wrong semantic unit for the job.[1] ...

Threading the Needle: How GRAFT Reinvents Document Translation with DAGs and LLM Agents

TL;DR for operators: Long-document translation does not fail only because the model lacks enough tokens. It fails because documents are not bags of sentences. They contain references, implied pronouns, repeated terms, topic shifts, callbacks, causal links, and the occasional sentence that makes sense only because something three paragraphs earlier did the heavy lifting. ...

From Prompting to Porting: Surviving the LLM Upgrade Cycle

TL;DR for operators: A model upgrade is not a software patch. It is closer to changing the interpreter under a production system while hoping every old script still means the same thing. Charming, in the way live wires are charming. The paper behind this article, Prompt Migration: Stabilizing GenAI Applications with Evolving Large Language Models, studies that problem through Tursio, an enterprise search application that converts natural-language questions into structured operator trees for database querying.[1] Tursio’s old prompts were fully stable on GPT-4-32k. When the same prompts were run against GPT-4.1, tests passed at 98%. Against GPT-4.5-preview, they passed at 97.3%. That sounds minor until the application is generating SQL-like structures, where “almost correct” is not a governance model. ...

The Phantom Menace in Your Knowledge Base

TL;DR for operators: The paper’s core warning is simple: a RAG system may not be reading the same document your employee just approved. A PDF, HTML page, or DOCX file can look clean to a human reviewer while carrying hidden text, altered Unicode, poisoned fonts, or layout tricks that a document loader still extracts. ...

Backtrack to the Future: How ASTRO Teaches LLMs to Think Like Search Algorithms

TL;DR for operators: ASTRO is not another paper saying “make the model think longer” and then acting surprised when token bills become a lifestyle choice. It is more specific: the authors train a non-reasoner Llama model to imitate the procedure of search. The model is taught to explore a wrong path, notice uncertainty, backtrack, and continue from an earlier step, all inside one generated answer. ...