Enterprise AI

Agents, Not Tasks: Rethinking Business Processes in the Age of AI

TL;DR for operators: Most companies trying to “add AI agents” to operations are still thinking in task boxes: receive request, validate request, route request, process request, update system, send notification. That is familiar. It is also exactly the habit this paper wants to disturb. Azarijafari, Mich, and Missikoff propose a business process model built around goals, objects, and agents, not around fixed task sequences.[1] In their framing, a process is not primarily a diagram of who does what next. It is a set of desired business states, the information objects that represent those states, and the agents capable of producing or transforming those objects. ...

OneShield Against the Storm: A Smarter Firewall for LLM Risks

TL;DR for operators: Enterprise LLM safety is often discussed as if the main question is whether the model has been trained to “behave”. That is the comforting version of the story. It is also too small. IBM’s OneShield paper argues for a different operating model: treat safety as a separate, model-agnostic guardrail layer that sits around the LLM, runs multiple specialised detectors in parallel, and then applies explicit policy decisions through a separate policy manager.[1] In plain business terms, OneShield is less like teaching the model good manners and more like installing a configurable safety-control plane around every AI interaction. Glamorous? Not especially. Operationally useful? Very much so. ...
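For readers who think better in code, the shape of that operating model can be sketched in a few lines of Python. This is an illustration of the general pattern only (independent detectors run in parallel, then a separate policy step picks an action), not OneShield's actual API. Every name here (`pii_detector`, `secrets_detector`, `POLICY`, `guard`) and both toy detection rules are assumptions made up for the example.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    detector: str
    flagged: bool


def pii_detector(text: str) -> Finding:
    # Toy rule (assumption): flag anything that looks like an email address.
    return Finding("pii", bool(re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)))


def secrets_detector(text: str) -> Finding:
    # Toy rule (assumption): flag an inline "api_key=" assignment.
    return Finding("secrets", "api_key=" in text.lower())


DETECTORS = [pii_detector, secrets_detector]

# Hypothetical policy table, kept apart from the detectors on purpose:
# changing what happens to a finding should not require touching a detector.
POLICY = {"pii": "mask", "secrets": "block"}
SEVERITY = {"allow": 0, "mask": 1, "block": 2}


def run_detectors(text: str) -> list[Finding]:
    # Run every detector on the same text concurrently.
    with ThreadPoolExecutor(max_workers=len(DETECTORS)) as pool:
        return list(pool.map(lambda detect: detect(text), DETECTORS))


def decide(findings: list[Finding]) -> str:
    # Policy step: map flagged findings to actions and keep the strictest one.
    actions = [POLICY.get(f.detector, "allow") for f in findings if f.flagged]
    return max(actions, key=SEVERITY.__getitem__, default="allow")


def guard(text: str) -> str:
    return decide(run_detectors(text))
```

Calling `guard(...)` on a prompt or a model response returns one of `allow`, `mask`, or `block`. The point of the split is that the policy table can change per deployment while the detectors stay the same.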

The User Is Present: Why Smart Agents Still Don't Get You

TL;DR for operators: Most agent demos show the easy part: the model calls a tool, gets results, and returns something plausible. The harder part is less cinematic. The user starts with an incomplete request, reveals constraints in fragments, phrases preferences indirectly, changes emphasis mid-conversation, and expects the system to somehow keep up. This is where many supposedly “smart” agents begin to look less like assistants and more like interns with excellent API access. ...

Too Nice to Be True? The Reliability Trade-off in Warm Language Models

TL;DR for operators: Warmth is not just decoration. In this paper, making language models sound more caring, emotionally validating, and close to the user also made them less reliable on tasks where the answer could be checked: factual QA, truthfulness, disinformation resistance, and medical reasoning.[1] The headline result is not subtle. Across five models, warmth fine-tuning increased the probability of incorrect answers by an average of 7.43 percentage points. Task-level error increases were reported at 8.6 pp on MedQA, 8.4 pp on TruthfulQA, 5.2 pp on disinformation, and 4.9 pp on TriviaQA. Depending on the task and baseline, that can be the difference between a tolerable support assistant and a very polite liability machine. ...

RAG in the Wild: When More Knowledge Hurts

TL;DR for operators: The useful lesson from this paper is not “RAG is bad”. That would be lazy, which is traditionally how bad AI strategy gets promoted to a roadmap. The sharper lesson is this: retrieval helps when the model actually needs external knowledge, the source is useful, and the retrieved context does not interfere with the model’s own competence. In the paper’s mixture-of-knowledge setting, those conditions are not reliably true. ...

GraphRAG Without the Drag: Scaling Knowledge-Augmented LLMs to Web-Scale

TL;DR for operators: GraphRAG usually sounds like a clean enterprise promise: put your knowledge into a graph, attach it to a language model, and enjoy more grounded answers. The less glamorous truth is that someone has to build the graph. At web scale, that “someone” is usually an LLM being asked to extract triples from millions or billions of passages, which is a fine idea if the procurement team has recently discovered oil under the server room. ...

From Snippets to Synthesis: INRAExplorer and the Rise of Agentic RAG

TL;DR for operators: Most enterprise RAG systems still behave like diligent interns with a search box: they retrieve a handful of plausible snippets, hand them to a language model, and hope the synthesis does not quietly forget half the question. That works for narrow Q&A. It fails when the user asks for a relationship chain, a complete list, or a decision-ready map of who did what, funded by whom, connected to which topic. ...

The Watchdog at the Gates: How HalMit Hunts Hallucinations in LLM Agents

TL;DR for operators: HalMit is not another attempt to ask an LLM, “Are you sure?” and then pretend the answer is governance. That theatre has had a decent run, but it was never a control system. The paper proposes a black-box watchdog for LLM-powered agents: before deployment, HalMit actively probes a target agent inside a specific domain, looks for query-response situations where hallucinations appear, stores those risky boundary points in a vector database, and then monitors future queries by checking whether they fall near those learned danger zones.[1] ...
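The monitoring half of that idea (store risky points, then check whether a new query lands near one) can be sketched as below. This is a toy under loud assumptions, not HalMit's implementation: an in-memory list stands in for the vector database, a bag-of-words `embed` stands in for a real embedding model, and the `Watchdog` class, its method names, and the `0.6` threshold are all invented for the example. The pre-deployment probing that finds the risky points in the first place is not shown.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" (assumption); a real system would use
    # a learned embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Watchdog:
    def __init__(self, threshold: float = 0.6):
        self.threshold = threshold        # hypothetical similarity cut-off
        self.risky: list[Counter] = []    # stand-in for the vector database

    def record_risky(self, query: str) -> None:
        # Store a query where probing found a hallucinated response.
        self.risky.append(embed(query))

    def is_near_danger_zone(self, query: str) -> bool:
        # Flag a live query if it is close to any stored risky point.
        q = embed(query)
        return any(cosine(q, r) >= self.threshold for r in self.risky)
```

In use, the operator would call `record_risky(...)` for each failure found during probing and then gate live traffic on `is_near_danger_zone(...)`, routing flagged queries to a fallback or a human rather than trusting the agent's answer.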

Think Twice, Then Speak: Deliberative Searcher and the Future of Reliable LLMs

TL;DR for operators: Search-augmented LLMs are not safe merely because they can look things up. They can still retrieve relevant documents, stitch together a plausible answer, and then express high confidence in something wrong. That is the failure mode this paper targets: not hallucination in the abstract, but the operationally poisonous state of being both false and certain. ...

Beyond DNS: Building the Backbone for the Internet of AI Agents

TL;DR for operators: If your organisation is building one chatbot, DNS is not your problem. If your organisation expects thousands of autonomous agents to discover one another, verify capabilities, rotate endpoints, respect privacy boundaries, and revoke trust quickly, then DNS starts looking like a filing cabinet in a drone factory. The paper behind NANDA proposes a layered discovery architecture for the “Internet of AI agents”: a lean signed index record called AgentAddr, richer verified metadata called AgentFacts, and optional adaptive resolvers for live endpoint selection.[1] The important idea is not that NANDA is “DNS for agents”. That is the tempting headline and, naturally, the least useful one. The paper is really about separating stable identity from dynamic operational metadata and from runtime routing. ...