AI Agents

Stop Signs Are Not Steering Wheels: TRIAD and the Case for Repairable Agent Guardrails

TL;DR for operators Most agent guardrails behave like stop signs. They inspect a proposed action, decide whether it looks safe, and then allow or block execution. This is neat, legible, and often operationally clumsy. Real agent failures are not always cleanly harmful from the first word. A useful business request can be contaminated by a prompt injection, a malicious tool response, or an unsafe intermediate plan. Blocking the whole task may reduce risk, but it also throws away the legitimate work. Excellent safety theatre, less excellent operations. ...

Graph Work, Not Graph Worship: RAGA Turns RAG Into an Auditable Knowledge Operation

TL;DR for operators RAGA is not another “add a graph and accuracy goes up” paper. That would be too convenient, and therefore suspicious. The useful idea is more operational: treat retrieval-augmented generation as a knowledge management process, not a pile of embeddings with a polite chatbot on top. The paper proposes RAGA, short for Reading-And-Graph-building-Agent, an autonomous system that reads documents, searches existing graph knowledge, verifies whether new entities or relations should be added, and then constructs or updates a knowledge graph with source-linked provenance.1 Its core loop is Read–Search–Verify–Construct, implemented as a ReAct-style tool-calling agent rather than a one-shot extraction pipeline. ...

The Goats in the Machine: Why AI Agents Need Contracts, Not Personalities

TL;DR for operators AI agents are leaving the demo booth and entering workspaces: repositories, customer records, procurement systems, legal drafts, financial workflows, support queues, and other places where a charming mistake becomes an operational incident. That changes the evaluation problem. It is no longer enough to ask whether an agent sounds sensible, acts “empathetic”, appears to “understand”, or seems to have “judgement”. Lovely theatre. Terrible control surface. ...

Memory Foam: When AI Stops Storing Everything and Starts Learning From It

Enterprise AI has developed a small obsession with memory. The promise is tidy: give the model more context, attach a vector database, retrieve relevant fragments, and suddenly the system becomes a persistent assistant rather than a forgetful autocomplete machine wearing a blazer. The problem is that storage is not memory. Retrieval is not understanding. And a larger context window is not the same thing as knowing what matters. ...

Copy Less, Catch More: The Minimal Surface Rule for Production AI

Copy Less, Catch More: The Minimal Surface Rule for Production AI Production AI has a slightly embarrassing habit: the more intelligent the system becomes, the more basic the bottleneck starts to look. A coding agent may reason beautifully, then spend its useful life waiting for a sandbox to roll back after one bad command. A model marketplace may offer thousands of “ready-to-deploy” neural networks, then make security review so expensive that nobody checks enough of them. Apparently the future of AI can be blocked by file copies and audit queues. Very glamorous. ...

Cache Me If You Can: Why Enterprise AI Needs Latent Working Memory

A codebase is not a paragraph. Neither is a litigation folder, a clinical case file, a customer-support history, a policy archive, or the slow-motion disaster known as “all meeting notes since March.” Yet many enterprise AI systems still treat long context as a heroic prompt-engineering problem: push more text into the model, pray the key detail survives attention, and call the bill “innovation.” ...

Picture This: When AI Reasoning Leaves the Text Box

Reasoning usually arrives as text. A model explains itself in sentences, equations, bullet points, and the occasional theatrical “therefore.” We have learned to call this chain-of-thought, or CoT, because “the model wrote a long scratchpad and we hope it helped” sounded insufficiently scientific. The paper Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text asks a sharper question: what if the intermediate reasoning medium does not have to be text at all?1 ...

Pixels to Purchase Orders: A Business Map for Choosing Vision-Language Models

Pixels to Purchase Orders: A Business Map for Choosing Vision-Language Models Receipts are a good way to ruin an AI demo. A clean product photo is polite. A scanned receipt is not. It has shadows, folds, strange fonts, tiny numbers, merchant abbreviations, table-like structure, and one suspiciously important total amount hiding near the bottom. Ask a generic multimodal assistant what it sees, and it may produce an answer that sounds fluent enough to make everyone in the meeting relax. That is usually the dangerous part. ...

Wrong on Purpose: FalsifyBench and the Agent Skill We Keep Forgetting

A good analyst should occasionally try to break their own idea. Not performatively. Not with a decorative “on the other hand” paragraph. Actually break it. Ask the kind of question that could make the current hypothesis collapse, then watch whether the evidence forces a better one. That simple discipline is the center of FalsifyBench: Evaluating Inductive Reasoning in LLMs with Rule Discovery Games, a new paper by Leonardo Bertolazzi, Katya Tentori, and Raffaella Bernardi.1 The paper is framed around scientific reasoning, but its practical message travels well beyond science. If an AI agent cannot test outside its own current belief, it may look careful while doing something much less impressive: confirming the first plausible story it invented. ...

Look Before You Think: Why Visual AI Needs Evidence Scheduling

A visual AI system can fail in a very boring way: it sounds confident, answers fluently, and quietly forgets to look. That is more dangerous than a spectacular hallucination. A spectacular hallucination at least waves a red flag. The boring version looks like normal enterprise automation: an insurance claim assessment, a warehouse inspection report, a medical-image triage note, a construction progress summary, a product-quality explanation. The system has an image. It has a question. It produces an answer. Somewhere inside the model, language did most of the work and vision became decorative evidence. Very modern. Very polished. Very capable of being wrong. ...