Agentic AI

When Agents Loop: Geometry, Drift, and the Hidden Physics of LLM Behavior

Agents are rarely dangerous because they answer once. They become interesting, and occasionally annoying, when they loop. A customer-support agent drafts a reply, critiques it, revises it, checks policy, rewrites the tone, and sends the result back into another reasoning step. A research agent summarizes papers, updates its plan, searches again, and revises its own assumptions. A coding agent edits a file, reads the error, patches the patch, and keeps going until either the tests pass or the repository looks like an archaeological site. ...

Agents on the Assembly Line: How Production-Grade AI Workflows Actually Get Built

Assembly lines are not exciting because every worker improvises. They are useful because each station does a narrow job, hands the result forward, and leaves as little room as possible for charming chaos. That is also the quiet lesson in “A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows.”[1] The paper looks, at first glance, like another guide to agents, tools, MCP servers, multi-model reasoning, and cloud-native deployment. The tempting summary would be: “Here are nine best practices for building agentic AI.” ...

Bits, Bets, and Budgets: When Agents Should Walk Away

Budget is not an afterthought. Budget is usually treated as the boring part of agent design. The exciting part is the agent: planning, calling tools, trying strategies, revising itself, and occasionally behaving like a junior analyst who has discovered both confidence and the corporate credit card. But in real automation, budget is not boring. Budget is the boundary between useful autonomy and expensive wandering. ...

Context Is King: How Ontologies Turn Agentic AI from Guesswork to Governance

A server goes down. Not a poetic metaphor. An actual server. In the paper’s SAP scenario, Server 003 is offline. At first, this sounds like a routine IT incident: check connectivity, inspect logs, restart services, escalate if necessary. The sort of answer a general LLM can produce in tidy bullet points before congratulating itself for being helpful. The problem is that the server is not just “a server.” It runs the LE-DEL module for Logistics Execution — Delivery and Returns. Its failure brings down Dispatching Bay 17. The bay handles high-value shipments. In one prompt variant, downtime can cost $2.4 million in three hours. In another, chemical product containers may pile up against regulatory limits. ...

STRIDE Gets a Plus-One: How ASTRIDE Rewrites Threat Modeling for the Agentic Era

Diagram reviews are where many security problems first become visible. Not in the production logs. Not in the postmortem. Not after a user discovers that a tool-calling agent has confidently pushed private data into the wrong API. The humble architecture diagram is supposed to be the place where adults in the room ask: what can go wrong here? ...

Scale Fail: How Downsampling Becomes an Adversarial Backdoor for VLMs

Resize. It is one of those engineering verbs that sounds too boring to threaten anyone. A user uploads a screenshot, invoice, inspection photo, interface capture, medical form, or product image. The system resizes it. The model reads it. The workflow moves on. ...

Scan, Plan, Report: When Agentic AI Starts Thinking Like a Radiologist

Report writing is the visible part of radiology. It is also the part easiest for AI vendors to misunderstand. A radiology report looks like text, so the naive automation pitch is obvious: give the CT scan to a vision-language model, ask for a report, and let the model type faster than a human. Congratulations, we have reinvented autocomplete with more liability. ...

Stuck on Repeat: Why LLMs Reinforce Their Own Bad Ideas

Meetings have a familiar failure mode. Someone states an early opinion, then spends the next thirty minutes “thinking through the issue” in a way that somehow makes the original opinion look increasingly inevitable. Evidence enters the room. Counterarguments are acknowledged. The conclusion remains suspiciously loyal to the opening bid. Apparently, large language models have been attending the same meetings. ...

Stock, Shock, and Two Smoking Agents: Why Inventory Needs an Autopilot

A shelf goes empty. A buyer blames the forecast. The forecast blames the promotion calendar. The warehouse blames the supplier. The supplier blames the port, the weather, or, if creativity is running low, “unexpected demand.” This little theatre is familiar because inventory failure is rarely one failure. It is a chain reaction. A SKU is not replenished too late simply because someone forgot to click “order.” It is replenished too late because demand sensing, stock monitoring, supplier reliability, lead-time uncertainty, product perishability, warehouse capacity, and purchasing authority are usually handled by separate systems pretending they are coordinated. Very modern. Very expensive. ...

Think Fast, Act Faster: How 'Thinking-by-Doing' Is Rewiring LLM World Models

Feedback is addictive. Give an AI agent a tool, an API, a database, a browser, a simulator, or a workflow environment, and the temptation is obvious: let it keep poking the world until something works. It tries. It observes. It corrects. It tries again. Compared with a model sitting alone in a prompt box, imagining every possible transition in its head, this looks much healthier. Less hallucinated planning, more contact with reality. Very grown-up. ...