AI Agents

Touch Intelligence: How DigiData Trains Agents to Think with Their Fingers

Phones are where automation goes to embarrass itself. A desktop workflow can often be forced into a neat sequence: open tab, click menu, submit form, pretend the enterprise software was designed by someone who likes people. Mobile apps are less polite. They hide features behind drawers, gestures, modals, permissions, scrolling lists, bottom sheets, dark-pattern-ish confirmations, and the occasional button that looks decorative until it suddenly matters. A human user handles this with a mixture of visual attention, memory, muscle habit, and mild resentment. A mobile control agent has to do it with pixels, UI trees, and a policy that decides where the next finger should land. ...

When AI Discovers Physics: Inside the Multi-Agent Renaissance of Scientific Machine Learning

Engineering teams know this ritual too well. A promising simulation model works on one equation, collapses on the next geometry, behaves politely in the loss curve, then quietly vandalises the boundary conditions. Someone adjusts the architecture. Someone changes the sampling schedule. Someone adds a physics-informed loss term. Someone discovers, three days later, that the clever idea was mostly a tensor-shape bug wearing a lab coat. ...

Thinking Fast and Flowing Slow: Real-Time Reasoning for Autonomous Agents

Delay is not a footnote in automation. It is the product. A customer support agent that takes thirty seconds to decide whether to escalate has already shaped the customer’s mood. A warehouse robot that produces the correct plan after the pallet has moved has produced something closer to poetry than control. A trading assistant that generates a gorgeous hedge after the market has repriced is not sophisticated. It is late, which is the expensive version of wrong. ...

When Ambiguity Helps: Rethinking How AI Interprets Our Data Questions

A manager asks the analytics copilot, “Which regions are underperforming this quarter?” This sounds like a normal business question. It is also, technically, a small swamp. Which regions? Sales regions, operating regions, logistics regions, or customer billing regions? Underperforming against what: forecast, last quarter, budget, peers, margin, revenue, retention, or some executive’s private sense of disappointment? And “this quarter” may mean calendar quarter, fiscal quarter, quarter-to-date, or the latest complete quarter if the finance team has not closed the books yet. ...

Agents on the Clock: How TPS-Bench Exposes the Time Management Problem in AI

A competent assistant can make a list. A useful assistant knows what must happen first. That distinction sounds small until an AI agent is asked to do something ordinary and annoyingly realistic: check a calendar, search the web, compare options, use a map, assemble a recommendation, and perhaps create a document at the end. None of those steps is exotic. The difficulty is that some of them can run in parallel, some must wait for earlier results, and some become nonsense if executed too early. This is less “genius at work” than “junior operations manager with access to too many browser tabs.” Naturally, it is where things get interesting. ...

When the Sandbox Thinks Back: Training AI Agents in Simulated Realities

Workflow software has a deeply unglamorous problem: reality keeps changing. A customer support agent may know the refund policy, but then the customer changes their address, the order record has a missing field, the tool returns a cryptic error, and the next API call requires a schema nobody mentioned in the demo. A spreadsheet agent may know how to summarise a table, but the file path is wrong, the calendar has a conflicting event, and the “obvious” action fails because the world, in its charmingly vindictive way, is not a benchmark prompt. ...

Breaking the Tempo: How TempoBench Reframes AI’s Struggle with Time and Causality

A failed deployment usually produces two questions. The first is easy enough to ask: what happened? The second is where the room goes quiet: what actually caused it? Most AI systems are now quite comfortable with the first question. Give them logs, traces, workflows, tool calls, or transition histories, and they can often produce a plausible reconstruction. They can narrate the incident in confident sequence. They can point to every condition that was present. They can provide a tidy post-mortem, ideally before the humans have finished opening the dashboard. ...

Fine-Tuning Without Fine-Tuning: How Fints Reinvents Personalization at Inference Time

Memory is a useful product feature until it becomes a junk drawer. That is the quiet problem behind many “personalized” AI systems. A user has a history. The system retrieves some of it. The model receives a longer prompt. The output becomes, in theory, more personal. In practice, the assistant often behaves like someone who read your old emails in a hurry and decided this was the same as knowing you. ...

The Agent Olympics: How Toolathlon Tests the Limits of AI Workflows

Office work is not one task. It is a chain of small obligations pretending to be one task. “Check the homework submissions, download the attached Python files, run them, grade the students in Canvas, and use the latest submission if someone sent more than one.” That sounds like a normal administrative request. It is also a compact torture device for an AI agent. The agent must read email, handle attachments, inspect local files, run code, interpret results, map students to course records, update Canvas, and not confidently grade the wrong person. Easy, apparently, as long as nothing has to actually work. ...

Two Minds in One Machine: How Agentic AI Splits—and Reunites—the Field

Agents have become the new office intern, software engineer, analyst, compliance assistant, and occasional disaster rehearsal all in one. Give one a goal, some tools, a memory store, and permission to act, and it begins to look less like a chatbot and more like a small operating unit. That is the sales pitch. The engineering reality is less tidy. ...