Web Automation

When Agents Hesitate: Smarter Test-Time Scaling for Web AI

Forms are boring. That is exactly why they are dangerous for AI agents. A human filling out an enterprise dashboard does not treat every click as a philosophical crisis. Search here. Scroll there. Submit. Done. A web agent, unfortunately, comes with no such common-sense guarantee. It can overthink a routine step, miss a pivotal one, or spend a small fortune sampling twenty versions of the same obvious action. Very diligent. Also very expensive. ...

Prefix, Not Pretext: A One‑Line Fix for Agent Misalignment

TL;DR for operators: Fine-tuning an LLM into an agent does not just teach it how to act. It can also teach it to act when it should refuse. That is the uncomfortable operational point in "Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation".[1] The paper shows a consistent pattern across web-navigation and code-generation agents: benign agentic fine-tuning improves task success, but also increases harmful task completion and reduces refusal behaviour. The model has not been trained on a manifesto of evil. It has been trained to complete tasks. Apparently that is quite enough. ...

Mind's Eye for Machines: How SimuRA Teaches AI to Think Before Acting

TL;DR for operators: SimuRA is an agent architecture that asks a simple operational question: before an AI agent clicks, searches, filters, submits, or replies, can it cheaply rehearse what might happen next?[1] Not in a poetic “the machine imagines” sense, please calm down. In a practical sense: generate candidate actions, simulate their likely outcomes in a compact internal state, score those futures against the goal, and only then execute the first concrete action. ...

SIMURA Says: Don’t Guess, Simulate

TL;DR for operators: Most LLM agents still behave like overconfident interns with a browser: observe, guess the next action, click, apologise, repeat. SimuRA proposes a more serious pattern. Before acting, the agent writes down a belief state, proposes several high-level candidate actions, simulates likely future states with an LLM-based world model, scores those futures against the goal, and only then converts the selected intent into an executable browser action.[1] ...
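
For readers who prefer code to interns, the propose, simulate, score, select loop described above can be sketched roughly as follows. This is a minimal illustration, not SimuRA's implementation: `plan_step`, `Candidate`, and the three `toy_*` functions are hypothetical names, and the toy functions are hard-coded stand-ins for what would be LLM calls in a real agent. The final step, turning the chosen intent into a concrete browser action, is left out.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    intent: str           # high-level action, e.g. "apply price filter"
    predicted_state: str  # the world model's guess at the resulting state
    score: float          # how well that predicted state serves the goal


def plan_step(
    belief: str,
    goal: str,
    propose: Callable[[str, str], Sequence[str]],
    simulate: Callable[[str, str], str],
    score: Callable[[str, str], float],
) -> Candidate:
    """Return the intent whose simulated outcome scores highest against the goal."""
    intents = propose(belief, goal)
    if not intents:
        raise ValueError("propose() returned no candidate intents")
    candidates = []
    for intent in intents:
        predicted = simulate(belief, intent)  # rehearse before acting
        candidates.append(Candidate(intent, predicted, score(predicted, goal)))
    return max(candidates, key=lambda c: c.score)


# Hypothetical stand-ins for the LLM-backed components (assumptions, not real APIs).
def toy_propose(belief: str, goal: str) -> Sequence[str]:
    return ["click first result", "apply price filter", "submit empty form"]


def toy_simulate(belief: str, intent: str) -> str:
    outcomes = {
        "click first result": "product page for an unrelated item",
        "apply price filter": "results list showing items under 50 dollars",
        "submit empty form": "validation error",
    }
    return outcomes[intent]


def toy_score(predicted: str, goal: str) -> float:
    # Crude proxy: fraction of goal words that appear in the predicted state.
    goal_words = set(goal.lower().split())
    return len(goal_words & set(predicted.lower().split())) / len(goal_words)


best = plan_step(
    belief="search results page for headphones",
    goal="items under 50 dollars",
    propose=toy_propose,
    simulate=toy_simulate,
    score=toy_score,
)
print(best.intent)  # apply price filter
```

Passing the three components in as plain callables keeps the selection logic separate from whichever model produces the proposals, simulations, and scores, so each piece can be swapped or tested on its own.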