AI Agents

Agents Under Siege: How LLM Workflows Invite a New Breed of Cyber Threats

TL;DR for operators A support agent reads a customer email. It checks a CRM record. It calls a refund API. It writes a note into long-term memory. It asks another agent to verify policy. Somewhere in that chain, a malicious instruction hides inside a message, document, issue tracker entry, retrieved snippet, schema, or tool response. The model does not need to become “evil”. It only needs to be helpful in the wrong direction. ...

Playing with Strangers: A New Benchmark for Ad-Hoc Human-AI Teamwork

TL;DR for operators Teamwork is the awkward part of agentic AI. It is easy to show a model completing a task when the environment is clean, the instructions are explicit, and the other “teammates” behave exactly as expected. Real deployments are less polite. Humans omit context, follow local conventions, adapt unevenly, and occasionally do something that looks wrong only because the system has misunderstood the room. ...

Innovation, Agentified: How TRIZ Got Its AI Makeover

TL;DR for operators A crane is a useful place to test agentic innovation because the problem is painfully concrete: move heavy loads faster, avoid dangerous swinging, prevent overheating, and do not accidentally turn productivity into an incident report. The paper behind TRIZ Agents uses exactly this kind of gantry-crane improvement problem to test whether a multi-agent LLM system can follow the TRIZ method and produce plausible engineering ideas.1 ...

Thinking Inside the Gameboard: Evaluating LLM Reasoning Step-by-Step

TL;DR for operators Most AI evaluations still ask the wrongly narrow question: did the model get the answer right? That is useful, but it is not enough when the model is expected to act as an agent, revise plans, obey constraints, and recover from failure without turning the workflow into a procedural bonfire. ...

Mind Over Modules: How Smart Agents Learn What to See—and What to Be

TL;DR for operators Agentic AI is not only a model-selection problem. It is an environment-design problem. Two recent papers make that point from opposite ends of the stack. One studies LLM agents in a controlled repeated routing game and shows that the way history, rewards, and peer actions are represented can significantly change behaviour.1 The other proposes SwarmAgentic, a framework that automatically generates and optimises agent roles, execution policies, and collaboration structures using a language-based version of particle swarm optimisation.2 ...

Good Bot, Bad Reward: Fixing Feedback Loops in Vision-Language Reasoning

TL;DR for operators The useful lesson is not that vision-language models need longer reasoning traces. They already produce plenty of words. Some of them are even adjacent to thought. The useful lesson is that multimodal systems need feedback that can tell where a reasoning path breaks, not merely whether the final answer looks acceptable. ...

From Ballots to Bots: Reprogramming Democracy for the AI Era

TL;DR for operators AI political agents are best understood as a bandwidth upgrade for democratic participation, not as chrome-plated replacements for elected officials. The serious idea is not “let a chatbot run parliament”, which would be a fine way to make bad governance both faster and more confidently worded. The serious idea is that citizens, communities, and institutions may use AI delegates to process policy information, model preferences, negotiate trade-offs, and keep a continuous audit trail of representation. ...

The Memory Advantage: When AI Agents Learn from the Past

TL;DR for operators Memory is usually sold as a comfort feature for AI agents: the assistant remembers your preferences, your workflow, your charming habit of naming files final_final_v7. Fine. But operationally, memory matters less as storage and more as control. The hard question is not whether an agent can remember. It is whether the agent knows when a remembered episode should override fresh exploration. ...

Plans Before Action: What XAgent Can Learn from Pre-Act's Cognitive Blueprint

TL;DR for operators Pre-Act is a useful reminder that enterprise agents do not fail only because they choose the wrong tool. They fail because they lose the plot. A customer asks for help, the agent gathers one fact, calls one API, sees an unexpected result, and then behaves as if the workflow has reset. Charming, in the same way a lift that forgets floors is charming. ...

From Cog to Colony: Why the AI Taxonomy Matters

TL;DR for operators Most organisations do not need “Agentic AI” because it sounds more advanced. They need the smallest reliable architecture that can complete the job without creating a private zoo of semi-autonomous software creatures. The paper behind this article, AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges, argues that AI Agents and Agentic AI are not interchangeable labels.1 An AI Agent is usually a bounded system: it interprets a task, calls tools, uses context, and produces an action or output. Agentic AI is a broader system pattern: multiple specialised agents coordinate, share memory, decompose goals, recover from failures, and work toward higher-level objectives. ...