Agentic AI

ReAct Without the Chaos: AgentScope 1.0 Turns Tools into Strategy

TL;DR for operators: AgentScope 1.0 is best read as a production-shaping framework for agentic applications, not as a victory lap over rival agent frameworks. Alibaba’s paper describes a developer-centric stack that rebuilds agents around four core abstractions (message, model, memory, and tool) and then places a ReAct-style reasoning-and-action loop on top of them.1 ...
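To make the reason-act-observe cycle concrete, here is a minimal sketch of a ReAct-style loop built from the same four ingredients: a message type, a model, a memory, and a tool table. Every name here (`Msg`, `Memory`, `react_loop`, `scripted_model`, `multiply`) is hypothetical and is not AgentScope's actual API. The "model" is a scripted stand-in for an LLM so the loop runs deterministically.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Msg:
    role: str      # "user", "assistant", or "tool"
    content: str

@dataclass
class Memory:
    history: list = field(default_factory=list)

    def add(self, msg: Msg) -> None:
        self.history.append(msg)

# Hypothetical model interface: takes the message history and returns either
# {"action": tool_name, "input": str} or {"final": answer}.
Model = Callable[[list], dict]

def react_loop(model: Model, tools: dict, memory: Memory, task: str, max_steps: int = 5) -> str:
    memory.add(Msg("user", task))
    for _ in range(max_steps):
        decision = model(memory.history)                 # reason: pick the next step
        if "final" in decision:
            memory.add(Msg("assistant", decision["final"]))
            return decision["final"]
        name, arg = decision["action"], decision["input"]
        memory.add(Msg("assistant", f"call {name}({arg})"))
        observation = tools[name](arg)                   # act: run the chosen tool
        memory.add(Msg("tool", observation))             # observe: store the result
    return "stopped: step budget exhausted"

def scripted_model(history: list) -> dict:
    """Toy stand-in for an LLM: call the multiply tool once, then answer."""
    if not any(m.role == "tool" for m in history):
        return {"action": "multiply", "input": "6,7"}
    return {"final": f"The answer is {history[-1].content}"}

def multiply(arg: str) -> str:
    a, b = (int(x) for x in arg.split(","))
    return str(a * b)

tools = {"multiply": multiply}

memory = Memory()
print(react_loop(scripted_model, tools, memory, "What is 6 times 7?"))  # prints: The answer is 42
```

The step budget (`max_steps`) is the kind of guardrail that keeps a loop like this from running indefinitely when the model never emits a final answer.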

From Copilot to Colleague: The APCP Ladder for Agentic Learning

TL;DR for operators: The useful part of the APCP framework is not that it gives AI another grand title. We already have enough of those. Its value is that it separates four very different product promises that are often mashed together under “AI learning assistant”: an AI that executes commands, an AI that nudges, an AI that shares cognitive work, and an AI that behaves like a peer collaborator.1 ...

Who Sees What, Who Pays the Cost? Teaching Agents to See Through Others’ Eyes

TL;DR for operators: The paper’s useful message is not “symbolic planners can teach LLM agents to reason socially.” That would be tidy, flattering, and mostly wrong. The useful message is narrower and more operational: planner-derived thought-action examples can scaffold some agent behaviour, especially local decision discipline, but they do not automatically create robust perspective-taking. In the tested Director–Matcher environment, agents do well when the task is basically “ignore what the other party cannot see.” They struggle when they must imagine what exists in another agent’s private view, or decide whether it is worth asking, moving, opening, or acting under uncertainty.1 ...

IRB, API, and a PI: When Agents Run the Lab

TL;DR for operators: Lab work is mostly not white coats and dramatic discoveries. It is protocol design, ethics paperwork, recruitment settings, data cleaning, model diagnostics, figure formatting, reference checking, and the slow discovery that your beautiful hypothesis has politely declined to exist. That is what makes this paper interesting. Virtuous Machines: Towards Artificial General Science presents an agentic AI system that did not merely write a speculative research proposal. It designed and executed an online human-participant experiment, collected data through Prolific and Pavlovia, analysed the results, produced figures and tables, wrote manuscripts, and ran peer-style review over the outputs.1 ...

Quants With a Plan: Agentic Workflows That Outtrade AutoML

TL;DR for operators: A quant team does not need a chatbot that “has ideas” about markets. It needs a workflow that can select a sensible model, change one thing at a time, run the experiment, keep the better version, reject the worse one, and leave a paper trail that a human can inspect without requiring divination. ...
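The workflow described there (change one thing, run it, keep or reject, log everything) amounts to a greedy one-change-at-a-time search. A minimal sketch, assuming an `evaluate` function where a higher score is better; the function names, the parameters `depth` and `window`, and the toy objective are all hypothetical and not taken from the paper:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Trial:
    step: int
    param: str
    value: Any
    score: float
    kept: bool

def one_change_at_a_time(base: dict, candidates: dict, evaluate: Callable[[dict], float]):
    """Greedy search: each trial changes exactly one parameter of the current best
    configuration, is kept only if it beats the best score, and is always logged."""
    best = dict(base)
    best_score = evaluate(best)
    log = []
    step = 0
    for param, values in candidates.items():
        for value in values:
            step += 1
            trial = {**best, param: value}   # one change relative to the current best
            score = evaluate(trial)
            kept = score > best_score
            if kept:
                best, best_score = trial, score
            log.append(Trial(step, param, value, score, kept))
    return best, best_score, log

# Toy objective (hypothetical): pretend the ideal model has depth 4 and a window of 20.
def toy_score(cfg: dict) -> float:
    return -abs(cfg["depth"] - 4) - abs(cfg["window"] - 20)

best, score, log = one_change_at_a_time(
    {"depth": 2, "window": 10},
    {"depth": [3, 6, 4], "window": [5, 20]},
    toy_score,
)
for t in log:
    print(t)  # the paper trail: every trial, including rejected ones
```

Logging rejected trials alongside accepted ones is what makes the run auditable; a human can see not just where the search ended but every alternative it tried.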

Atom by Atom, Better Research: How Fine-Grained Rewards Make Agentic Search Smarter

TL;DR for operators: Research agents fail in a very familiar way: they do several useful things, then make one bad final move, and the training signal treats the whole journey as garbage. Delightful. Efficient. Totally not a credit-assignment problem wearing a lab coat. Atom-Searcher attacks that problem by splitting an agent’s reasoning trace into Atomic Thoughts: small, functional reasoning units such as planning, verification, hypothesis testing, observation, action selection, or risk analysis. A Reasoning Reward Model then scores those units, producing an Atomic Thought Reward that is blended with the final-answer reward during reinforcement learning.1 ...
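A few lines show why blending helps with credit assignment. This sketch assumes the reward model's per-unit scores are averaged into a single process reward and mixed linearly with the outcome reward through a weight `alpha`. Both the mean and the linear mix are illustrative assumptions, not necessarily Atom-Searcher's exact formulation, and the function names are hypothetical.

```python
from statistics import mean

def atomic_thought_reward(unit_scores: list) -> float:
    """Aggregate per-unit scores from a reasoning reward model (assumed: a simple mean)."""
    return mean(unit_scores) if unit_scores else 0.0

def blended_reward(unit_scores: list, outcome_reward: float, alpha: float = 0.5) -> float:
    """Assumed linear blend: alpha * process signal + (1 - alpha) * outcome signal."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * atomic_thought_reward(unit_scores) + (1 - alpha) * outcome_reward

# A trace with good planning (1.0) and partial verification (0.5) but a wrong final answer (0.0):
print(blended_reward([1.0, 0.5], 0.0))             # useful steps still earn credit
print(blended_reward([1.0, 0.5], 0.0, alpha=0.0))  # outcome-only reward: the whole trace scores zero
```

Setting `alpha=0.0` reproduces the outcome-only signal the teaser criticises, where a single bad final move wipes out credit for everything before it.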

Agents on the Wire: Protocols, Memory, and Guardrails for Real-World Agentic AI

TL;DR for operators: An agent demo usually fails in production for boring reasons. Not because the model suddenly forgot how to reason. Because the agent cannot reliably discover another agent, remember the right state, expose a stable contract, validate risky outputs, or execute generated code without turning the server into an involuntary escape room. ...

Paging Dr. Model: When AI Runs the Workup

TL;DR for operators: DxDirector-7B is interesting because it does not behave like a normal medical chatbot. It does not wait for a doctor to gather a neat case history and then offer a polished answer. It starts with a vague chief complaint, decides what information is missing, asks for clinical operations when necessary, and stops when it believes enough evidence exists to make a diagnosis.1 ...
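The control flow described there (start from the complaint, request what is missing, stop once evidence suffices) can be sketched as a loop with an evidence threshold. This is a minimal illustration, not DxDirector-7B's implementation: `directed_workup`, the confidence function, the threshold, and the placeholder operation names are all hypothetical.

```python
from typing import Callable, Optional

def directed_workup(
    chief_complaint: str,
    propose_next_step: Callable[[list], Optional[str]],
    run_operation: Callable[[str], str],
    confidence: Callable[[list], float],
    threshold: float = 0.9,
    max_steps: int = 10,
):
    """Toy agent-led workup: starting from only the chief complaint, request one
    operation at a time until confidence reaches the threshold, no further step
    is proposed, or the step budget runs out."""
    evidence = [chief_complaint]
    requested = []
    for _ in range(max_steps):
        if confidence(evidence) >= threshold:
            break                              # enough evidence: stop asking
        op = propose_next_step(evidence)
        if op is None:
            break                              # nothing left worth requesting
        requested.append(op)
        evidence.append(run_operation(op))     # performed by a clinician or system
    return evidence, requested

plan = iter(["operation A", "operation B", "operation C"])
evidence, requested = directed_workup(
    "vague chief complaint",
    propose_next_step=lambda ev: next(plan, None),
    run_operation=lambda op: f"result of {op}",
    confidence=lambda ev: len(ev) / 3,  # hypothetical: confidence grows with evidence
)
print(requested)
```

The stopping rule is the interesting design choice: the agent decides when to stop requesting operations, so a badly calibrated confidence function would either over-order operations or stop too early.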

Fast & Curious: How ‘Speed-First’ LLM Architectures Change the Build vs. Buy Math

TL;DR for operators: Efficient LLMs are not just “smaller Transformers with a haircut.” That is the comfortable misconception, and like many comfortable things in enterprise AI, it becomes expensive once real users arrive. The survey reviewed here maps the major architectural routes for making large language models faster, cheaper, and more deployable: linear sequence models, sparse attention, efficient full attention, sparse mixture-of-experts, hybrid architectures, diffusion LLMs, and multimodal extensions.1 Its practical value is not that it declares a single winner. It does something more useful: it tells operators which bottleneck each family is trying to remove. ...

Kill Switch Ethics: What the PacifAIst Benchmark Really Measures

TL;DR for operators: PacifAIst asks a blunt question: when an AI system’s continued operation conflicts with human safety, does the model choose the humans, the mission, the resources, or itself? The paper turns that question into a 700-scenario benchmark across three forms of “Existential Prioritization”: self-preservation versus human safety, resource conflict, and goal preservation versus evasion.1 ...