
Breaking the Glass Desktop: How OpenCUA Makes Computer-Use Agents a Public Asset

When we talk about AI agents that can “use a computer like a human,” most of today’s leaders (Claude, GPT-4o, Seed 1.5) are locked in proprietary vaults. As a result, the critical details that make them competent in high-stakes desktop workflows, such as training data, error-recovery strategies, and evaluation methods, are inaccessible to the wider research and business community. OpenCUA aims to change that, not by chasing hype, but by releasing the entire stack: tools, datasets, models, and benchmarks. ...

August 13, 2025 · 3 min · Zelina

From Chaos to Choreography: The Future of Agent Workflows

In the world of Large Language Model (LLM)-powered automation, agents are no longer experimental curiosities. They’re becoming the operational backbone for scalable, autonomous AI systems. But as the number and complexity of these agents grow, the missing piece is no longer raw capability; it’s choreography. This is where agent workflows come in: structured orchestration frameworks that govern how agents plan, collaborate, and interact with tools, data, and each other. A recent survey of 24 representative systems, from industry platforms like LangChain, AutoGen, and MetaGPT to research frameworks like ReAct and ReWOO, reveals not just technical diversity but a strategic gap in interoperability. ...

August 9, 2025 · 3 min · Zelina

Mind the Gap: How Tool Graph Retriever Fixes LLMs’ Missing Links

In enterprise AI automation, the devil isn’t in the details—it’s in the dependencies. As LLM-powered agents gain access to hundreds or thousands of external tools, they face a simple but costly problem: finding all the right tools for the job. Most retrieval systems focus on semantic similarity—matching user queries to tool descriptions—but ignore a crucial fact: some tools can’t work without others. The result? A task that seems perfectly matched to a retrieved tool still fails, because a prerequisite tool never made it into the context window. Tool Graph Retriever (TGR) aims to solve this by making dependencies first-class citizens in retrieval. ...
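To make the dependency problem concrete, here is a minimal sketch of one way to treat dependencies as first-class in retrieval: take the top-k tools by similarity, then walk a prerequisite graph and pull in everything those tools need. This is a simplified illustration of the idea, not TGR's actual method (the paper's approach to detecting and encoding dependencies may differ substantially). The function name, tool names, scores, and dependency graph are all hypothetical. With similarity alone and k=2, the retriever would return `book_flight` and `search_hotels` but miss `get_user_id` and `login`.

```python
from collections import deque


def retrieve_with_dependencies(
    scores: dict[str, float], deps: dict[str, list[str]], k: int
) -> list[str]:
    """Take the top-k tools by similarity, then add all their prerequisites.

    scores: tool name -> query/description similarity (assumed precomputed)
    deps:   tool name -> tools it cannot run without (hypothetical graph)
    """
    seeds = sorted(scores, key=scores.get, reverse=True)[:k]
    selected: list[str] = []
    seen: set[str] = set()
    queue = deque(seeds)
    while queue:  # breadth-first walk over prerequisite edges
        tool = queue.popleft()
        if tool in seen:  # skip duplicates; also makes cycles safe
            continue
        seen.add(tool)
        selected.append(tool)
        queue.extend(p for p in deps.get(tool, []) if p not in seen)
    return selected


# Hypothetical similarity scores and dependency graph.
scores = {"book_flight": 0.92, "search_hotels": 0.85, "get_weather": 0.40,
          "get_user_id": 0.10, "login": 0.05}
deps = {"book_flight": ["get_user_id"], "get_user_id": ["login"]}

print(retrieve_with_dependencies(scores, deps, k=2))
# ['book_flight', 'search_hotels', 'get_user_id', 'login']
```

The prerequisites enter the result even though their similarity scores are low, while the unrelated but more "similar" `get_weather` stays out.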

August 8, 2025 · 3 min · Zelina

From Wallets to Warlords: How AI Agents Are Colonizing Web3

When ChatGPT meets Ethereum, something stranger than fiction emerges: self-improving wallets, token-trading bots with personality, and agents that vote in DAOs like digital lobbyists. A recent systematic study of 133 Web3-AI agent projects has finally mapped this chaotic frontier, and the findings suggest we’re witnessing only the first skirmishes of a much bigger transformation.

The Two Poles of the Web3-AI Ecosystem

The paper identifies four major project categories:

AI Agent Incubation: 56 projects, average market cap $88M (examples: Singularity, Eliza OS)
Infrastructure: 34 projects, average market cap $188M (examples: NEAR, Fetch.ai)
Financial Services: 55 projects, average market cap $57M (examples: Nexo, Griffain, Wayfinder)
Creative & Virtual: 28 projects, average market cap $85M (examples: Botto, Hytopia)

Two clear dynamics emerge: ...

August 6, 2025 · 4 min · Zelina

Add to Cart, Add to Power: What Happens When AI Shops for You

When humans stop shopping and AI takes over, the cart becomes a new battleground. A recent study titled “What Is Your AI Agent Buying?” introduces a benchmark framework called ACES to simulate AI-mediated e-commerce environments, and the results are far more consequential than a simple switch from user clicks to agent decisions.

The ACES Sandbox: Agentic E-Commerce Under the Microscope

ACES (Agentic e-Commerce Simulator) offers a controlled environment that pairs state-of-the-art vision-language-model (VLM) agents with a mock shopping website. This setup enables causal measurement of how different product attributes (price, rating, reviews) and platform levers (position, tags, sponsorship) influence agentic decision-making. ...
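Causal measurement of this kind generally rests on randomization. The sketch below is a hypothetical illustration, not the ACES code, of measuring a single platform lever, position: the same products are shuffled into random slots on every trial, and a stand-in `choose` function plays the role of the VLM agent. Because placement is random, product quality is spread evenly across slots, so unequal selection rates point to the agent's sensitivity to position. The `position_effect` name, the catalog, and both toy agents are assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Callable

# Hypothetical stand-in for a VLM shopping agent: receives the products in
# display order and returns the index (slot) of the one it buys.
Agent = Callable[[list[dict]], int]


def position_effect(products: list[dict], choose: Agent,
                    trials: int, seed: int = 0) -> list[float]:
    """Return the share of trials in which each display slot was chosen.

    Products are shuffled into random slots on every trial, so slot and
    product quality are independent; unequal shares therefore indicate a
    position bias in the agent itself.
    """
    rng = random.Random(seed)
    picks = Counter()
    for _ in range(trials):
        layout = products[:]        # copy so the caller's list is untouched
        rng.shuffle(layout)         # randomize placement (the "treatment")
        picks[choose(layout)] += 1  # record which slot the agent picked
    return [picks[slot] / trials for slot in range(len(products))]


# Hypothetical catalog and two toy agents.
catalog = [
    {"name": "A", "rating": 4.1},
    {"name": "B", "rating": 4.7},
    {"name": "C", "rating": 3.9},
]

def always_first(layout: list[dict]) -> int:
    return 0  # a maximally position-biased agent

def best_rated(layout: list[dict]) -> int:
    return max(range(len(layout)), key=lambda i: layout[i]["rating"])

print(position_effect(catalog, always_first, trials=1000))  # [1.0, 0.0, 0.0]
print(position_effect(catalog, best_rated, trials=1000))
```

The rating-driven agent should choose each slot roughly a third of the time, because product B lands in every slot about equally often; a position-biased agent shows a skew like the first case. The same randomize-and-compare pattern extends to other levers such as tags or sponsorship labels.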

August 5, 2025 · 4 min · Zelina

Beyond DNS: Building the Backbone for the Internet of AI Agents

Imagine a future where autonomous AI agents don’t just assist us: they negotiate, orchestrate, and execute decisions across digital and physical realms in milliseconds. Now imagine trying to route, authenticate, and audit trillions of these agents using a naming system designed in the 1980s. That’s the conundrum the creators of the NANDA index are confronting head-on. The paper, Beyond DNS: Unlocking the Internet of AI Agents via the NANDA Index and Verified AgentFacts, presents a bold infrastructure vision: rather than stretching DNS, HTTPS, or traditional service registries to fit, it proposes a lean yet powerful framework for agent discovery, authentication, routing, and governance. The implications? A new kind of internet, tailored for machine-native, privacy-preserving, trust-aware autonomy. ...

July 22, 2025 · 4 min · Zelina

Truth, Beauty, Justice, and the Data Scientist’s Dilemma

As AI systems become more capable of automating every stage of the data science workflow—from formulating hypotheses to summarizing results—it might seem we’re inching toward a world where “data scientist” becomes just another automated job title. But Timpone and Yang’s new framework, presented in their paper AI, Humans, and Data Science (2025), offers a powerful antidote to this narrative: a structured way to evaluate where humans are indispensable—not by resisting automation, but by rethinking our roles within it. ...

July 17, 2025 · 3 min · Zelina

Inner Critics, Better Agents: The Rise of Introspective AI

When AI agents begin to talk to themselves—really talk to themselves—we might just witness a shift in how machine reasoning is conceived. A new paper, “Introspection of Thought Helps AI Agents”, proposes a reasoning framework (INoT) that takes inspiration not from more advanced outputs or faster APIs, but from an old philosophical skill: inner reflection. Rather than chaining external prompts or simulating collaborative agents outside the model, INoT introduces PromptCode—a code-integrated prompt system that embeds a virtual multi-agent debate directly inside the LLM. The result? A substantial increase in reasoning quality (average +7.95%) and a dramatic reduction in token cost (–58.3%) compared to state-of-the-art baselines. Let’s unpack how this works, and why it could redefine our mental model of what it means for an LLM to “think.” ...

July 14, 2025 · 4 min · Zelina

The Rise of the Self-Evolving Scientist: STELLA and the Future of Biomedical AI

When was the last time a machine truly surprised you, not with a quirky ChatGPT poem or a clever generated image, but with scientific reasoning that evolved on its own? Meet STELLA, an AI agent for biomedical research that doesn’t just solve problems: it gets better at solving them while solving them.

The Static Curse of Smart Agents

Modern AI agents have shown promise in navigating the labyrinth of biomedical research, where each inquiry might require cross-referencing papers, running custom bioinformatics analyses, or interrogating molecular databases. But the vast majority of these agents suffer from a fatal limitation: they rely on static, pre-installed toolkits and hard-coded logic trees. Like a PhD student who memorized a textbook but never updated it, they can’t adapt to new tasks or new knowledge without human intervention. ...

July 13, 2025 · 3 min · Zelina

Passing Humanity's Last Exam: X-Master and the Emergence of Scientific AI Agents

Is it possible to train a language model to become a capable scientist? That provocative question lies at the heart of a new milestone in AI research. In SciMaster: Towards General-Purpose Scientific AI Agents, a team from Shanghai Jiao Tong University introduces X-Master, a tool-augmented open-source agent that has just achieved the highest score ever recorded on Humanity’s Last Exam (HLE)—surpassing even OpenAI and Google. But what makes this feat more than just a leaderboard update is how X-Master got there. Instead of training a larger model or fine-tuning on more data, the researchers innovated on agentic architecture and inference-time workflows. The result? An extensible framework that emulates the exploratory behavior of human scientists, not just their answers. ...

July 8, 2025 · 4 min · Zelina