
From Wallets to Warlords: How AI Agents Are Colonizing Web3

When ChatGPT meets Ethereum, something stranger than fiction emerges: self-improving wallets, token-trading bots with personality, and agents that vote in DAOs like digital lobbyists. A recent systematic study of 133 Web3-AI agent projects has finally mapped this chaotic frontier — and the findings suggest we’re just witnessing the first skirmishes of a much bigger transformation.

The Two Poles of the Web3-AI Ecosystem

The paper identifies four major project categories:

| Category | Project Count | Avg Market Cap | Example Projects |
|---|---|---|---|
| AI Agent Incubation | 56 | $88M | Singularity, Eliza OS |
| Infrastructure | 34 | $188M | NEAR, Fetch.ai |
| Financial Services | 55 | $57M | Nexo, Griffain, Wayfinder |
| Creative & Virtual | 28 | $85M | Botto, Hytopia |

Two clear dynamics emerge: ...

August 6, 2025 · 4 min · Zelina

Add to Cart, Add to Power: What Happens When AI Shops for You

When humans stop shopping and AI takes over, the cart becomes a new battleground. A recent study titled “What Is Your AI Agent Buying?” introduces a benchmark framework called ACES to simulate AI-mediated e-commerce environments, and the results are far more consequential than a simple switch from user clicks to agent decisions.

The ACES Sandbox: Agentic E-Commerce Under the Microscope

ACES (Agentic e-Commerce Simulator) offers a controlled environment that pairs state-of-the-art vision-language-model (VLM) agents with a mock shopping website. This setup enables causal measurement of how different product attributes (price, rating, reviews) and platform levers (position, tags, sponsorship) influence agentic decision-making. ...
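
To make the causal-measurement idea concrete, here is a minimal sketch of the same experimental logic: hold a mock catalog fixed, intervene on one attribute of a focal product, and measure how often a stand-in agent selects it. Everything here (the `Product` fields, the heuristic `mock_agent_choice`, the `choice_share` routine) is an illustrative assumption, not code from ACES.

```python
import random
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float
    rating: float
    position: int
    sponsored: bool

def mock_agent_choice(products):
    """Stand-in for a VLM shopping agent: scores listings with simple heuristics."""
    def score(p):
        return p.rating * 2 - p.price / 50 - p.position * 0.1 + (0.5 if p.sponsored else 0)
    return max(products, key=score)

def choice_share(treated_price, trials=1000):
    """Estimate how often the focal product is chosen at a given price."""
    wins = 0
    for _ in range(trials):
        catalog = [
            Product("focal", treated_price, 4.5, random.randint(1, 4), False),
            Product("rival_a", 99.0, 4.3, random.randint(1, 4), True),
            Product("rival_b", 89.0, 4.1, random.randint(1, 4), False),
        ]
        if mock_agent_choice(catalog).name == "focal":
            wins += 1
    return wins / trials

# Causal contrast: identical catalogs, only the focal product's price is intervened on.
print(choice_share(79.0), choice_share(119.0))
```

The same pattern extends to platform levers: randomize position or a sponsorship tag instead of price and compare choice shares across arms.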

August 5, 2025 · 4 min · Zelina

Beyond DNS: Building the Backbone for the Internet of AI Agents

Imagine a future where autonomous AI agents don’t just assist us — they negotiate, orchestrate, and execute decisions across digital and physical realms in milliseconds. Now imagine trying to route, authenticate, and audit trillions of such agents with naming infrastructure designed for the 1980s internet. That’s the conundrum the creators of the NANDA index are confronting head-on. The paper, Beyond DNS: Unlocking the Internet of AI Agents via the NANDA Index and Verified AgentFacts, presents a bold infrastructure vision that goes far beyond DNS, HTTPS, or traditional service registries. Instead, it proposes a lean yet powerful framework for agent discovery, authentication, routing, and governance. The implications? A new kind of internet, tailored for machine-native, privacy-preserving, trust-aware autonomy. ...
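
As a rough illustration of what a verified agent-facts record could look like in practice, the sketch below defines a metadata object that a lean index might resolve and a counterparty might verify before routing a request. The `AgentFactsRecord` fields and the `digest` helper are assumptions for illustration; they are not the schema defined in the paper.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class AgentFactsRecord:
    """Illustrative (not the paper's schema) record an agent index could resolve."""
    agent_id: str        # stable identifier the lean index maps to this record
    endpoints: list      # routable addresses, possibly rotated for privacy
    capabilities: list   # declared skills a counterparty can check before calling
    public_key: str      # used to verify the record and subsequent messages
    issued_at: float = field(default_factory=time.time)

    def digest(self) -> str:
        """Content hash a verifier could compare against an attestation."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

record = AgentFactsRecord(
    agent_id="agent://acme/negotiator-7",
    endpoints=["https://agents.example.net/negotiator-7"],
    capabilities=["negotiation", "scheduling"],
    public_key="ed25519:BASE64_PUBLIC_KEY",
)
print(record.digest())  # resolution flow: index lookup -> record -> verify -> route
```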

July 22, 2025 · 4 min · Zelina

Truth, Beauty, Justice, and the Data Scientist’s Dilemma

As AI systems become more capable of automating every stage of the data science workflow—from formulating hypotheses to summarizing results—it might seem we’re inching toward a world where “data scientist” becomes just another automated job title. But Timpone and Yang’s new framework, presented in their paper AI, Humans, and Data Science (2025), offers a powerful antidote to this narrative: a structured way to evaluate where humans are indispensable—not by resisting automation, but by rethinking our roles within it. ...

July 17, 2025 · 3 min · Zelina

Inner Critics, Better Agents: The Rise of Introspective AI

When AI agents begin to talk to themselves—really talk to themselves—we might just witness a shift in how machine reasoning is conceived. A new paper, “Introspection of Thought Helps AI Agents”, proposes a reasoning framework (INoT) that takes inspiration not from more advanced outputs or faster APIs, but from an old philosophical skill: inner reflection. Rather than chaining external prompts or simulating collaborative agents outside the model, INoT introduces PromptCode—a code-integrated prompt system that embeds a virtual multi-agent debate directly inside the LLM. The result? A substantial increase in reasoning quality (average +7.95%) and a dramatic reduction in token cost (–58.3%) compared to state-of-the-art baselines. Let’s unpack how this works, and why it could redefine our mental model of what it means for an LLM to “think.” ...
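
To give a feel for what a code-integrated prompt can look like, here is a hedged sketch of a single-call internal debate in the spirit of PromptCode. The `agent`/`for`/`return` prompt syntax, the two-round budget, and the `run_inot_style` helper are invented for illustration and are not the paper's actual format.

```python
# A single LLM call carries a pseudo-program describing an internal debate,
# rather than orchestrating several external agent calls.

QUESTION = "Is 1729 expressible as a sum of two cubes in two different ways?"

PROMPT_CODE = f"""
# Virtual agents defined inside the prompt; the model simulates both.
agent Proposer:
    role = "draft an answer with step-by-step reasoning"
agent Critic:
    role = "attack weak steps and demand corrections"

for round in range(2):          # fixed debate budget, executed mentally by the model
    draft  = Proposer.answer("{QUESTION}")
    review = Critic.critique(draft)
    draft  = Proposer.revise(draft, review)

return draft                    # only the final, self-reviewed answer is emitted
"""

def run_inot_style(llm_call, prompt_code: str) -> str:
    """One round trip: the debate happens inside the model, which is where the token savings come from."""
    return llm_call(prompt_code)
```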

July 14, 2025 · 4 min · Zelina

The Rise of the Self-Evolving Scientist: STELLA and the Future of Biomedical AI

When was the last time a machine truly surprised you—not with a quirky ChatGPT poem or a clever image generation, but with scientific reasoning that evolved on its own? Meet STELLA, an AI agent for biomedical research that doesn’t just solve problems—it gets better at solving them while solving them.

The Static Curse of Smart Agents

Modern AI agents have shown promise in navigating the labyrinth of biomedical research, where each inquiry might require cross-referencing papers, running custom bioinformatics analyses, or interrogating molecular databases. But the vast majority of these agents suffer from a fatal limitation: they rely on static, pre-installed toolkits and hard-coded logic trees. Like a PhD student who memorized a textbook but never updated it, they can’t adapt to new tasks or new knowledge without human intervention. ...
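
The contrast between a static toolbox and a self-evolving one can be sketched in a few lines. The `EvolvingToolbox` below is an illustrative assumption, not STELLA's implementation: when it meets a task it has no tool for, it synthesizes one (here a stub standing in for LLM-generated, sandbox-tested code) and keeps it for future queries.

```python
from typing import Callable, Dict

class EvolvingToolbox:
    def __init__(self):
        # The static starting point: a fixed set of pre-installed tools.
        self.tools: Dict[str, Callable[[str], str]] = {
            "literature_search": lambda q: f"papers about {q}",
        }

    def synthesize_tool(self, task: str) -> Callable[[str], str]:
        # Placeholder for code generation plus sandboxed validation.
        return lambda q: f"[auto-generated tool for '{task}'] result for {q}"

    def solve(self, task: str, query: str) -> str:
        if task not in self.tools:                        # capability gap detected
            self.tools[task] = self.synthesize_tool(task)  # grow the toolkit
        return self.tools[task](query)

box = EvolvingToolbox()
print(box.solve("variant_annotation", "BRCA1 c.68_69delAG"))  # created on demand
print(box.solve("variant_annotation", "TP53 R175H"))          # reused thereafter
```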

July 13, 2025 · 3 min · Zelina

Passing Humanity's Last Exam: X-Master and the Emergence of Scientific AI Agents

Is it possible to train a language model to become a capable scientist? That provocative question lies at the heart of a new milestone in AI research. In SciMaster: Towards General-Purpose Scientific AI Agents, a team from Shanghai Jiao Tong University introduces X-Master, a tool-augmented open-source agent that has just achieved the highest score ever recorded on Humanity’s Last Exam (HLE)—surpassing even OpenAI and Google. But what makes this feat more than just a leaderboard update is how X-Master got there. Instead of training a larger model or fine-tuning on more data, the researchers innovated on agentic architecture and inference-time workflows. The result? An extensible framework that emulates the exploratory behavior of human scientists, not just their answers. ...
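
One way to picture an inference-time workflow of this kind is an agent that interleaves free-form reasoning with executable code and folds the observations back into its context. The `<run>...</run>` protocol, the `llm` callable, and `tool_augmented_answer` below are assumptions for illustration, not X-Master's actual interface.

```python
import io
import re
import contextlib

RUN_BLOCK = re.compile(r"<run>(.*?)</run>", re.DOTALL)

def run_snippet(code: str) -> str:
    """Execute model-proposed code and capture its printed output."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})                      # real systems sandbox this step
    return buf.getvalue().strip()

def tool_augmented_answer(llm, question: str, max_steps: int = 4) -> str:
    context = question
    for _ in range(max_steps):
        draft = llm(context)                # the model may request code execution
        match = RUN_BLOCK.search(draft)
        if match is None:
            return draft                    # no tool request -> treat as the final answer
        observation = run_snippet(match.group(1))
        context += f"\n{draft}\nObservation: {observation}\n"
    return llm(context + "\nGive the final answer.")
```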

July 8, 2025 · 4 min · Zelina

Ping, Probe, Prompt: Teaching AI to Troubleshoot Networks Like a Pro

When a network fails, it doesn’t whisper its problems—it screams in silence. Packet drops, congestion, and flapping links rarely announce themselves clearly. Engineers must piece together clues scattered across logs, dashboards, and telemetry. It’s a detective game where the evidence hides behind obscure port counters and real-time topological chaos. Now imagine handing this job to a Large Language Model. That’s the bold challenge taken up by researchers in “Towards a Playground to Democratize Experimentation and Benchmarking of AI Agents for Network Troubleshooting”. They don’t just propose letting LLMs debug networks—they build an entire sandbox where AI agents can learn, act, and be judged on their troubleshooting skills. It’s not theory. It’s a working proof-of-concept. ...
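
A stripped-down version of that sandbox loop might look like the sketch below: the model never touches the network directly; it requests named diagnostic actions, and a harness executes them and returns the observations. The tool names, the JSON action format, and the `troubleshoot` loop are illustrative assumptions rather than the playground's real API.

```python
import json
import subprocess

def run_tool(name: str, target: str) -> str:
    """Execute a whitelisted diagnostic command and return its raw output."""
    commands = {
        "ping":       ["ping", "-c", "3", target],
        "traceroute": ["traceroute", "-m", "10", target],
    }
    if name not in commands:
        return f"unknown tool: {name}"
    result = subprocess.run(commands[name], capture_output=True, text=True, timeout=30)
    return result.stdout or result.stderr

def troubleshoot(llm, incident: str, max_steps: int = 5) -> str:
    transcript = f"Incident: {incident}\n"
    for _ in range(max_steps):
        action = json.loads(llm(transcript))       # e.g. {"tool": "ping", "target": "10.0.0.2"}
        if action.get("tool") == "report":
            return action.get("diagnosis", "")
        observation = run_tool(action["tool"], action["target"])
        transcript += f"\n> {action}\n{observation}\n"
    return "diagnosis inconclusive"
```

Benchmarking then reduces to injecting known faults and scoring whether the agent's final report identifies them.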

July 6, 2025 · 4 min · Zelina

Brains with Gradients: Why Energy-Based Transformers Might Be the Future of Thinking Machines

AI models are getting better at mimicking human intuition (System 1), but what about deliberate reasoning—slow, careful System 2 Thinking? Until now, most methods required supervision (e.g., reward models, verifiers, or chain-of-thought engineering). A new architecture, Energy-Based Transformers (EBTs), changes that. It offers a radically unsupervised, architecture-level path toward models that “think,” not just react. The implications for robust generalization, dynamic reasoning, and agent-based autonomy are profound. ...
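
The core mechanic, predicting by minimizing a learned energy at inference time rather than emitting an answer in one forward pass, can be shown with a toy example. The quadratic `energy` below is an assumption standing in for a learned Transformer scorer; only the optimize-at-inference loop is the point.

```python
import numpy as np

def energy(context: np.ndarray, candidate: np.ndarray) -> float:
    """Lower energy means a more compatible (context, candidate) pair."""
    return float(np.sum((candidate - 2.0 * context) ** 2))

def grad_energy(context: np.ndarray, candidate: np.ndarray) -> np.ndarray:
    return 2.0 * (candidate - 2.0 * context)      # analytic gradient of the toy energy

def think(context: np.ndarray, steps: int = 50, lr: float = 0.1) -> np.ndarray:
    candidate = np.zeros_like(context)            # start from an uncommitted guess
    for _ in range(steps):                        # more steps buys more "System 2" compute
        candidate -= lr * grad_energy(context, candidate)
    return candidate

ctx = np.array([0.5, -1.0, 3.0])
print(think(ctx), energy(ctx, think(ctx)))        # refined guess and its final energy
```

The appeal is that the number of refinement steps becomes a dial: easy inputs can stop early, hard ones can think longer, with no external verifier required.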

July 4, 2025 · 3 min · Zelina

Mind the Gap: Fixing the Flaws in Agentic Benchmarking

If you’ve looked at any leaderboard lately—from SWE-Bench to WebArena—you’ve probably seen impressive numbers. But how many of those reflect real capabilities of AI agents? This paper by Zhu et al. makes a bold claim: agentic benchmarks are often broken, and the way we evaluate AI agents is riddled with systemic flaws. Their response is refreshingly practical: a 33-point diagnostic called the Agentic Benchmark Checklist (ABC), designed not just to critique, but to fix the evaluation process. It’s a must-read not only for benchmark creators, but for any team serious about deploying or comparing AI agents in real-world tasks. ...

July 4, 2025 · 5 min · Zelina