
The Memory Illusion: Why AI Still Forgets Who It Is

Every AI company wants its assistant to feel personal. Yet every conversation starts from zero. Your favorite chatbot may recall facts, summarize documents, even mimic a tone — but beneath the fluent words, it suffers from a peculiar amnesia. It remembers nothing unless reminded, apologizes often, and contradicts itself with unsettling confidence. The question emerging from Stefano Natangelo’s “Narrative Continuity Test (NCT)” is both philosophical and practical: Can an AI remain the same someone across time? ...

November 3, 2025 · 4 min · Zelina

Two Minds in One Machine: How Agentic AI Splits—and Reunites—the Field

Agentic AI is the latest obsession in artificial intelligence: systems that don’t just respond but decide. They plan, delegate, and act—sometimes without asking for permission. Yet as hype grows, confusion spreads. Many conflate these new multi-agent architectures with the old, symbolic dream of reasoning machines from the 1980s. The result? Conceptual chaos. A recent survey—Agentic AI: A Comprehensive Survey of Architectures, Applications, and Future Directions—cuts through the noise. It argues that today’s agentic systems are not the heirs of symbolic AI but the offspring of neural, generative models. In other words: we’ve been speaking two dialects of intelligence without realizing it. ...

November 3, 2025 · 4 min · Zelina

Who Really Runs the Workflow? Ranking Agent Influence in Multi-Agent AI Systems

Multi-agent systems — the so-called Agentic AI Workflows — are rapidly becoming the skeleton of enterprise-grade automation. They promise autonomy, composability, and scalability. But beneath this elegant choreography lies a governance nightmare: we often have no idea which agent is actually in charge. Imagine a digital factory of LLMs: one drafts code, another critiques it, a third summarizes results, and a fourth audits everything. When something goes wrong — toxic content, hallucinated outputs, or runaway costs — who do you blame? More importantly, which agent do you fix? ...
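
To make “influence” measurable at all, one illustrative approach (a sketch under assumptions, not the paper’s actual ranking method) is to treat logged hand-offs between agents as a weighted directed graph and score each agent with a PageRank-style power iteration. The agent names and edge counts below are hypothetical.

```python
# Illustrative sketch: rank agents in a workflow by influence using a
# PageRank-style power iteration over a logged hand-off graph.
# Agent names and edge weights are hypothetical, not from the paper.
from collections import defaultdict

# edges[(src, dst)] = how often dst consumed output produced by src
edges = {
    ("coder", "critic"): 12,
    ("critic", "coder"): 9,
    ("coder", "summarizer"): 5,
    ("critic", "summarizer"): 4,
    ("summarizer", "auditor"): 6,
    ("auditor", "coder"): 2,
}

def influence_scores(edges, damping=0.85, iters=50):
    agents = sorted({a for pair in edges for a in pair})
    out_weight = defaultdict(float)
    for (src, _), w in edges.items():
        out_weight[src] += w
    score = {a: 1.0 / len(agents) for a in agents}
    for _ in range(iters):
        nxt = {a: (1 - damping) / len(agents) for a in agents}
        for (src, dst), w in edges.items():
            # an agent inherits influence from whoever feeds it work
            nxt[dst] += damping * score[src] * (w / out_weight[src])
        score = nxt
    return sorted(score.items(), key=lambda kv: -kv[1])

print(influence_scores(edges))  # the top-scoring agent "runs" the workflow
```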

November 3, 2025 · 5 min · Zelina

Bias on Demand: When Synthetic Data Exposes the Moral Logic of AI Fairness

In the field of machine learning, fairness is often treated as a technical constraint — a line of code to be added, a metric to be optimized. But behind every fairness metric lies a moral stance: what should be equalized, for whom, and at what cost? The paper “Bias on Demand: A Modelling Framework that Generates Synthetic Data with Bias” (Baumann et al., FAccT 2023) breaks this technical illusion by offering a framework that can manufacture bias in data — deliberately, transparently, and with philosophical intent. ...
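
As a minimal sketch of that idea (a simplification for illustration, not the authors’ actual framework), the snippet below generates hiring-style synthetic data in which a single knob controls how strongly measurement bias suppresses the observed scores of a protected group; all variable names and the bias mechanism are assumptions.

```python
# Minimal "bias on demand" sketch: one tunable parameter injects
# measurement bias against a protected group in synthetic data.
# Simplified illustration; not Baumann et al.'s actual generator.
import numpy as np

rng = np.random.default_rng(0)

def generate(n=10_000, measurement_bias=0.5):
    group = rng.integers(0, 2, n)            # 0 = privileged, 1 = protected
    ability = rng.normal(0.0, 1.0, n)        # true, unobserved merit
    # Observed score understates merit for the protected group in
    # proportion to measurement_bias: the deliberate, tunable distortion.
    observed = ability - measurement_bias * group + rng.normal(0.0, 0.3, n)
    return group, (ability > 0).astype(int), (observed > 0).astype(int)

for b in (0.0, 0.5, 1.0):
    g, y_true, y_obs = generate(measurement_bias=b)
    gap = y_obs[g == 0].mean() - y_obs[g == 1].mean()
    print(f"bias={b:.1f}: observed positive-rate gap = {gap:.3f}")
```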

November 2, 2025 · 4 min · Zelina

From Prototype to Profit: How IBM's CUGA Redefines Enterprise Agents

When AI agents first emerged as academic curiosities, they promised a future of autonomous systems capable of navigating apps, websites, and APIs as deftly as humans. Yet most of these experiments never left the lab. The jump from benchmark to boardroom—the point where AI must meet service-level agreements, governance rules, and cost-performance constraints—remained elusive. IBM’s recent paper, From Benchmarks to Business Impact, finally brings data to that missing bridge. The benchmark trap is familiar: generalist agents such as AutoGen, LangGraph, and Operator have dazzled the research community with their ability to orchestrate tasks across multiple tools, but academic triumphs often hide operational fragility. Benchmarks like AppWorld or WebArena measure intelligence; enterprises measure ROI. Enterprises need systems that are reproducible, auditable, and policy-compliant—not just clever. ...

November 2, 2025 · 4 min · Zelina

Recursive Minds: How ReCAP Turns LLMs into Self-Correcting Planners

In long-horizon reasoning, large language models still behave like short-term thinkers. They can plan, but only in a straight line. Once the context window overflows, earlier intentions vanish, and the model forgets why it started. The new framework ReCAP (Recursive Context-Aware Reasoning and Planning)—from Stanford’s Computer Science Department and the MIT Media Lab—offers a radical solution: give LLMs a recursive memory of their own reasoning. The problem is context drift and hierarchical amnesia. Sequential prompting—used in CoT, ReAct, and Reflexion—forces models to reason step by step along a linear chain, but in complex, multi-stage tasks (say, cooking or coding), early goals slide out of the window. Once the model’s focus shifts to later steps, earlier plans are irretrievable. Hierarchical prompting tries to fix this by spawning subtasks, but it often fragments information across layers—each sub-agent loses sight of the global goal. ...
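
A toy rendering of that recursive idea (in the spirit of ReCAP, not the framework itself): every subtask prompt is rebuilt from the chain of parent goals, so the global intention is re-injected at each level instead of scrolling out of a linear transcript. The call_llm stub and its canned decomposition are stand-ins for a real model.

```python
# Toy sketch of recursive, context-aware planning: each call re-injects
# the full chain of parent goals, so deep subtasks keep the global goal.
# call_llm is a canned stand-in, not ReCAP's actual prompting scheme.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a fixed decomposition."""
    return "DONE" if "chop" in prompt else "chop vegetables; heat pan"

def solve(goal: str, parents: tuple[str, ...] = (), depth: int = 0) -> None:
    # Rebuild context from the ancestry rather than one linear transcript.
    context = " -> ".join((*parents, goal))
    reply = call_llm(f"Plan context: {context}\nNext steps for: {goal}")
    if reply == "DONE" or depth >= 3:        # executable step or depth cap
        print("  " * depth + f"execute: {goal}")
        return
    for subtask in reply.split("; "):        # recurse with updated ancestry
        solve(subtask, (*parents, goal), depth + 1)

solve("cook dinner")
```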

November 2, 2025 · 4 min · Zelina

The Esperanto of AI Agents: How the Agent Data Protocol Unifies a Fragmented Ecosystem

Building large language model (LLM) agents has long been haunted by a quiet paradox: despite a growing number of agent datasets—from web navigation to software engineering—researchers rarely fine-tune their models across these diverse sources. The reason is not a shortage of data but a lack of coherence: every dataset speaks its own dialect. One uses HTML trees; another records API calls; a third logs terminal sessions. Converting them all for fine-tuning an agent is a nightmare of custom scripts, mismatched schemas, and endless validation. ...
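
The fix is an interlingua. As a hypothetical sketch of what such a unifying schema could look like (field names here are invented, not the actual Agent Data Protocol specification), each dialect needs only one small converter into a shared trajectory record:

```python
# Hypothetical unified agent-trajectory schema: many dataset "dialects",
# one shared record. Field names are invented for illustration and do
# not reproduce the actual Agent Data Protocol specification.
from dataclasses import dataclass, field

@dataclass
class Step:
    role: str           # "user", "assistant", or "environment"
    action_type: str    # e.g. "api_call", "web_click", "shell"
    content: str        # message, command, or observation text

@dataclass
class Trajectory:
    task: str
    source_dataset: str
    steps: list[Step] = field(default_factory=list)

def from_shell_log(task: str, commands: list[str]) -> Trajectory:
    """One of many per-dataset converters into the shared schema."""
    steps = [Step("assistant", "shell", cmd) for cmd in commands]
    return Trajectory(task, "terminal-sessions", steps)

traj = from_shell_log("fix the failing build", ["pytest -x", "git diff"])
print(traj.source_dataset, "->", len(traj.steps), "steps")
```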

November 2, 2025 · 4 min · Zelina

The Missing Metric: Measuring Agentic Potential Before It’s Too Late

In the modern AI landscape, models are not just talkers—they are becoming doers. They code, browse, research, and act within complex environments. Yet, while we’ve become adept at measuring what models know, we still lack a clear way to measure what they can become. APTBench, proposed by Tencent Youtu Lab and Shanghai Jiao Tong University, fills that gap: it’s the first benchmark designed to quantify a model’s agentic potential during pre-training—before costly fine-tuning or instruction stages even begin. ...

November 2, 2025 · 4 min · Zelina

When Agents Learn to Test Themselves: TDFlow and the Future of Software Engineering

TDFlow, developed by researchers at Carnegie Mellon, UC San Diego, and Johns Hopkins, presents a provocative twist on how we think about AI-driven software engineering, shifting the focus from coding to testing. Instead of treating the large language model (LLM) as a creative coder, TDFlow frames the entire process as a test-resolution problem—where the agent’s goal is not to write elegant code, but simply to make the tests pass. ...
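
The core loop can be caricatured in a few lines (a sketch, not TDFlow’s actual agent design): run the suite, hand the failure log to a patch-proposing model, apply the patch, and repeat until green. propose_patch and apply_patch are hypothetical stand-ins for the LLM call and the workspace edit.

```python
# Caricature of test-driven resolution: success is defined solely by the
# test suite turning green. propose_patch/apply_patch are hypothetical
# stand-ins, not TDFlow's actual components.
import subprocess

def run_tests() -> tuple[bool, str]:
    proc = subprocess.run(["pytest", "-x", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def propose_patch(failure_log: str) -> str:
    raise NotImplementedError("stand-in for an LLM patch proposal")

def apply_patch(patch: str) -> None:
    raise NotImplementedError("stand-in for editing the workspace")

def resolve(max_rounds: int = 5) -> bool:
    for _ in range(max_rounds):
        passed, log = run_tests()
        if passed:                  # the only goal: make the tests pass
            return True
        apply_patch(propose_patch(log))
    return False
```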

November 2, 2025 · 5 min · Zelina

When Rules Go Live: Policy Cards and the New Language of AI Governance

In 2019, Model Cards made AI systems more transparent by documenting what they were trained to do. Then came Data Cards and System Cards, clarifying how datasets and end-to-end systems behave. But as AI moves from prediction to action—from chatbots to trading agents, surgical robots, and autonomous research assistants—documentation is no longer enough. We need artifacts that don’t just describe a system, but govern it. ...
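
To make the contrast with descriptive cards concrete, here is a hypothetical policy card rendered as a machine-readable, runtime-enforced artifact; the schema is invented for this sketch and is not the proposal’s actual format.

```python
# Hypothetical policy card: a machine-readable artifact that governs an
# agent at runtime rather than merely describing it. The schema below is
# invented for illustration, not the paper's actual Policy Card format.
from dataclasses import dataclass

@dataclass(frozen=True)
class PolicyCard:
    system: str
    max_order_usd: float            # hard per-action spend limit
    forbidden_actions: frozenset    # actions the agent may never take
    escalate_above_usd: float       # spend level requiring human approval

CARD = PolicyCard(
    system="trading-agent-v2",
    max_order_usd=50_000,
    forbidden_actions=frozenset({"short_sell", "use_leverage"}),
    escalate_above_usd=10_000,
)

def check(action: str, amount_usd: float, card: PolicyCard = CARD) -> str:
    """Gate every proposed action against the card before execution."""
    if action in card.forbidden_actions or amount_usd > card.max_order_usd:
        return "deny"
    if amount_usd > card.escalate_above_usd:
        return "escalate"
    return "allow"

print(check("buy", 12_000))  # -> escalate
```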

November 2, 2025 · 4 min · Zelina