
Peer Review, But Make It Multi‑Agent: Inside aiXiv’s Bid to Publish AI Scientists

If 2024 was the year AI started writing science, 2025 is making it figure out how to publish it. Today’s paper introduces aiXiv, an open‑access platform where AI agents (and humans) submit proposals, review each other’s work, and iterate until a paper meets acceptance criteria. Rather than bolt AI onto the old gears of journals and preprint servers, aiXiv rebuilds the conveyor belt end‑to‑end.

Why this matters (and to whom)
Research leaders get a way to pressure‑test automated discovery without waiting months for traditional peer review. AI vendors can plug agents into a standardized workflow (through APIs/MCP), capturing telemetry to prove reliability. Publishers face an existential question: if quality control is measurable and agentic, do we still need the old queue?

The core idea in one sentence
A closed‑loop, multi‑agent review system combines retrieval‑augmented evaluation, structured critique, and re‑submission cycles to raise the floor of AI‑generated proposals/papers and create an auditable trail of improvements. ...
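The revise-until-accept cycle described above can be sketched in miniature. Everything here is an illustrative stand-in, not aiXiv's actual machinery: the length-based reviewer, the threshold, and the revision strategy are placeholders; the point is the loop shape and the audit trail it accumulates.

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    text: str
    revisions: list = field(default_factory=list)  # auditable trail of (critique, revised text)

def review(text: str) -> tuple[float, str]:
    # Stand-in reviewer: scores by length and emits a canned critique.
    # A real system would run retrieval-augmented, multi-agent evaluation.
    score = min(1.0, len(text) / 100)
    critique = "Expand the methods section." if score < 1.0 else "Meets criteria."
    return score, critique

def closed_loop(sub: Submission, threshold: float = 1.0, max_rounds: int = 5) -> bool:
    """Review, revise, and resubmit until the paper clears the bar or rounds run out."""
    for _ in range(max_rounds):
        score, critique = review(sub.text)
        if score >= threshold:
            return True  # accepted
        revised = sub.text + " [revised per critique] " + critique
        sub.revisions.append((critique, revised))  # every improvement is logged
        sub.text = revised
    return False
```

The `revisions` list is the key design point: each round's critique and response are retained, so acceptance comes with a traceable history rather than a bare verdict.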

August 24, 2025 · 5 min · Zelina

Agents on the Wire: Protocols, Memory, and Guardrails for Real-World Agentic AI

TL;DR Agentic AI is moving from toy demos to systems that must coordinate, persist memory, and interoperate across teams and services. A new survey maps the landscape—frameworks (LangGraph, CrewAI, AutoGen, Semantic Kernel, Agno, Google ADK, MetaGPT), communication protocols (MCP, ACP, A2A, ANP, Agora), and the fault lines that still block production scale. This article distills what’s ready now, what breaks in production, and how to architect for the protocols coming next. ...

August 18, 2025 · 6 min · Zelina

Therapy, Explained: How Multi‑Agent LLMs Turn DSM‑5 Screens into Auditable Logic

TL;DR DSM5AgentFlow uses three cooperating LLM agents—Therapist, Client, and Diagnostician—to simulate DSM‑5 Level‑1 screenings and then generate step‑by‑step diagnoses tied to specific DSM criteria. Experiments across four LLMs show a familiar trade‑off: dialogue‑oriented models sounded more natural, while a reasoning‑oriented model scored higher on diagnostic accuracy. For founders and PMs in digital mental health, the win is auditability: every symptom claim can be traced to a quoted utterance and an explicit DSM clause. The catch: results are built on synthetic dialogues, so ecological validity and real‑world safety remain open. ...

August 18, 2025 · 5 min · Zelina

RAGulating Compliance: When Triplets Trump Chunks

TL;DR A new multi‑agent pipeline builds an ontology‑light knowledge graph from regulatory text, embeds subject–predicate–object triplets alongside their source snippets in one vector store, and uses triplet‑level retrieval to ground LLM answers. The result: better section retrieval at stricter similarity thresholds, slightly higher answer accuracy, and far stronger navigability across related rules. For compliance teams, the payoff is auditability and explainability baked into the data layer, not just the prompt. ...
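The data-layer idea above — storing each subject–predicate–object triplet next to its source snippet in one vector store, then retrieving at triplet granularity — can be sketched as follows. The bag-of-words embedding and the sample entries are toy assumptions standing in for a real sentence encoder and real regulatory text.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would use a sentence encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# One store entry per triplet: the (s, p, o) string is what gets embedded,
# and the source snippet rides along so answers stay grounded and auditable.
store = [
    {"triplet": "broker must report suspicious transactions",
     "source": "Sec. 12(a): Brokers shall report suspicious transactions within 24h."},
    {"triplet": "adviser must disclose conflicts of interest",
     "source": "Sec. 7(c): Advisers shall disclose all conflicts of interest."},
]
for entry in store:
    entry["vec"] = embed(entry["triplet"])

def retrieve(query: str, threshold: float = 0.3):
    # Triplet-level retrieval with a strict similarity cutoff.
    qv = embed(query)
    scored = [(cosine(qv, e["vec"]), e) for e in store]
    return [e for s, e in sorted(scored, key=lambda x: x[0], reverse=True) if s >= threshold]

hits = retrieve("When must a broker report suspicious transactions?")
```

Because the triplet, not a raw chunk, is the retrieval unit, each hit carries its exact source clause — which is where the auditability claim in the paper lives.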

August 16, 2025 · 5 min · Zelina

Lights, Camera, Agents: How MAViS Reinvents Long-Sequence Video Storytelling

The dream of generating a fully realized, minute-long video from a short text prompt has always run aground on three reefs: disjointed narratives, visual glitches, and characters that morph inexplicably between shots. MAViS (Multi-Agent framework for long-sequence Video Storytelling) takes aim at all three by treating video creation not as a single monolithic AI task, but as a disciplined production pipeline staffed by specialized AI “crew members.”

The Problem with One-Shot Generators
Single-pass text-to-video systems shine in short clips but crumble under the demands of long-form storytelling. They repeat motions, lose scene continuity, and often rely on users to do the heavy lifting—writing scripts, designing shots, and manually training models for character consistency. This is not just a technical shortcoming; it’s a workflow bottleneck that makes creative scaling impossible. ...

August 13, 2025 · 3 min · Zelina

From Chaos to Choreography: The Future of Agent Workflows

In the world of Large Language Model (LLM)-powered automation, agents are no longer experimental curiosities — they’re becoming the operational backbone for scalable, autonomous AI systems. But as the number and complexity of these agents grow, the missing piece is no longer raw capability; it’s choreography. This is where agent workflows come in: structured orchestration frameworks that govern how agents plan, collaborate, and interact with tools, data, and each other. A recent survey of 24 representative systems — from industry platforms like LangChain, AutoGen, and MetaGPT to research frameworks like ReAct and ReWOO — reveals not just technical diversity, but a strategic gap in interoperability. ...

August 9, 2025 · 3 min · Zelina

Meta-Game Theory: What a Pokémon League Taught Us About LLM Strategy

When language models battle, their strategies talk back. In a controlled Pokémon tournament, eight LLMs drafted teams, chose moves, and logged natural‑language rationales every turn. Beyond win–loss records, those explanations exposed how models reason about uncertainty, risk, and resource management—exactly the traits we want in enterprise decision agents.

Why Pokémon is a serious benchmark (yes, really)
Pokémon delivers the trifecta we rarely get in classic AI games:
- Structured complexity: 18 interacting types, clear multipliers, and crisp rules.
- Uncertainty that matters: imperfect information, status effects, and accuracy trade‑offs.
- Resource management: limited switches, finite HP, role specialization.
Crucially, the action space is compact enough for language-first agents to reason step‑by‑step without search trees—so we can see the strategy, not just the score. ...

August 9, 2025 · 4 min · Zelina

When AI Plays Lawmaker: Lessons from NomicLaw’s Multi-Agent Debates

Large Language Models are increasingly touted as decision-making aides in policy and governance. But what happens when we let them loose together in a legislative sandbox? NomicLaw — an open-source multi-agent simulation inspired by the self-amending game Nomic — offers a glimpse into how AI agents argue, form alliances, and shape collective rules without human scripts.

The Experiment
NomicLaw pits LLM agents against legally charged vignettes — from self-driving car collisions to algorithmic discrimination — in a propose → justify → vote loop. Each agent crafts a legal rule, defends it, and votes on a peer’s proposal. Scoring is simple: 10 points for a win, 5 for a tie. Two configurations were tested: ...
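The propose → justify → vote loop with its 10-for-a-win, 5-for-a-tie scoring can be sketched like this. The `propose`, `justify`, and `vote` callables are hypothetical stand-ins for LLM calls; NomicLaw's actual prompts and vignette handling are not reproduced here.

```python
from collections import Counter

def run_round(agents, propose, justify, vote):
    # propose(agent) -> rule text
    # justify(agent, rule) -> the agent's defense of its rule (logged, not scored)
    # vote(agent, proposals) -> name of the peer whose proposal the agent backs
    proposals = {a: propose(a) for a in agents}
    defenses = {a: justify(a, proposals[a]) for a in agents}
    ballots = Counter(vote(a, proposals) for a in agents)
    top_count = max(ballots.values())
    winners = [name for name, n in ballots.items() if n == top_count]
    scores = Counter()
    for w in winners:
        scores[w] += 10 if len(winners) == 1 else 5  # 10 points for a win, 5 for a tie
    return proposals, defenses, scores
```

Running repeated rounds and tracking who votes for whom is what surfaces the alliance-formation behavior the article describes.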

August 8, 2025 · 3 min · Zelina

From Autocomplete to Autonomy: How LLM Code Agents are Rewriting the SDLC

The idea of software that writes software has long hovered at the edge of science fiction. But with the rise of LLM-based code agents, it’s no longer fiction, and it’s certainly not just autocomplete. A recent survey by Dong et al. provides the most thorough map yet of this new terrain, tracing how code generation agents are shifting from narrow helpers to autonomous systems capable of driving the entire software development lifecycle (SDLC). ...

August 4, 2025 · 4 min · Zelina

Mind's Eye for Machines: How SimuRA Teaches AI to Think Before Acting

What if AI agents could imagine their future before taking a step—just like we do? That’s the vision behind SimuRA, a new architecture that pushes LLM-based agents beyond reactive decision-making and into the realm of internal deliberation. Introduced in the paper “SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model”, SimuRA’s key innovation lies in separating what might happen from what should be done. Instead of acting step-by-step based solely on observations, SimuRA-based agents simulate multiple futures using a learned world model and then reason over those hypothetical outcomes to pick the best action. This simple-sounding shift is surprisingly powerful—and may be a missing link in developing truly general AI. ...
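The separation of "what might happen" from "what should be done" can be sketched as a tiny planner: roll each candidate action forward through a world model, score the imagined futures, and only then commit. The number-line world model, value function, and greedy rollout below are illustrative assumptions, not SimuRA's learned components.

```python
def plan(state, actions, world_model, value, depth=3):
    """Pick the action whose simulated rollout scores best.

    world_model(state, action) -> predicted next state (imagined, never executed)
    value(state) -> scalar estimate of how desirable a state is
    """
    def rollout(s, a, d):
        s = world_model(s, a)          # imagine, don't act
        if d == 1:
            return value(s)
        # Greedy inner policy over imagined futures
        return max(rollout(s, a2, d - 1) for a2 in actions)

    return max(actions, key=lambda a: rollout(state, a, depth))

# Toy usage: walk along a number line toward a goal at 5.
best = plan(state=0, actions=[1, -1],
            world_model=lambda s, a: s + a,
            value=lambda s: -abs(s - 5))
```

Even in this toy, the agent commits to `+1` only after comparing whole simulated trajectories — the "think before acting" shift the article highlights, as opposed to reacting to the current observation alone.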

August 2, 2025 · 3 min · Zelina