When was the last time a machine truly surprised you—not with a quirky ChatGPT poem or a clever image generation, but with scientific reasoning that evolved on its own? Meet STELLA, an AI agent for biomedical research that doesn’t just solve problems—it gets better at solving them while solving them.

The Static Curse of Smart Agents

Modern AI agents have shown promise in navigating the labyrinth of biomedical research, where each inquiry might require cross-referencing papers, running custom bioinformatics analyses, or interrogating molecular databases. But the vast majority of these agents suffer from a fatal limitation: they rely on static, pre-installed toolkits and hard-coded logic trees. Like a PhD student who memorized a textbook but never updated it, they can’t adapt to new tasks or new knowledge without human intervention.

STELLA’s Self-Evolving Architecture

STELLA (Self-Evolving LLM Agent) takes a fundamentally different approach. Its core innovation lies in a multi-agent system with roles that mirror a research lab:

Agent | Role
Manager Agent | Decomposes research prompts and curates evolving reasoning templates
Dev Agent | Generates and runs code, builds environments, writes analysis reports
Critic Agent | Evaluates intermediate results and identifies conceptual gaps
Tool Creation Agent | Searches, builds, tests, and integrates new tools on the fly

This isn’t just workflow automation—it’s reasoning evolution. The Manager selects or creates a problem-solving template. The Dev Agent executes it. The Critic suggests refinements. If a missing capability is detected, the Tool Creation Agent writes its own script or fetches the right model to fill the gap. The process is logged, refined, and—critically—remembered for future problems.
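The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not STELLA's actual implementation: the class `AgentLoop`, its method names, the `required` list of capabilities, and the string-returning "tools" are all hypothetical stand-ins, and a real system would call an LLM at each step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Critique:
    """Critic feedback: is the result acceptable, and which capability is missing?"""
    accepted: bool
    missing_tool: Optional[str] = None


@dataclass
class AgentLoop:
    # Hypothetical stand-ins for the template store, the tool registry, and the run log.
    templates: Dict[str, List[str]] = field(default_factory=dict)
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def manager(self, task: str) -> List[str]:
        # Reuse a remembered template for this task, else start from a default plan.
        return list(self.templates.get(task, ["analyze"]))

    def dev(self, task: str, plan: List[str]) -> List[str]:
        # Execute every planned step with the tool registered under that name.
        return [self.tools[step](task) for step in plan]

    def critic(self, plan: List[str], required: List[str]) -> Critique:
        # Flag the first required capability that the plan does not cover.
        for name in required:
            if name not in plan:
                return Critique(accepted=False, missing_tool=name)
        return Critique(accepted=True)

    def tool_creator(self, name: str) -> None:
        # Placeholder: a real agent would search for, build, and test a tool here.
        self.tools[name] = lambda task, n=name: f"{n}({task})"

    def solve(self, task: str, required: List[str], max_rounds: int = 5) -> List[str]:
        plan = self.manager(task)
        results: List[str] = []
        for round_no in range(max_rounds):
            results = self.dev(task, plan)
            review = self.critic(plan, required)
            self.log.append(f"round {round_no}: plan={plan} accepted={review.accepted}")
            if review.accepted:
                self.templates[task] = list(plan)  # remember the successful workflow
                break
            self.tool_creator(review.missing_tool)  # fill the capability gap
            plan.append(review.missing_tool)
        return results


loop = AgentLoop(tools={"analyze": lambda task: f"analyze({task})"})
print(loop.solve("chemotherapy resistance", required=["analyze", "perturb"]))
print(loop.log)
```

On the first call the critic rejects the one-step plan, a `perturb` tool is created, and the extended plan is stored as a template. A second call with the same task starts from that stored plan and is accepted in a single round, which is the "remembered for future problems" behavior in miniature.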

Two Evolving Pillars: Templates and Tools

STELLA’s self-evolving nature comes from two central components:

  • Template Library: Instead of following rigid logic chains, STELLA builds a library of successful reasoning templates. If it solves a difficult case—say, identifying a key transcription factor in drug resistance—that workflow becomes a new playbook. Over time, it amasses not just knowledge, but strategy.

  • Tool Ocean: Unlike traditional agents that rely on a fixed toolbox, STELLA’s Tool Creation Agent can:

    • Search GitHub or PubMed
    • Identify a missing function (e.g., cell state perturbation)
    • Build or integrate a new model (e.g., AlphaFold3, scGPT)
    • Add it to its growing Tool Ocean

This means STELLA is not limited to the tools it started with. It’s like a scientist who learns how to make their own instruments.
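One way to picture the two components is as a pair of small data structures. The sketch below is an assumption-laden illustration, not STELLA's code: `TemplateLibrary`, `ToolOcean`, tag-based matching with Jaccard overlap, and the smoke-test gate are all hypothetical design choices, and `gc_content` is just a toy tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class TemplateLibrary:
    """Stores workflows that worked, keyed by descriptive tags (hypothetical design)."""
    entries: List[Tuple[Set[str], List[str]]] = field(default_factory=list)

    def add(self, tags: Set[str], steps: List[str]) -> None:
        self.entries.append((set(tags), list(steps)))

    def best_match(self, tags: Set[str]) -> Optional[List[str]]:
        # Score each stored template by Jaccard overlap between its tags and the query.
        def score(entry_tags: Set[str]) -> float:
            union = entry_tags | tags
            return len(entry_tags & tags) / len(union) if union else 0.0

        ranked = sorted(self.entries, key=lambda e: score(e[0]), reverse=True)
        if ranked and score(ranked[0][0]) > 0:
            return ranked[0][1]
        return None  # nothing relevant yet; a new template must be created


@dataclass
class ToolOcean:
    """A growing registry; a tool is admitted only if its smoke test passes."""
    tools: Dict[str, Callable] = field(default_factory=dict)

    def integrate(self, name: str, fn: Callable,
                  smoke_test: Callable[[Callable], bool]) -> bool:
        try:
            ok = bool(smoke_test(fn))
        except Exception:
            ok = False  # a tool that crashes during testing is rejected
        if ok:
            self.tools[name] = fn
        return ok


def gc_content(seq: str) -> float:
    # Toy tool: fraction of G and C bases in a DNA sequence.
    return sum(base in "GC" for base in seq.upper()) / len(seq)


library = TemplateLibrary()
library.add({"scRNA-seq", "drug resistance"},
            ["preprocess", "pathway analysis", "virtual perturbation"])
print(library.best_match({"drug resistance", "transcription factor"}))

ocean = ToolOcean()
print(ocean.integrate("gc_content", gc_content, lambda f: f("GGCC") == 1.0))
print(ocean.integrate("broken", lambda seq: 1 / 0, lambda f: f("GG") == 1.0))
```

The testing gate in `integrate` reflects the "builds, tests, and integrates" description of the Tool Creation Agent: a candidate tool only joins the registry once it produces the expected answer on a known input.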

A Real Example: Cracking Chemotherapy Resistance

In one test case, STELLA was asked to analyze chemotherapy resistance in tumor cells. The run unfolded as follows:

  1. STELLA preprocessed scRNA-seq data from pre- and post-treatment samples.
  2. It identified gene pathways associated with resistance.
  3. The Critic Agent noted: “The analysis is correct but not actionable.”
  4. In response, the Tool Creation Agent built a virtual perturbation model to simulate which changes might re-sensitize the tumor.
  5. STELLA then pinpointed MTF1, a master transcription factor, as the keystone of the resistance network.

This wasn’t a scripted sequence. It was generated and evolved dynamically. That’s the difference.
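To make the idea of a virtual perturbation concrete, here is a deliberately tiny toy model. It is not the model STELLA built: the network, the regulator names (`TF_A`, `TF_B`, `TF_C`), the target genes, and every weight are invented for illustration, and the linear scoring rule is an assumption chosen for simplicity. The sketch silences one regulator at a time and ranks regulators by how much a resistance score drops.

```python
from typing import Dict

# Toy regulatory weights: regulator -> {target gene: effect on expression}.
# All names and numbers are invented for illustration only.
NETWORK: Dict[str, Dict[str, float]] = {
    "TF_A": {"efflux_pump": 0.8, "dna_repair": 0.6},
    "TF_B": {"efflux_pump": 0.1},
    "TF_C": {"dna_repair": 0.2, "apoptosis_block": 0.3},
}
ACTIVITY: Dict[str, float] = {"TF_A": 1.0, "TF_B": 1.0, "TF_C": 1.0}


def resistance_score(activity: Dict[str, float]) -> float:
    # Sum each regulator's activity-weighted contribution to resistance genes.
    return sum(
        activity[tf] * weight
        for tf, targets in NETWORK.items()
        for weight in targets.values()
    )


def knockout_effects() -> Dict[str, float]:
    # For each regulator, simulate silencing it and record the drop in the score.
    baseline = resistance_score(ACTIVITY)
    effects = {}
    for tf in NETWORK:
        perturbed = {**ACTIVITY, tf: 0.0}
        effects[tf] = baseline - resistance_score(perturbed)
    return effects


effects = knockout_effects()
keystone = max(effects, key=effects.get)
print(keystone, effects)
```

In this toy network `TF_A` comes out on top because it carries the largest total weight. A real perturbation model would work from measured expression data and a learned network rather than hand-written numbers.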

Benchmark Gains That Compound Over Time

Performance-wise, STELLA does more than eke out a few percentage points. Its accuracy on each benchmark rises as it accumulates experience:

Benchmark | Initial Accuracy | Accuracy After Evolution
Humanity’s Last Exam (Biomedicine) | 14% | 26%
LAB-Bench DBQA | 45% | 54%
LAB-Bench LitQA | 52% | 63%

The more trials it runs, the better it gets. In practical terms: STELLA learns how to learn.
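To put the table's numbers in perspective, the short script below computes the absolute gain (in percentage points) and the relative improvement for each benchmark. The figures are taken directly from the table; the function name `gains` is just a label chosen for this example.

```python
# (initial accuracy %, accuracy after evolution %) as reported in the table above.
RESULTS = {
    "Humanity's Last Exam (Biomedicine)": (14, 26),
    "LAB-Bench DBQA": (45, 54),
    "LAB-Bench LitQA": (52, 63),
}


def gains(before: float, after: float):
    # Absolute gain in percentage points, and gain relative to the starting accuracy.
    absolute = after - before
    relative = 100 * absolute / before
    return absolute, relative


for name, (before, after) in RESULTS.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points ({relative:.0f}% relative improvement)")

# Humanity's Last Exam (Biomedicine): +12 points (86% relative improvement)
# LAB-Bench DBQA: +9 points (20% relative improvement)
# LAB-Bench LitQA: +11 points (21% relative improvement)
```

The absolute gains are similar across the three benchmarks (9 to 12 points), but the relative improvement is largest on Humanity's Last Exam, where accuracy nearly doubles from a low starting point.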

Why This Matters for the Future of Scientific AI

What sets STELLA apart isn’t just its architecture—it’s the paradigm shift it represents:

  • From predefined tasks to emergent capabilities
  • From one-shot prompting to iterative feedback and tool construction
  • From static pipelines to autonomous research agents

In a field where new diseases, new data, and new techniques appear constantly, tools like STELLA could make human-AI collaboration far more productive. It could mean that tomorrow’s AI doesn’t just cite the latest study but understands it, adapts to it, and uses it to discover the next one.


Cognaptus: Automate the Present, Incubate the Future