We’ve all met the paper that promises the moon—then hands you a README, a maze of conda environments, and a prayer. Paper2Agent proposes a different contract: don’t read me, run me. By converting a research paper (and its repo) into a Model Context Protocol (MCP) server that any LLM agent can call, it turns methods into tools, figures into reproducible tests, and “future work” into executable prompts.

This isn’t another “Papers with Code” link farm. It’s a pipeline that (1) mines the repo/tutorials, (2) builds a pinned environment, (3) extracts single‑purpose tools with clear I/O, (4) tests them until they match the paper’s outputs, and (5) deploys the lot as a remote MCP server. Hook that server to your favorite coding/chat agent and you get a paper‑specific copilot that can reproduce, explain, and extend the work.
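
To make that contract concrete, here is what “run me” looks like from the consumer side. This is a minimal sketch assuming the official mcp Python SDK’s SSE client; the server URL, tool name, and arguments are illustrative, and exact transport and import paths vary by SDK version.

```python
# Minimal sketch: connect to a (hypothetical) paper MCP server and call one tool.
# Assumes the official `mcp` Python SDK; URL, tool name, and arguments are illustrative.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    async with sse_client("https://example-paper-agent.hf.space/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover what the paper exposes: tools, resources, prompts.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Call a single-purpose tool by name, with file-based I/O.
            result = await session.call_tool(
                "normalize_data",  # illustrative tool name
                {"input_path": "counts.h5ad", "output_path": "normalized.h5ad"},
            )
            print(result.content)

asyncio.run(main())
```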

Why this matters (and to whom)

  • Teams who evaluate methods: Repro runs and figure regen go from days to minutes, with fewer “it works on their machine” surprises.
  • Practitioners: You can ask, in plain language, for the paper’s pipeline to run on your data—without spelunking through API hierarchies.
  • Editors and funders: “Agent-availability” could become a compliance bar that’s higher than data/code dumps yet friendlier to non‑experts.

This aligns with our recurring theme at Cognaptus Insights: agentic infrastructure is gradually becoming the new runtime for knowledge. Paper2Agent is a clean instantiation: MCP as a lingua franca; agents as the UX.

What actually ships

Paper2Agent emits an MCP server per paper with three pillars:

| Component | What it includes | Why it’s useful in practice |
|---|---|---|
| MCP Tools | Executable, single‑purpose functions distilled from tutorials (e.g., score_variant_effect, normalize_data) | Encapsulates method steps with arguments, defaults, and file‑based I/O; minimizes “code hallucination” in prompts |
| MCP Resources | Manuscript text, repo links, figures, datasets/URLs in a standardized registry | Traceability + auto‑download (e.g., via Zenodo/HTTP) |
| MCP Prompts | Encoded workflows (ordered, parameterized steps) | End‑to‑end runs without the user hand‑holding the agent |
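
Here is a minimal sketch of what those three pillars look like in code, assuming the MCP Python SDK’s FastMCP; the function names and bodies are illustrative stand-ins, not the actual artifacts Paper2Agent emits.

```python
# Minimal sketch of a Paper2Agent-style server exposing a tool, a resource, and a
# prompt via the MCP Python SDK's FastMCP. Names and bodies are illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("example-paper-agent")

@mcp.tool()
def normalize_data(input_path: str, output_path: str, target_sum: float = 1e4) -> str:
    """Single-purpose tool: normalize a counts matrix and write the result to disk."""
    # ...method-specific code distilled from the paper's tutorial would go here...
    return output_path

@mcp.resource("registry://datasets")
def dataset_registry() -> str:
    """Resource: a standardized registry of the paper's datasets and download URLs."""
    return '{"pbmc3k": "https://example.org/pbmc3k.h5ad"}'  # placeholder entries

@mcp.prompt()
def full_pipeline(data_path: str) -> str:
    """Prompt: an encoded end-to-end workflow for the agent to follow step by step."""
    return f"Run QC, normalization, HVG selection, PCA, clustering, and UMAP on {data_path}."

if __name__ == "__main__":
    mcp.run()
```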

Under the hood, an orchestrator coordinates four sub‑agents—environment manager, tutorial scanner, tool extractor/implementor, and test‑verifier/improver—until the outputs match the paper’s references. The resulting MCP server can be hosted on Hugging Face Spaces and plugged into coding agents (the authors use Claude Code in the paper).
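
Here is an illustrative sketch of that extract-test-refine loop, with the four sub‑agents modeled as plain callables; it mirrors the control flow described above, not Paper2Agent’s actual implementation.

```python
# Illustrative sketch of the orchestration loop: build env, scan tutorials, extract
# tools, then test and refine each tool until it matches the paper's reference outputs.
# The sub-agents are modeled as caller-supplied callables (hypothetical interfaces).
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestReport:
    matches_reference: bool
    log: str

@dataclass
class Tool:
    name: str
    source: str
    passed: bool = False

def orchestrate(
    repo_url: str,
    build_env: Callable[[str], str],              # environment manager -> pinned env id
    scan_tutorials: Callable[[str], list[str]],   # tutorial scanner -> tutorial paths
    extract_tool: Callable[[str, str], Tool],     # tool extractor/implementor
    verify: Callable[[Tool, str], TestReport],    # test-verifier vs. the paper's outputs
    improve: Callable[[Tool, TestReport], str],   # improver: returns revised source
    max_rounds: int = 3,
) -> list[Tool]:
    env = build_env(repo_url)
    tools = [extract_tool(path, env) for path in scan_tutorials(repo_url)]
    for tool in tools:
        for _ in range(max_rounds):
            report = verify(tool, env)
            if report.matches_reference:
                tool.passed = True
                break
            tool.source = improve(tool, report)   # feed failures back and retry
    # Only tools that reproduce the paper's outputs ship in the MCP server.
    return [tool for tool in tools if tool.passed]
```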

Case studies with teeth

1) AlphaGenome (genomic variant effects). Paper2Agent generated 22 MCP tools and hit 100% accuracy on both tutorial‑based queries and novel prompts (e.g., different variants and tissues). More interestingly, the agent re‑interpreted a GWAS locus and elevated SORT1 as a plausible causal gene versus the paper’s emphasis on CELSR2/PSRC1—then cross‑checked with GTEx eQTLs. That’s not a UI trick; that’s second‑order use: using a paper’s method to revisit its own conclusion, reproducibly.

2) TISSUE (uncertainty‑aware spatial transcriptomics). Six MCP tools deliver prediction intervals, imputation‑aware downstream analysis, and a Q&A mode that explains inputs/outputs on demand. The agent reproduced the human‑run pipeline on mouse somatosensory cortex datasets and can auto‑fetch the paper’s data via a resource registry.

3) Scanpy (single‑cell preprocessing & clustering). Seven MCP tools plus an MCP prompt encode the canonical pipeline—QC → normalization → HVGs → PCA → graph → Leiden → UMAP. Users provide a file path (e.g., data.h5ad) and get outputs that match those of human analysts on three public 10x Genomics datasets.
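
For reference, a minimal sketch of that canonical pipeline in plain Scanpy; parameter values are common defaults, not necessarily the ones the generated tools pin.

```python
# Minimal sketch of the canonical pipeline the Scanpy MCP prompt encodes:
# QC -> normalization -> HVGs -> PCA -> graph -> Leiden -> UMAP.
import scanpy as sc

adata = sc.read_h5ad("data.h5ad")  # user-supplied path

# QC: flag mitochondrial genes, compute per-cell metrics, filter high-MT cells.
adata.var["mt"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)
adata = adata[adata.obs["pct_counts_mt"] < 20].copy()

# Normalization and log transform.
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Highly variable genes, PCA, neighborhood graph, Leiden clustering, UMAP.
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
adata = adata[:, adata.var["highly_variable"]].copy()
sc.pp.pca(adata, n_comps=50)
sc.pp.neighbors(adata, n_neighbors=15)
sc.tl.leiden(adata)
sc.tl.umap(adata)

adata.write_h5ad("clustered.h5ad")
```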

What’s genuinely new vs. past attempts

  • Beyond executable papers: Jupyter‑backed papers still force readers to wrangle environments. MCP pushes the environment behind an API with pinned deps.
  • Beyond “Papers with Code”: Links are not interfaces. Paper2Agent ships callable, tested interfaces.
  • Anti‑hallucination by design: Tools are locked after tests pass; agents call tools rather than free‑compose code.

Where it will break (and why that’s okay)

  • Brittle or undocumented repos won’t agentify cleanly. That failure is a feature: it surfaces reproducibility debt quantitatively (tools that failed tests don’t ship).
  • Method scope creep: Many ideas evolve across multiple papers; a per‑paper MCP may be the wrong granularity. Fortunately, MCPs compose—expect family‑level agents that unify a lineage of methods.
  • Evaluation at scale: Today’s verification is curated (tutorial + novel queries). The next step is automated “LLM‑as‑judge” scoring and diffing against golden artifacts.
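
The “diffing against golden artifacts” half is already easy to automate; here is a minimal sketch by content hash, with the file layout purely illustrative.

```python
# Minimal sketch: compare a rerun's outputs against golden artifacts by content hash.
# Directory names are illustrative.
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def diff_against_golden(run_dir: Path, golden_dir: Path) -> dict[str, bool]:
    """Map each golden artifact name to whether the rerun reproduced it exactly."""
    results = {}
    for golden in sorted(golden_dir.glob("*")):
        candidate = run_dir / golden.name
        results[golden.name] = candidate.exists() and sha256(candidate) == sha256(golden)
    return results

if __name__ == "__main__":
    for name, ok in diff_against_golden(Path("outputs"), Path("golden")).items():
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
```

Byte-identical hashing is the strict end of the spectrum; artifacts that aren’t byte-stable (rendered figures, stochastic fits) are exactly where tolerance-based comparison and LLM-as-judge scoring earn their keep.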

Strategic takeaways for builders and buyers

  • If you publish: Design with agentification in mind. Prefer modular, tutorial‑first repos; make inputs/outputs explicit; ship small, testable functions. Add an Agent Availability section alongside Data/Code Availability.
  • If you adopt: Treat MCP servers as contracts. Ask: What tools exist? What tests were run? What data registry is exposed? Can I pin versions and attest runs?
  • If you invest: There’s a platform play here: a “Registry of Paper Agents” with trust signals (pass/fail suites, environment hashes, usage telemetry) and enterprise onboarding (RBAC, billing, governance).

A mini‑framework you can use tomorrow

When evaluating a new method, try this 5‑point rubric:

  1. Interface: Are core steps exposed as parameterized tools?
  2. Repro: Do tests regenerate paper figures/tables byte‑for‑byte?
  3. Data Access: Are datasets normalized and fetchable programmatically?
  4. Workflow: Is there an encoded end‑to‑end prompt for the common path?
  5. Extendability: Can I bolt this MCP alongside others to run cross‑paper analyses?
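
If you want the rubric as a literal checklist, here is a minimal sketch; the criteria and answers are placeholders to fill in per method.

```python
# Minimal sketch: the 5-point rubric as a checklist. Answers are placeholders.
RUBRIC = ["interface", "repro", "data_access", "workflow", "extendability"]

def score(answers: dict[str, bool]) -> int:
    """Count how many rubric criteria the method satisfies."""
    return sum(answers.get(criterion, False) for criterion in RUBRIC)

answers = {
    "interface": True,       # core steps exposed as parameterized tools?
    "repro": True,           # tests regenerate paper figures/tables?
    "data_access": True,     # datasets fetchable programmatically?
    "workflow": True,        # encoded end-to-end prompt?
    "extendability": False,  # composes with other MCPs?
}

total = score(answers)
print(f"{total}/5 -> {'agent-ready; worth piloting' if total >= 4 else 'not agent-ready yet'}")
```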

Score ≥4? You’re looking at an agent‑ready method that’s worth piloting.


Cognaptus: Automate the Present, Incubate the Future