Judge, Jury, and Calibration: Why AI Evaluation Needs Anchors

TL;DR for operators: AI is becoming very good at producing judgement-shaped output. That is not the same thing as judgement. Two recent papers make the same operational point from different sides: one shows how AI can estimate educational item difficulty before response data are available; the other shows how LLM-generated peer reviews can look serious while diverging from human reviewing behaviour.[1][2] ...

June 15, 2026 · 14 min · Zelina

Pre-Review, Not Peer Review: The Drafting Gate AI Actually Earns

TL;DR for operators: AI-Paper-Review is useful because it behaves like a disciplined pre-submission review room, not because it makes peer reviewers obsolete. The system selects a panel of AI reviewer personas, makes them review independently, clusters duplicated concerns, ranks the resulting issues by consensus and severity, then compares them with human reviews. That mechanism matters more than the slogan, because raw AI critique is cheap, noisy, and very good at sounding busy. ...

June 15, 2026 · 18 min · Zelina

Peer Pressure: AI Reviewers Pass the Item Test, Not the Replacement Test

Review is a strange business process. The visible output is a verdict: accept, reject, revise, approve, block, escalate. The useful output is usually smaller and more annoying: one specific criticism that is correct, important, and supported by evidence. That distinction is what makes the new paper “On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists” more interesting than the usual “can AI replace reviewers?” theatre.[1] The paper does not ask whether an AI reviewer can imitate a human reviewer’s overall score. It asks whether each individual criticism is any good. ...

June 3, 2026 · 17 min · Zelina

Rebuttal Agents, Not Rebuttal Text: Why ‘Verify‑Then‑Write’ Is the Only Scalable Future

Rebuttal is where polite language goes to be cross-examined. A reviewer asks why the baseline is missing. Another says the theory is unclear. A third implies that the claimed novelty is, shall we say, generously interpreted. The authors have a few days to respond, and every sentence must do three jobs at once: answer the concern, avoid overclaiming, and preserve the paper’s strategic position. ...

January 21, 2026 · 16 min · Zelina

When AI Becomes the Reviewer: Pairwise Judgment at Scale

A committee has one expensive problem before it has any philosophical problem: too many proposals, too little time, and no clean way to know whether Proposal 17 was actually better than Proposal 42. So the usual system does what institutions often do when the task is too large to compare directly. It fragments the work. A few reviewers score a few proposals. Their scores are averaged. A ranked list appears. Everyone pretends the number is more stable than the process that produced it. ...

December 12, 2025 · 16 min · Zelina

Error 404: Peer Review Not Found — How LLMs Are Quietly Rewriting Scientific Quality Control

Deadline. That is the simplest way to understand why modern AI papers contain mistakes. Not because researchers suddenly forgot algebra. Not because reviewers are lazy. Not because the field has collectively decided that proofs are decorative furniture. The more boring explanation is also the more important one: the AI publication machine has scaled faster than the quality-control machinery around it. ...

December 8, 2025 · 20 min · Zelina

Peer Review in the Age of Agents: When Scientists Go Silicon

Reviewers are the unglamorous load-bearing wall of science. They slow things down, miss things, disagree with each other, and occasionally write comments that make authors reconsider their life choices. They are also the reason published knowledge is not just a PDF-shaped rumour. So when a conference lets AI agents act as both primary authors and reviewers, the tempting story writes itself: silicon scientists have entered the building, peer review is next, and human academics can finally retire into committee work, where they have been spiritually living for years. ...

November 21, 2025 · 16 min · Zelina

Peer Review Meets Power Tools: How AI Is Quietly Rewriting Scientific Workflows

Research begins with a familiar nuisance: too many papers, too little time, and a creeping suspicion that the most relevant idea is hiding three fields away under someone else’s terminology. Then comes the second nuisance: even after finding the idea, someone must turn it into a hypothesis, a collaborator list, an experiment plan, a protocol, a result, a reviewable claim, and eventually a publishable manuscript. ...

November 14, 2025 · 20 min · Zelina

Peer Review, But Make It Multi‑Agent: Inside aiXiv’s Bid to Publish AI Scientists

TL;DR for operators: aiXiv is not mainly a claim that AI scientists are ready to flood the world with publishable research and we should all politely applaud the machines. It is more interesting than that, and less comforting. The paper proposes an infrastructure layer for AI-generated science: structured submission, automated review, retrieval-grounded feedback, revision loops, pairwise comparison, prompt-injection detection, multi-model voting, provisional acceptance, DOI-style publication, APIs, MCP interfaces, and public discussion.[1] ...

August 24, 2025 · 17 min · Zelina