Thinking About Thinking: When LLMs Start Writing Their Own Report Cards

Report cards are usually written by teachers, managers, examiners, auditors, or other people with the institutional privilege of saying, “Nice effort, but no.” The paper “Reinforcing Chain-of-Thought Reasoning with Self-Evolving Rubrics” asks a stranger question: what if the model helps write the report card for its own reasoning process? [1] That sounds like the kind of governance idea that would make a compliance officer reach for coffee. A model evaluating itself is not automatically trustworthy. Sometimes it is self-reflection. Sometimes it is theatre with JSON brackets. ...

February 13, 2026 · 18 min · Zelina

Grading the Doctor: How Health-SCORE Scales Judgment in Medical AI

Checklist is a boring word. That is why it is useful. In healthcare AI, the glamorous question is whether a model can “reason like a doctor.” The operational question is uglier: did it invent a lab value, miss an emergency referral, overstate certainty, ignore the requested format, recommend unsafe antibiotics, or fail to ask for missing context? ...

February 2, 2026 · 15 min · Zelina