Lean | Cognaptus

Proofs at Scale: When 30,000 Agents Replace the Referee

Mathematics has a management problem. That sounds less romantic than saying it has a reasoning problem, but romance is not usually where bottlenecks hide. A proof can be brilliant, a referee can be diligent, and still the verification system can fail for the boring reason that nobody has enough time to check everything line by line. The paper "Automatic Textbook Formalization" takes that bottleneck seriously and then does something unusually concrete: it reports a multi-agent system that formalized a 500-plus-page graduate algebraic combinatorics textbook into Lean, with all 340 target definitions and theorems proved, in about one week.[1] ...

When Less Proves More: The Case for Minimalist AI Theorem Provers

Proof is a good place to test AI humility. In ordinary business writing, a model can sound confident, cite familiar patterns, and still be quietly wrong. The error may not surface until the contract is signed, the policy memo is circulated, or the spreadsheet has already acquired the authority of a sacred object. In formal theorem proving, the arrangement is less polite. The model writes code. Lean compiles it. The compiler either accepts the proof or sends it back covered in red ink. ...
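To make "Lean compiles it" concrete, here is a minimal Lean 4 sketch using only core `Nat`. The theorem names are illustrative and not taken from any of the papers discussed; the point is simply that the checker, not the author's confidence, decides what counts as a proof.

```lean
-- Accepted: `Nat.add` recurses on its second argument, so `n + 0`
-- reduces to `n` by unfolding the definition, and `rfl` closes the goal.
theorem my_add_zero (n : Nat) : n + 0 = n := rfl

-- Rejected (uncomment to see Lean's error): for a variable `n`,
-- `0 + n` does not reduce, so `rfl` cannot close this goal.
-- theorem my_zero_add_bad (n : Nat) : 0 + n = n := rfl

-- Accepted: cite the core library lemma `Nat.zero_add` instead.
theorem my_zero_add (n : Nat) : 0 + n = n := Nat.zero_add n
```

The two statements look symmetric to a human reader, which is exactly why a plausible-sounding proof attempt can fail the compiler while a one-word change passes it.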

Skeletons in the Proof Closet: When Lean Provers Need Hints, Not More Compute

Compute is a very convenient alibi. When an AI system fails, the modern reflex is to ask for more of it: more samples, more tokens, more search, more GPUs, more patience from whoever is paying the invoice. This habit is not always wrong. Sometimes the model really does need another attempt. Sometimes the winning answer is hiding in sample number 47. ...
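The arithmetic behind "more samples" can be sketched under a textbook assumption that is not from the post: every attempt succeeds independently with the same probability `p`. The helper names `p_at_least_one` and `samples_for` are hypothetical.

```python
from __future__ import annotations

import math


def p_at_least_one(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds,
    assuming every attempt succeeds with the same probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


def samples_for(p: float, target: float) -> int | None:
    """Smallest k whose success chance reaches target.

    Returns None when p == 0: under the independence assumption,
    no number of extra samples can help.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if p <= 0.0:
        return None
    if p >= 1.0:
        return 1
    # Closed form: k >= log(1 - target) / log(1 - p).
    k = max(1, math.ceil(math.log1p(-target) / math.log1p(-p)))
    # Nudge k to correct any floating-point rounding in the closed form.
    while k > 1 and p_at_least_one(p, k - 1) >= target:
        k -= 1
    while p_at_least_one(p, k) < target:
        k += 1
    return k


if __name__ == "__main__":
    for k in (1, 47, 100, 200):
        print(k, round(p_at_least_one(0.02, k), 3))
    print(samples_for(0.02, 0.5), samples_for(0.0, 0.5))
```

Under these assumptions, with `p = 0.02` the success chance is about 61% at 47 samples, about 87% at 100, and about 98% at 200, so doubling from 100 to 200 samples buys only about 11.5 percentage points. When `p` is effectively zero, `samples_for` returns `None`: no sampling budget fixes a model that never finds the key step, which is one way to read the case for hints over compute.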

When Coders Prove Theorems: Agents, Lean, and the Quiet Death of the Specialist Prover

A coder does not trust a program because it sounds plausible. A coder runs it, reads the error message, changes the implementation, tests again, searches the library, asks a colleague, splits the problem, and keeps going until the machine stops complaining. That mundane loop is the interesting part of Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics.[1] The headline result is easy to market: with Claude Opus 4.5 as the base model, Numina-Lean-Agent solves all 12 Putnam 2025 problems in Lean, matching the reported perfect score of AxiomProver. Nice. The trophy cabinet sparkles. ...
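The run, read-the-error, revise, retry loop described above can be sketched in a few lines. This is a toy, not Numina-Lean-Agent's implementation: `check` is a hypothetical stand-in for compiling with Lean, a "proof" is just a list of placeholder lemma names, and `revise` is a hypothetical repair step that reads the error message and adds whatever it asks for.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool
    error: str = ""


# Hypothetical placeholder lemmas; a real verifier would be the Lean compiler.
REQUIRED = ["lemma_A", "lemma_B", "lemma_C"]


def check(proof: list[str]) -> CheckResult:
    """Toy verifier: accept only if every required lemma is cited,
    otherwise report the first missing one as an error message."""
    for lemma in REQUIRED:
        if lemma not in proof:
            return CheckResult(False, f"missing lemma: {lemma}")
    return CheckResult(True)


def revise(proof: list[str], error: str) -> list[str]:
    """Toy repair step: parse the error message and add what it names."""
    missing = error.removeprefix("missing lemma: ")
    return proof + [missing]


def repair_loop(proof: list[str], max_attempts: int = 10) -> tuple[list[str], int] | None:
    """Check, read the error, revise, and retry until accepted or out of budget.
    Returns the accepted proof and the attempt number, or None."""
    for attempt in range(1, max_attempts + 1):
        result = check(proof)
        if result.ok:
            return proof, attempt
        proof = revise(proof, result.error)
    return None


if __name__ == "__main__":
    print(repair_loop([]))
    print(repair_loop([], max_attempts=3))
```

The design choice worth noticing is that the error message, not a fresh random sample, drives each revision; the budget (`max_attempts`) only bounds how long the loop may keep arguing with the checker.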