ResMAS: When Multi‑Agent Systems Stop Falling Apart

A mechanism-first reading of ResMAS, showing why resilient LLM agent systems depend on communication topology and topology-aware prompts, not just more agents.

January 11, 2026 · 15 min · Zelina

Stuck on Repeat: When Reinforcement Learning Fails to Notice the Rules Changed

TAPE shows why reinforcement learning agents can fail when the interface stays familiar but the hidden rules of the world change.

January 11, 2026 · 17 min · Zelina

Vibe Coding a Theorem Prover: When LLMs Prove (and Break) Themselves

Why Isabellm’s real lesson is not autonomous AI reasoning, but verifier-gated system design for domains where being plausibly right is still wrong.

January 11, 2026 · 14 min · Zelina

When LLMs Stop Talking and Start Driving

A mechanism-first reading of how LLM semantic understanding, knowledge graphs, and reinforcement learning can turn enterprise text into operational decisions.

January 11, 2026 · 18 min · Zelina

When Solvers Guess Smarter: Teaching SMT to Think in Functions

AquaForte shows how LLMs can guide quantified SMT solving by proposing mathematical function instantiations while traditional solvers keep the formal guarantees.

January 11, 2026 · 15 min · Zelina

Judging the Judges: When AI Evaluation Becomes a Fingerprint

A paper on evaluative fingerprints shows why LLM judges are not interchangeable scoring machines but stable measurement devices with their own theories of quality.

January 10, 2026 · 19 min · Zelina

NPCs With Short-Term Memory Loss: Benchmarking Agents That Actually Live in the World

A mechanism-first reading of MineNPC-Task, a Minecraft benchmark that shows how memory-aware agents should be tested before anyone trusts them in real workflows.

January 10, 2026 · 17 min · Zelina

Distilling the Thought, Watermarking the Answer: When Reasoning Models Finally Get Traceable

ReasonMark shows why watermarking reasoning models may depend less on stronger token bias and more on putting the watermark in the right phase of generation.

January 9, 2026 · 15 min · Zelina

From Tokens to Topology: Teaching LLMs to Think in Simulink

A mechanism-first reading of SimuAgent, a Simulink modeling assistant that shows why representation, validation, curriculum, and reflection matter more than merely attaching a larger model to an engineering tool.

January 9, 2026 · 17 min · Zelina

Model Cannibalism: When LLMs Learn From Their Own Echo

A mechanism-first reading of how self-generated training data and user feedback can turn ordinary LLM fine-tuning pipelines into bias amplifiers.

January 9, 2026 · 19 min · Zelina