Cover image

The Butterfly Defect: Diagnosing LLM Failures in Tool-Agent Chains

As LLM-powered agents become the backbone of many automation systems, their ability to reliably invoke external tools is now under the spotlight. Despite impressive multi-step reasoning, many such agents crumble in practice—not because they can’t plan, but because they can’t parse. One wrong parameter, one mismatched data type, and the whole chain collapses. A new paper titled “Butterfly Effects in Toolchains” offers the first systematic taxonomy of these failures, exposing how parameter-filling errors propagate through tool-invoking agents. The findings aren’t just technical quirks—they speak to deep flaws in how current LLM systems are evaluated, built, and safeguarded. ...

July 22, 2025 · 3 min · Zelina
Cover image

Proofs and Consequences: How Math Reveals What AI Still Doesn’t Know

What happens when we ask the smartest AI models to do something truly difficult—like solve a real math problem and prove their answer is correct? That’s the question tackled by a group of researchers in their paper “Mathematical Proof as a Litmus Test.” Instead of testing AI with casual tasks like summarizing news or answering trivia, they asked it to write formal mathematical proofs—the kind that leave no room for error. And the results? Surprisingly poor. ...

June 23, 2025 · 4 min · Zelina