Half-Life Crisis: Why AI Agents Fade with Time (and What It Means for Automation)
“The longer the task, the harder they fall.”
In the world of automation, we often focus on how capable AI agents are, but rarely on how long they can sustain that capability. A new paper by Toby Ord, building on the empirical work of Kwa et al. (2025), introduces a profound insight: AI agents have a “half-life”, a predictable drop-off in success as task duration increases. Like radioactive decay, the decline follows an exponential curve.
This article explores what that means for business automation, agent design, and the future of dependable AI.
The Trend: Doubling Duration, but at What Reliability?
Kwa et al. introduced a benchmark of 170 software engineering, ML, and reasoning tasks designed to simulate AI assisting with research. They observed a striking pattern: the duration of tasks (measured in human time-to-complete) that AI agents can finish at a 50% success rate has been doubling roughly every 7 months.
But here’s the kicker: if you require a higher success rate, say 80% or 99%, the task duration agents can reliably handle plummets. For Claude 3.7 Sonnet:
- 50% success = 59-minute tasks
- 80% success = 15-minute tasks
This exponential decay pattern suggests a constant hazard rate: with each additional minute of task time (measured in human-equivalent effort), the probability of overall success falls by a fixed proportion.
The Model: Constant Hazard Rate and Exponential Decline
Ord argues that agent failure behaves like radioactive decay. In survival analysis terms:
- S(t) = probability of surviving (succeeding) through time t
- Constant hazard rate λ → S(t) = exp(-λt)
- The time T50 where S(T50) = 0.5 is the agent’s half-life, which gives T50 = ln 2 / λ (see the sketch below)
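A minimal sketch of this model in Python, using the Claude 3.7 Sonnet half-life above (T50 ≈ 59 minutes) as input; the function names are illustrative, not from either paper:

```python
import math

def hazard_rate(t50: float) -> float:
    """Constant hazard rate implied by a 50% half-life: lambda = ln 2 / T50."""
    return math.log(2) / t50

def survival(t: float, lam: float) -> float:
    """S(t) = exp(-lambda * t): probability of completing a task of length t."""
    return math.exp(-lam * t)

def horizon(success: float, lam: float) -> float:
    """Longest task completed at a given success rate: t = ln(1/success) / lambda."""
    return math.log(1 / success) / lam

lam = hazard_rate(59)                            # minutes, Claude 3.7 Sonnet
print(round(horizon(0.80, lam)))                 # ~19 min (reported value: ~15 min)
print(horizon(0.50, lam) / horizon(0.90, lam))   # ~6.6, i.e. T90 ~ T50 / 7
print(horizon(0.50, lam) / horizon(0.99, lam))   # ~69, i.e. T99 ~ T50 / 70
```

The small gap between the model’s ~19 minutes and the reported 15 minutes at 80% is expected: the exponential is a close fit, not an exact law.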
Why it matters:
- T90 ≈ T50 / 7 and T99 ≈ T50 / 70: as required reliability rises, feasible task length collapses exponentially
- Reaching 99% success on a 2-day task from a 50%-at-1-day baseline needs a roughly 140x longer half-life: about seven doublings, or four years of progress at the 7-month rate (worked through in code below)
- The longer the chain of subtasks, the more errors accumulate, and agents currently lack robust error recovery
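That figure follows directly from the model; a quick check in code, with only the 7-month doubling time taken from Kwa et al.:

```python
import math

DOUBLING_MONTHS = 7  # Kwa et al.: the 50% task horizon doubles roughly every 7 months

def months_of_progress(target_t50: float, current_t50: float) -> float:
    """Months needed for the 50% half-life to grow from current to target."""
    return math.log2(target_t50 / current_t50) * DOUBLING_MONTHS

# 99% success on a 2-day task needs T50 ~ 70 * 2 days (since T99 ~ T50 / 70).
required_t50 = 2 * math.log(2) / math.log(1 / 0.99)     # ~137.9 days
print(months_of_progress(required_t50, current_t50=1))  # ~49.8 months, about 4 years
```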
Cognaptus Insights: Practical Implications
1. Automation Design
- For firms deploying AI agents, breaking long workflows into modular, retry-capable subtasks can greatly enhance reliability (a minimal sketch follows this list).
- Design agentic workflows with task-time budgets derived from current half-life estimates.
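A minimal sketch of that retry pattern, assuming each subtask is wrapped as a callable that raises on failure; the orchestration names here are hypothetical, not from any particular framework:

```python
from typing import Any, Callable

def run_with_retries(step: Callable[[], Any], max_attempts: int = 3) -> Any:
    """Run one subtask, retrying on failure; raise only after max_attempts."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return step()
        except Exception as err:  # in practice, catch the agent's failure type
            last_error = err
    raise RuntimeError(f"subtask failed after {max_attempts} attempts") from last_error

def run_workflow(steps: list[Callable[[], Any]]) -> list[Any]:
    """Chain short, independently retried subtasks instead of one long run."""
    return [run_with_retries(step) for step in steps]
```

The payoff is simple arithmetic: if a short subtask succeeds 80% of the time and attempts fail independently, three tries lift it to 1 - 0.2^3 ≈ 99.2%, a reliability level the half-life model says today’s agents cannot sustain over one long run.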
2. Benchmarks and Evaluation
- Rather than static pass/fail benchmarks, evaluate agent robustness with duration-based survival curves (one possible fit is sketched below).
- Set business-relevant thresholds: Is 80% success in 15 minutes useful? What’s your minimum viable reliability?
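One way such an evaluation might look, assuming a simple log of (task_minutes, succeeded) trial records; the fit below (least squares on log survival over duration buckets) is one straightforward choice, not the method of either paper:

```python
import math

def fit_half_life(trials: list[tuple[float, bool]], n_buckets: int = 5) -> float:
    """Estimate T50 from trial records by fitting ln S(t) = -lambda * t."""
    trials = sorted(trials)                       # order by task duration
    size = max(1, len(trials) // n_buckets)
    xs, ys = [], []
    for i in range(0, len(trials), size):
        bucket = trials[i:i + size]
        rate = sum(ok for _, ok in bucket) / len(bucket)
        if 0 < rate < 1:                          # skip buckets where log S degenerates
            xs.append(sum(t for t, _ in bucket) / len(bucket))
            ys.append(math.log(rate))
    lam = -sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return math.log(2) / lam                      # half-life, in the input's time units
```

Tracking this fitted T50 release over release gives a single robustness number that static pass rates hide.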
3. Forecasting AI Scaling
- This model allows for predictive planning: what success rate can be expected for a given task length next year?
- For example: if a critical report task takes 2 hours and current agents manage only 50% at 1 hour, the model predicts just 25% success today (0.5^2), so do not automate it yet unless you redesign the task or accept lower reliability. The sketch below runs this projection forward.
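A compact projection sketch under the model’s assumptions (constant hazard, a steady 7-month doubling), using the numbers from the example above:

```python
def projected_success(task_hours: float, t50_hours: float, months_ahead: float,
                      doubling_months: float = 7.0) -> float:
    """Success probability after extrapolating the 50% half-life forward."""
    future_t50 = t50_hours * 2 ** (months_ahead / doubling_months)
    return 0.5 ** (task_hours / future_t50)       # S(t) = 0.5^(t / T50)

print(projected_success(2, 1, months_ahead=0))    # 0.25: do not automate yet
print(projected_success(2, 1, months_ahead=7))    # 0.50 after one doubling
print(projected_success(2, 1, months_ahead=12))   # ~0.66 one year out
```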
4. Human-AI Differentiation
- Interestingly, humans outperform the exponential decay curve — possibly due to better error correction or cognitive resilience.
- This highlights a key capability gap: long-form persistence is still an open frontier for AI.
Toward Better Agents: What Comes Next?
This exponential framing isn’t just a neat metaphor — it’s a tool for:
- Designing robust agent workflows
- Setting expectations on task reliability
- Planning for capability scaling in AI roadmaps
Going forward, it’s worth testing whether agents can beat the exponential decay by incorporating memory, error recovery, and collaboration. A radioactive isotope’s half-life is fixed by physics; an agent’s half-life need not be, and may lengthen with intelligent architectural interventions.
Let’s call it: agentic radiation therapy.
Cognaptus: Automate the Present, Incubate the Future