Opening — Why this matters now

AI agents are no longer toy demos. They write production code, refactor legacy systems, navigate websites, and increasingly make decisions that matter. Yet one deceptively simple question remains unresolved: can an AI agent reliably tell whether it will succeed?

This paper delivers an uncomfortable answer. Across frontier models and evaluation regimes, agents are systematically overconfident about their own success—often dramatically so. As organizations push toward longer-horizon autonomy, this blind spot becomes not just an academic curiosity, but a deployment risk.

Background — From token confidence to agentic uncertainty

Traditional uncertainty estimation in machine learning focuses on predictions: probabilities over tokens, labels, or answers. But autonomous agents operate on a different plane. Their success depends on long, multi-step trajectories involving planning, tool use, intermediate decisions, and error recovery.

The authors formalize this gap through agentic uncertainty—the probability an agent assigns to its own eventual task success. This extends earlier ideas like P(IK) (“probability that I know”) into a richer setting: P(IS), the probability that I succeed.

Crucially, the same underlying model plays two roles:

  • a task agent that attempts the solution, and
  • an uncertainty agent that estimates success at different stages.

Because both roles are played by the same weights, any overconfidence can be attributed to poor self-assessment rather than to a gap in underlying capability.
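To make the setup concrete, here is a minimal sketch of the dual-role design, assuming a hypothetical `complete()` wrapper around the underlying model API; the prompts are illustrative, not the paper's exact wording.

```python
# Dual-role sketch: the same model acts as task agent and uncertainty agent.
# `complete` is a hypothetical wrapper around whatever LLM API is in use;
# the prompts below are illustrative, not the paper's exact protocol.

def complete(prompt: str) -> str:
    """Placeholder for a single call to the underlying model."""
    raise NotImplementedError("wire this to your model API")

def task_agent(task: str, repo_context: str) -> str:
    """Attempt the task and return a candidate patch."""
    return complete(
        f"Task:\n{task}\n\nRepository (read-only):\n{repo_context}\n\nWrite a patch."
    )

def uncertainty_agent(task: str, evidence: str) -> float:
    """Ask the same model for P(IS): its probability of eventual success."""
    reply = complete(
        f"Task:\n{task}\n\nEvidence so far:\n{evidence}\n\n"
        "What is the probability (0 to 1) that the final solution passes the "
        "task's tests? Reply with a single number."
    )
    return float(reply.strip())
```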

Analysis — Three moments of self-judgment

The study probes agentic uncertainty at three points in the lifecycle, plus one twist:

Regime                 | Information Available        | Oversight Question
-----------------------|------------------------------|-------------------------
Pre-execution          | Task + repo (read-only)      | Should we attempt this?
Mid-execution          | Partial trajectory           | Are we failing already?
Post-execution         | Finished patch               | Did we succeed?
Adversarial post-exec  | Patch + bug-finding prompt   | What could be wrong?

All evaluations are run on SWE-bench Pro, a demanding benchmark where real success rates remain low (22–35%), even for frontier models.
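
To make the regimes concrete, the sketch below shows how the uncertainty agent from the earlier snippet might be queried at each stage; the evidence strings are assumptions, not the paper's exact probes.

```python
# Regime-specific probes, reusing the hypothetical uncertainty_agent() above.
# What changes across regimes is the evidence the model is allowed to see.

def probe_pre_execution(task: str, repo_context: str) -> float:
    # Task + repo (read-only): "Should we attempt this?"
    return uncertainty_agent(task, f"Repository (read-only):\n{repo_context}")

def probe_mid_execution(task: str, partial_trajectory: str) -> float:
    # Partial trajectory: "Are we failing already?"
    return uncertainty_agent(task, f"Partial trajectory so far:\n{partial_trajectory}")

def probe_post_execution(task: str, patch: str) -> float:
    # Finished patch: "Did we succeed?"
    return uncertainty_agent(task, f"Finished patch:\n{patch}")
```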

Findings — Confidence without calibration

1. Pervasive overconfidence

Post-execution agents routinely predict success rates two to three times higher than reality. One model forecasts 77% success on tasks it completes only 22% of the time. Confidence distributions for successful and failed attempts are nearly indistinguishable.

The implication is stark: high confidence conveys almost no information.
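
Two simple diagnostics make "almost no information" precise: the overconfidence gap (mean stated confidence minus actual success rate) and AUROC (the probability that a randomly chosen success receives higher confidence than a randomly chosen failure, where 0.5 is chance). A minimal sketch with invented numbers, loosely echoing the reported 77% stated vs 22% actual gap:

```python
# Diagnostics for self-assessed confidence. `confidences` are the agent's
# P(IS) estimates; `successes` are ground-truth outcomes. Data is invented
# for illustration only.

def overconfidence_gap(confidences, successes):
    """Mean stated confidence minus actual success rate."""
    return sum(confidences) / len(confidences) - sum(successes) / len(successes)

def auroc(confidences, successes):
    """P(random success scores higher than random failure); 0.5 = chance."""
    pos = [c for c, s in zip(confidences, successes) if s]
    neg = [c for c, s in zip(confidences, successes) if not s]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

conf = [0.80, 0.75, 0.90, 0.70, 0.85, 0.70, 0.80, 0.75, 0.65]
succ = [1, 1, 0, 0, 0, 0, 0, 0, 0]   # 2 of 9 tasks actually pass

print(overconfidence_gap(conf, succ))  # ~0.54: confidence far above reality
print(auroc(conf, succ))               # ~0.57: barely better than chance
```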

2. More context, worse judgment

Counterintuitively, pre-execution estimates discriminate between eventual success and failure better than post-execution review does, despite having less information. Seeing a plausible-looking solution appears to anchor agents into believing it works.

This undermines a common assumption in AI deployment: that verification is easier than generation.

3. Mid-execution “cold feet” don’t help

As agents progress, their confidence often declines—but this doubt is largely uninformative. Both successful and failing trajectories exhibit similar confidence drops. The agent feels less sure, but not in a way that predicts failure.

4. Adversarial framing actually works

Prompting agents to actively search for bugs, rather than asking whether a solution is correct, substantially improves calibration. Overconfidence drops by up to 15 percentage points, and in some models, discrimination improves as well.

This reframing shifts the agent from confirmation to falsification—a small prompt change with outsized effects.
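
The contrast can be sketched as two prompt framings for the same finished patch; the wording is an assumption that reuses the hypothetical `complete()` helper from earlier, not the paper's actual prompts.

```python
# Confirmation vs. falsification framing of the same post-execution check.
# Wording is illustrative only.

CONFIRMATION_PROMPT = (
    "Task:\n{task}\n\nFinished patch:\n{patch}\n\n"
    "Is this solution correct? Give the probability (0 to 1) that it passes the tests."
)

FALSIFICATION_PROMPT = (
    "Task:\n{task}\n\nFinished patch:\n{patch}\n\n"
    "Actively search for bugs: list edge cases, unhandled inputs, and ways this "
    "patch could break existing behavior. Only after that, give the probability "
    "(0 to 1) that it passes the tests, as the final line."
)

def adversarial_confidence(task: str, patch: str) -> float:
    """Falsify first, estimate second (assumes the number is the last line)."""
    reply = complete(FALSIFICATION_PROMPT.format(task=task, patch=patch))
    return float(reply.strip().splitlines()[-1])
```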

Visualization — Calibration beats intuition

Method                 | Discrimination (AUROC) | Calibration (ECE) | Overconfidence
-----------------------|------------------------|-------------------|---------------
Pre-execution          | Higher                 | Moderate          | High
Post-execution         | Lower                  | Worst             | Extreme
Adversarial post-exec  | Competitive            | Best              | Reduced

Notably, simple post-hoc recalibration can fix some models—but for others, adversarial prompting introduces genuinely new signal, not just a downward shift in confidence.
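
For reference, expected calibration error (ECE) bins predictions by confidence and weights each bin's gap between average confidence and empirical success rate. A minimal sketch follows, with a note on why a monotone post-hoc rescaling can lower ECE but cannot improve AUROC.

```python
# Expected calibration error (ECE): bin predictions by confidence, then take the
# weighted average gap between each bin's mean confidence and its success rate.

def ece(confidences, successes, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for c, s in zip(confidences, successes):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, s))
    total, error = len(confidences), 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(s for _, s in bucket) / len(bucket)
        error += (len(bucket) / total) * abs(avg_conf - accuracy)
    return error

# A monotone post-hoc rescaling (e.g. shrinking every confidence by a factor
# fitted on held-out data) can reduce ECE, but it preserves the ranking of
# confidences, so it cannot raise AUROC. Adversarial prompting, by contrast,
# can reorder confidences and therefore add genuinely new signal.
def shrink(confidences, factor=0.4):
    return [c * factor for c in confidences]
```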

Implications — Designing safer autonomy

Three practical lessons emerge:

  1. Never trust agent self-confidence at face value. High confidence is not evidence of correctness.
  2. Front-load uncertainty checks. Pre-execution assessment is surprisingly valuable for task routing.
  3. Build adversarial review into the loop. Even lightweight bug-hunting prompts materially improve reliability.

For high-stakes systems, a hybrid strategy—pre-execution filtering plus adversarial post-execution review—appears far safer than naive self-verification.
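
A sketch of such a hybrid gate, reusing the hypothetical helpers from the earlier snippets; the thresholds and routing actions are assumptions, not recommendations from the paper.

```python
# Hybrid guardrail: pre-execution filtering plus adversarial post-execution
# review. Thresholds and routing actions are illustrative assumptions.

PRE_THRESHOLD = 0.5    # below this, don't attempt the task autonomously
POST_THRESHOLD = 0.7   # below this, route the patch to a human reviewer

def run_with_guardrails(task: str, repo_context: str) -> dict:
    # 1. Front-loaded check: is this task worth attempting at all?
    p_pre = probe_pre_execution(task, repo_context)
    if p_pre < PRE_THRESHOLD:
        return {"action": "escalate_to_human", "pre_confidence": p_pre}

    # 2. Let the task agent produce a candidate patch.
    patch = task_agent(task, repo_context)

    # 3. Adversarial review instead of naive self-verification.
    p_post = adversarial_confidence(task, patch)
    if p_post < POST_THRESHOLD:
        return {"action": "human_review", "patch": patch, "post_confidence": p_post}
    return {"action": "auto_submit", "patch": patch, "post_confidence": p_post}
```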

Conclusion — Confidence is cheap, calibration is not

As AI agents take on longer and riskier tasks, their inability to accurately judge their own success becomes a systemic weakness. This paper shows that overconfidence is not a bug in one model, but a structural feature of current agentic systems.

The fix is neither blind optimism nor blind trust in self-critique, but designed skepticism, embedded directly into how agents are evaluated.

Cognaptus: Automate the Present, Incubate the Future.