Opening — Why this matters now

Enterprise AI has entered its most awkward phase: impressive demos, disappointing deployments.

The industry is discovering—quietly, and expensively—that building an agent that can act is not the same as building one that should act. The difference is not philosophical. It is statistical, operational, and ultimately financial.

The paper “The Stochastic Gap” formalizes this discomfort. It reframes agentic AI not as a prompt-engineering problem, but as a trajectory reliability problem under uncertainty. In other words, your agent isn’t failing because it picked a wrong answer—it’s failing because it walked down a path your business has never statistically justified.

That is a far more uncomfortable diagnosis.

Background — From deterministic workflows to stochastic agents

Traditional enterprise systems—ERP, BPM, rule engines—are designed to behave almost deterministically. Every step is governed by approvals, constraints, and validation rules.

Then we insert an LLM-based agent.

Suddenly:

  • Actions are sampled from probability distributions
  • Workflows become stochastic trajectories
  • Rare branches and loops accumulate uncertainty

The paper calls this mismatch the stochastic gap.

| System Type | Decision Logic | Risk Profile |
|---|---|---|
| Rule-based ERP | Deterministic | Predictable, auditable |
| LLM Agent | Probabilistic policy | Path-dependent, compounding uncertainty |

The key shift is subtle but critical:

Reliability is no longer about a single correct step—it is about the entire path remaining statistically supported.
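A toy calculation makes the point concrete. The numbers below are illustrative (not from the paper), and the independence assumption is a simplification, but the compounding effect is the same one the paper measures:

```python
def path_reliability(step_reliability: float, n_steps: int) -> float:
    """Probability that every step in an n-step trajectory succeeds,
    assuming steps fail independently (a simplifying assumption)."""
    return step_reliability ** n_steps

# A 98%-reliable step looks excellent in isolation...
print(round(path_reliability(0.98, 1), 3))   # 0.98
# ...but over a 20-step workflow, whole-path reliability erodes sharply.
print(round(path_reliability(0.98, 20), 3))  # 0.668
```

Step-level metrics hide this erosion entirely, which is why path-level auditing matters.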

Analysis — The Markov audit nobody is doing

The authors propose a Markov framework built directly from enterprise event logs. Instead of asking “Can the agent do this?”, they ask:

“Has this sequence of decisions ever been sufficiently observed before?”

1. The hidden variable: Blind-spot mass

Two core metrics emerge:

  • State blind-spot mass: how much of the workflow occurs in rarely seen states
  • State-action blind mass: how often decisions themselves lack historical support
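Both metrics can be estimated directly from an event log. The sketch below is a minimal illustration, assuming the log is a list of (state, action) pairs; the function name and exact weighting are mine, not the paper's implementation:

```python
from collections import Counter

def blind_spot_masses(log, tau):
    """Estimate blind-spot masses from an event log, given a support
    threshold tau. Returns (state_blind_mass, state_action_blind_mass):
    the fraction of events occurring in states (resp. state-action
    pairs) observed fewer than tau times. Illustrative sketch only."""
    state_counts = Counter(s for s, _ in log)
    sa_counts = Counter(log)
    n = len(log)
    state_blind = sum(1 for s, _ in log if state_counts[s] < tau) / n
    sa_blind = sum(1 for sa in log if sa_counts[sa] < tau) / n
    return state_blind, sa_blind

# Tiny synthetic log: one common state-action pair, one rare one.
log = [("approve", "release")] * 9 + [("approve", "reject")]
print(blind_spot_masses(log, tau=5))  # (0.0, 0.1)
```

Note how the state looks fully covered while one of its decisions does not, which is exactly the asymmetry the paper reports.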

Here’s the uncomfortable result from the BPI 2019 dataset:

| Metric | Value (τ = 1000) |
|---|---|
| State blind mass | 4.6% |
| State-action blind mass | 12.5% |

The implication is brutal:

Your process may look well-covered—until the moment a decision is required.

And agents, unfortunately, specialize in making decisions.

2. Entropy: Where autonomy quietly collapses

Even in well-supported states, the next action may be ambiguous.

The paper measures this using Shannon entropy. High entropy = many plausible next steps.
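Computing this over the empirical next-action distribution at each state is straightforward; a minimal sketch (synthetic counts, not the paper's data):

```python
import math

def next_action_entropy(action_counts):
    """Shannon entropy (in bits) of the empirical next-action
    distribution at a state. High entropy means many plausible next
    steps; a uniform 8-way choice yields exactly 3 bits."""
    total = sum(action_counts.values())
    probs = [c / total for c in action_counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# A near-deterministic step: very low entropy.
print(round(next_action_entropy({"approve": 98, "escalate": 2}), 2))  # 0.14
# An 8-way ambiguous step: 3.0 bits, the regime the paper flags.
print(next_action_entropy({a: 1 for a in "abcdefgh"}))                # 3.0
```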

From the empirical findings:

  • Approval changes and exception handling exhibit the highest entropy (~3 bits)
  • These are precisely the steps businesses least want automated blindly

Now the interesting part.

| Entropy Threshold | Autonomous Events | Fully Autonomous Cases |
|---|---|---|
| h₀ = 2.0 | 72.2% | 49.6% |
| h₀ = 1.5 | 53.3% | 7.1% |

This is the “illusion of autonomy.”

At the step level, everything looks fine. At the workflow level, autonomy collapses.

3. The gate: Where humans re-enter the system

The framework defines a simple but powerful control mechanism:

Escalate if:

  • Low support (rare state)
  • High entropy (ambiguous decision)
  • High risk (economic or exception-sensitive)

This creates a Human-in-the-Loop (HITL) gate.
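As a sketch, the gate is a disjunction of the three conditions. The thresholds τ and h₀ follow the paper's notation; `risk_score` and `risk_cap` are hypothetical placeholders for whatever economic-risk signal a deployment defines:

```python
def should_escalate(support, entropy_bits, risk_score,
                    tau=1000, h0=2.0, risk_cap=0.5):
    """HITL gate sketch: escalate on low support, high entropy, or
    high risk. Any single trigger is enough to hand off to a human."""
    return support < tau or entropy_bits > h0 or risk_score > risk_cap

# Well-supported, unambiguous, low-risk step: stays autonomous.
print(should_escalate(support=5000, entropy_bits=0.4, risk_score=0.1))  # False
# Ambiguous approval change (~3 bits): escalated to a human.
print(should_escalate(support=5000, entropy_bits=3.0, risk_score=0.1))  # True
```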

And here’s the twist:

The same gate that improves reliability also defines your cost structure.

4. Reliability and cost are the same equation

The paper derives an elegant identity:

| Component | Effect |
|---|---|
| More autonomy | Lower human cost, higher error risk |
| More escalation | Higher cost, lower error risk |

This produces a reliability-cost frontier:

| Entropy Threshold | Safe Completion | Human Touches per Case |
|---|---|---|
| 1.5 | 55.6% | 3.02 |
| 2.0 | 49.6% | 2.26 |
| 2.25 | 45.2% | 1.90 |

There is no free lunch. Only trade-offs.
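The trade-off can be priced. The frontier rows come from the paper's table; the unit costs below are purely hypothetical, and a real deployment would plug in its own review and failure costs:

```python
def expected_case_cost(safe_completion, touches_per_case,
                       cost_per_touch=25.0, cost_per_failure=400.0):
    """Expected cost per case: human-review cost plus expected failure
    cost. Unit costs are hypothetical placeholders."""
    return (touches_per_case * cost_per_touch
            + (1 - safe_completion) * cost_per_failure)

# (safe completion, human touches per case) at each entropy threshold.
frontier = {1.5: (0.556, 3.02), 2.0: (0.496, 2.26), 2.25: (0.452, 1.90)}
for h0, (safe, touches) in frontier.items():
    print(h0, round(expected_case_cost(safe, touches), 2))
```

Which threshold wins depends entirely on the cost ratio: cheap reviewers favor tight gates, expensive reviewers favor looser ones.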

Findings — What actually predicts agent performance

The paper validates its framework with a simulated agent trained on 80% of the data and tested on 20%.

Two findings matter:

1. Simple probabilities beat intuition

The metric m(s) = max π(a|s), the probability of the policy's top predicted action, tracks real-world accuracy to within 3.4 percentage points.

Translation:

You don’t need a better model. You need to respect the probability distribution you already have.
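In code, m(s) is one line. The gating use shown afterward is a sketch of how such a confidence statistic is typically applied, not the paper's exact procedure:

```python
def top_action_probability(policy_probs):
    """m(s) = max_a pi(a|s): the policy's confidence in its top action
    at state s. The paper reports this simple statistic tracks realized
    accuracy to within 3.4 percentage points."""
    return max(policy_probs.values())

probs = {"approve": 0.81, "reject": 0.12, "hold": 0.07}
m = top_action_probability(probs)
print(m)                 # 0.81
# Illustrative gating rule: automate only confident steps.
print(m >= 0.9)          # False -> this step would not be automated
```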

2. Path dependence dominates outcomes

Even when individual steps are accurate, small uncertainties compound.

| Metric | Value at h₀ = 2.0 |
|---|---|
| Cases staying in autonomous zone | 42.3% |
| Fully correct zero-touch completion | 16.1% |

That gap is the cost of stochasticity.

Implications — What businesses should actually do

The paper quietly dismantles a common enterprise instinct:

“Let’s deploy first, then monitor.”

No.

You already have the data to audit your agent—before it exists.

1. Pre-deployment audits should become standard

Before building an agent:

  • Measure state-action support
  • Identify high-entropy decision points
  • Quantify blind-spot mass

This tells you where autonomy is justified.

2. Most workflows are partially automatable—not fully

The data suggests a layered architecture:

| Zone | Strategy |
|---|---|
| High support, low entropy | Full autonomy |
| Medium uncertainty | Assisted automation |
| High entropy / high risk | Human control |

In practice, this resembles a selective autonomy system, not a fully autonomous agent.
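A selective autonomy router can be sketched as a simple classifier over the same signals the gate uses. The thresholds τ and h₀ follow the paper's notation, while the low-entropy cutoff for full autonomy (1.0 bit here) is my illustrative assumption:

```python
def autonomy_zone(support, entropy_bits, high_risk,
                  tau=1000, h0=2.0, low_entropy=1.0):
    """Assign a workflow step to an autonomy zone. A three-zone split;
    the low_entropy cutoff is an assumed value, not from the paper."""
    if high_risk or entropy_bits > h0:
        return "human control"
    if support >= tau and entropy_bits <= low_entropy:
        return "full autonomy"
    return "assisted automation"

print(autonomy_zone(support=5000, entropy_bits=0.3, high_risk=False))  # full autonomy
print(autonomy_zone(support=200, entropy_bits=1.4, high_risk=False))   # assisted automation
print(autonomy_zone(support=5000, entropy_bits=3.0, high_risk=False))  # human control
```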

3. ROI is constrained by oversight, not model quality

This is the paper’s most commercially relevant insight.

If your agent requires frequent escalation:

  • Human cost dominates
  • ROI collapses

Which explains why so many AI projects fail—not technically, but economically.

4. More context can make things worse

Refining the state (adding value, actor, context):

  • Improves realism
  • Expands state space
  • Increases blind-spot mass

In short:

The more “intelligent” your system becomes, the less statistically supported it may be.

A slightly ironic outcome.

Conclusion — The uncomfortable truth about agentic AI

The paper does not argue that agents are unreliable.

It argues something more precise—and more inconvenient:

Most enterprise workflows were never statistically designed to support autonomy in the first place.

The implication is clear.

Agent deployment is not a modeling problem. It is a data support problem disguised as intelligence.

And until businesses start auditing that support, they will continue to deploy agents that look competent in isolation—and fail spectacularly in sequence.

Subtle, predictable, and entirely avoidable.


Cognaptus: Automate the Present, Incubate the Future.