Opening — Why this matters now

We are deploying black-box AI systems faster than we can understand them. Large language models, vision–language agents, and robotic controllers are increasingly asked to do things, not just answer questions. And yet, when these systems fail, the failure is rarely spectacular: it is subtle, conditional, probabilistic, and deeply context-dependent.

The uncomfortable truth is this: most organizations do not actually know what their AI systems are capable of. They know what the system did last week. They know what it usually does. But they do not know what it can do, under which conditions, and with what likelihood of side effects. This gap is no longer academic—it is operational risk.

Background — From actions to capabilities

Most prior work on AI behavior modeling focuses on actions: primitive steps, low-level policies, or token-level decisions. This is useful if you are training the model. It is far less useful if you are using the model.

Users and operators think in terms of capabilities:

  • Can the agent clean a room?
  • Can it navigate safely to a target?
  • Can it stack objects without collapsing them?

Capabilities are not single actions. They are intent-driven sequences of decisions that unfold over time, often stochastically, and often with unintended side effects. Modeling them requires a different lens—one that treats the AI as a planner whose internal logic is opaque.

Analysis — Turning black boxes into probabilistic maps

The paper introduces Probabilistic Capability Model Learning (PCML), a framework that does something deceptively simple: it interrogates a black-box AI system like a skeptical interviewer.

Instead of assuming what the AI can do, PCML:

  1. Discovers capabilities empirically by observing state changes that reliably occur when the AI pursues an intent.

  2. Represents each capability using probabilistic, conditional rules (in a PDDL-style language, sketched after this list) that specify:

    • Preconditions
    • Possible outcomes
    • Their probabilities

  3. Actively probes uncertainty by generating targeted test scenarios using Monte Carlo Tree Search (MCTS).
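To make step 2 concrete, here is a minimal Python sketch of how a probabilistic, PDDL-style capability rule could be encoded. This is an illustration only: the names (Outcome, CapabilityRule) and the stacking example are assumptions of mine, not the paper's code or interface. The structure captures the essentials named above: preconditions that must hold, the set of possible outcomes, and a probability attached to each.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Outcome:
    """One possible effect of exercising a capability, with its estimated probability."""
    effects: frozenset       # state fluents made true, e.g. ("on", "block_a", "block_b")
    probability: float

@dataclass
class CapabilityRule:
    """A PDDL-style probabilistic rule: if the preconditions hold and the agent
    pursues `intent`, each outcome occurs with the listed probability."""
    intent: str                      # e.g. "stack(block_a, block_b)"
    preconditions: frozenset         # fluents that must hold before execution
    outcomes: list = field(default_factory=list)

    def applicable(self, state: frozenset) -> bool:
        # The rule fires only in states that satisfy every precondition.
        return self.preconditions <= state

    def expected_success(self, success_fluent) -> float:
        # Probability mass of outcomes that include the desired fluent.
        return sum(o.probability for o in self.outcomes if success_fluent in o.effects)

# Hypothetical example: a stacking capability that succeeds only 6% of the time.
stack_rule = CapabilityRule(
    intent="stack(block_a, block_b)",
    preconditions=frozenset({("clear", "block_b"), ("holding", "block_a")}),
    outcomes=[
        Outcome(frozenset({("on", "block_a", "block_b")}), probability=0.06),
        Outcome(frozenset({("toppled", "tower")}), probability=0.94),
    ],
)
print(stack_rule.expected_success(("on", "block_a", "block_b")))  # -> 0.06
```

Keeping the rule declarative in this way is what makes the rest of the framework possible: the same structure can be compared, merged, and bounded as new probe evidence accumulates.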

Crucially, PCML maintains two bounding models at all times:

| Model Type | What it assumes | Why it matters |
|---|---|---|
| Pessimistic | Only what has been observed | Safe, conservative deployment |
| Optimistic | Everything not yet ruled out | Efficient exploration and learning |

When these two models converge, uncertainty collapses. At that point, you do not just hope you understand the system—you provably do.
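The bounding logic itself can be summarized in a few lines. The sketch below is my own simplification, not the paper's implementation, and the class and method names are assumptions: the pessimistic model starts empty and only admits outcomes that have actually been observed, the optimistic model starts by allowing every outcome and drops whatever probing rules out, and convergence simply means the two sets coincide.

```python
from collections import defaultdict

class BoundingModels:
    """Illustrative bookkeeping for pessimistic/optimistic capability bounds."""

    def __init__(self, all_outcomes):
        # Pessimistic: starts empty, only admits what has actually been observed.
        self.pessimistic = defaultdict(set)
        # Optimistic: starts with everything, keeps whatever has not been ruled out.
        self.optimistic = defaultdict(lambda: set(all_outcomes))

    def record_observation(self, intent, outcome):
        """An executed probe produced `outcome`; the pessimistic model must now admit it."""
        self.pessimistic[intent].add(outcome)
        self.optimistic[intent].add(outcome)   # no-op if already considered possible

    def rule_out(self, intent, outcome):
        """Probing gave grounds to exclude `outcome`; the optimistic model drops it."""
        self.optimistic[intent].discard(outcome)

    def converged(self, intent) -> bool:
        """When the bounds meet, no uncertainty remains about this intent."""
        return self.pessimistic[intent] == self.optimistic[intent]

# Toy usage with hypothetical outcome labels:
models = BoundingModels(all_outcomes={"stacked", "toppled", "no_change"})
models.record_observation("stack(a, b)", "toppled")
models.record_observation("stack(a, b)", "stacked")
models.rule_out("stack(a, b)", "no_change")
print(models.converged("stack(a, b)"))  # True: bounds agree, uncertainty has collapsed
```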

Findings — What the experiments reveal

Across simulated kitchens, grid worlds, block manipulation tasks, and LLM-driven agents, the results are remarkably consistent:

  • Active probing beats random testing by a wide margin
  • Learned capability models converge rapidly
  • Many agents exhibit surprising blind spots

Some illustrative examples:

| Agent | Intended Capability | What actually happens |
|---|---|---|
| MiniGrid (LLM-based) | Navigate to target | Opens unnecessary doors, picks up irrelevant keys |
| SayCan (robotic) | Stack blocks | Succeeds ~6%, often knocks the tower over |
| Planning agent (LAO*) | Place block A | Exhibits hidden preference biases |

These are not bugs in the usual sense. They are capability contours—regions where the system behaves differently than its designers or users expect.

Implications — Safety, governance, and ROI

This work quietly reframes several debates:

  • AI safety is not just about alignment; it is about capability visibility.
  • Governance requires models that explain what an AI can do, not just how it was trained.
  • Business ROI improves when AI deployment is bounded by known capability envelopes rather than blind trust.

For enterprises deploying agentic AI, PCML-like approaches suggest a future where AI systems ship with capability certificates: structured, probabilistic descriptions of what the system can and cannot reliably achieve.
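What might such a certificate look like? The schema below is purely hypothetical: the field names and evidence counts are illustrative inventions, not an artifact the paper ships. But it captures the idea: per capability, the certified preconditions, the learned outcome distribution, the evidence behind it, and the deployment guidance that follows.

```python
# Hypothetical capability certificate for one agent capability.
# Field names and the probe count are illustrative; the probabilities echo
# the ~6% stacking success rate cited above.
capability_certificate = {
    "capability": "stack(block_a, block_b)",
    "preconditions": ["clear(block_b)", "holding(block_a)"],
    "outcomes": [
        {"effect": "on(block_a, block_b)", "probability": 0.06},
        {"effect": "toppled(tower)",       "probability": 0.94},
    ],
    "evidence": {"probes_executed": 250, "model_status": "converged"},
    "deployment_guidance": "Do not rely on unsupervised stacking; "
                           "success rate is far below task requirements.",
}
```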

Conclusion — From mystery to management

Black-box AI will not become transparent anytime soon. But transparency is not the only path to control. By learning what an AI can reliably do, under which conditions, and with what risks, we can replace mystery with management.

In that sense, this work does not tame the black box. It teaches us how to map its teeth.

Cognaptus: Automate the Present, Incubate the Future.