Opening — Why this matters now

AI safety has quietly shifted from a performance problem to a guarantee problem. It’s no longer enough that systems work most of the time; in safety-critical domains, they must work correctly every time.

Naturally, the industry response has been to scale verification: more rules, more constraints, more formal checks. If something slips through, the instinct is simple—expand the verifier.

The paper fileciteturn0file0 dismantles that instinct.

It argues something far more uncomfortable: even with infinite compute and perfect engineering, complete verification of AI safety is impossible in principle. Not hard. Not expensive. Impossible.

And the reason has nothing to do with neural networks or adversarial inputs. It comes from something much older—and much colder: information theory.


Background — From Gödel to AI Systems

The intellectual lineage here is not machine learning—it’s mathematical logic.

  • Gödel (1931): Any consistent formal system expressive enough for arithmetic contains true statements it cannot prove.
  • Chaitin (1974): This incompleteness can be recast in information-theoretic terms—a formal system cannot prove complexity claims far beyond its own information content.
  • Kolmogorov complexity: The complexity of an object is the length of the shortest program that produces it.

The paper brings these ideas into AI safety by reframing verification as a formal proof problem.

Instead of asking:

“Does the system behave safely?”

It asks:

“Can a formal verifier prove that this specific behavior is safe?”

This distinction—truth vs. provability—is where the entire argument unfolds.


Analysis — What the paper actually proves

1. Recasting AI behavior as information

Each AI interaction is encoded as a binary object:

  • Input: z
  • Output: y
  • Policy: Π

Combined into:

x = ⟨z, y, Π⟩

Verification becomes checking whether a predicate holds:

P(x) = 1 (the policy is satisfied)

This is a clean abstraction. It removes all architectural details and reduces AI safety to a property of strings.

Which is exactly where Kolmogorov complexity operates.
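To make the abstraction concrete, here is a minimal sketch of the encoding-plus-predicate view. The names (`encode`, `P`) and the toy banned-token policy are illustrative assumptions, not the paper's construction; the point is only that once the interaction is a string, safety is a property of that string.

```python
# Hypothetical sketch: reduce an AI interaction to a binary object x = <z, y, Pi>
# and a predicate P over it. The policy here is a toy banned-token list.
import json

def encode(z: str, y: str, pi: str) -> bytes:
    """Encode the interaction x = <z, y, Pi> as a single binary object."""
    return json.dumps({"input": z, "output": y, "policy": pi}).encode("utf-8")

def P(x: bytes) -> bool:
    """Toy safety predicate: the output contains no token flagged by the
    (toy) policy. Real predicates can be arbitrarily complex."""
    obj = json.loads(x.decode("utf-8"))
    banned = set(obj["policy"].split(","))
    return not any(tok in banned for tok in obj["output"].split())

x = encode("plan a route", "turn left at Main St", "collide,speed")
print(P(x))  # True: no banned token appears in the output
```

Nothing about the model's architecture survives this encoding—which is precisely why complexity-theoretic arguments apply.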


2. The hidden assumption: safety is not rare

The paper introduces a “richness” assumption:

There are many valid (policy-compliant) behaviors—not just a few structured ones.

This is realistic. In autonomous driving, for example, there are countless safe trajectories for the same situation.

Implication:

  • Among all valid behaviors, some must be highly complex (i.e., not compressible into short descriptions).

3. The key result: verification breaks at high complexity

Here’s the core theorem, stripped of formalism:

For any fixed verifier, there exists a complexity threshold beyond which true safe behaviors cannot be proven safe.

Not because they are unsafe. Not because the verifier is buggy.

But because:

  • The behavior contains more information than the verifier can encode in its proof system.

This is the same structural limit that prevents formal systems from proving certain truths about numbers.

Now applied to AI safety.
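The flavor of the argument can be shown with a simple counting sketch (an illustration under assumed parameters, not the paper's actual proof): a verifier whose proofs are at most L bits long can certify only a bounded number of behaviors, while the richness assumption implies exponentially many valid behaviors exist.

```python
# Counting sketch: a fixed verifier with proofs of at most L bits can certify
# at most 2^(L+1) - 1 distinct behaviors. Under the richness assumption,
# valid behaviors of length n vastly outnumber that bound, so some safe
# behaviors must go uncertified. Parameters are illustrative.
def max_certifiable(proof_bits: int) -> int:
    # Number of binary strings of length <= proof_bits.
    return 2 ** (proof_bits + 1) - 1

def valid_behaviors(n_bits: int, fraction: float = 0.5) -> int:
    # Richness assumption (toy form): a constant fraction of all
    # n-bit behaviors satisfies the policy.
    return int(fraction * 2 ** n_bits)

L = 100  # verifier's maximum proof length, in bits
n = 200  # behavior length, in bits
print(valid_behaviors(n) > max_certifiable(L))  # True: safe-but-uncertifiable
                                                # behaviors must exist
```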


4. Why this is not a scaling problem

It’s tempting to interpret this as a “we just need better tools” issue.

That interpretation is wrong.

The limitation is:

  • Independent of compute
  • Independent of model architecture
  • Independent of dataset size

Even an ideal verifier—perfectly implemented—fails beyond a certain information threshold.

This is not engineering friction. This is a ceiling.


Findings — What breaks, and what survives

Verification paradigms under the lens

| Approach | Core Idea | Strength | Fundamental Limit |
|---|---|---|---|
| Rule-based / detection | Enumerate valid behaviors | Interpretable | Cannot cover high-complexity cases |
| Formal verification | Prove compliance globally | Strong guarantees | Incomplete beyond complexity threshold |
| Statistical validation | Test many cases | Scalable | No guarantees |
| Proof-carrying systems | Verify per-instance proofs | Precise, local guarantees | Requires proof generation |

The real split: global vs. instance-level

The paper exposes a critical distinction:

| Strategy | Philosophy | Outcome |
|---|---|---|
| Global verification | "Define all valid behaviors" | Fundamentally impossible |
| Instance-level verification | "Prove this behavior is valid" | Feasible |

This is not a design preference.

It is a forced choice.


Implications — What to build instead

1. Proof-carrying AI is not optional—it’s inevitable

If a verifier cannot certify all valid behaviors, the burden shifts:

Each instance must carry its own evidence of correctness.

This leads directly to architectures like:

  • Proof-carrying outputs
  • Verifiable computation (e.g., zk-SNARK-style systems)
  • Constraint-satisfying structured outputs

The system doesn’t say:

“Trust me, I’m safe.”

It says:

“Here is the proof.”
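The shape of that interaction can be sketched in a few lines. This is a toy setting where the "proof" is a checkable witness (membership in a declared safe set); all names are illustrative assumptions, not the paper's API.

```python
# Minimal proof-carrying sketch: the generator emits an output together with
# evidence, and the verifier checks only the evidence—it never re-derives
# the behavior. SAFE_SET and the witness format are toy assumptions.
SAFE_SET = {"stop", "slow_down", "yield"}

def generate() -> tuple:
    output = "yield"
    proof = {"claim": "output in SAFE_SET", "witness": output}
    return output, proof

def verify(output: str, proof: dict) -> bool:
    # Lightweight, instance-level check: does the attached evidence
    # actually certify this output?
    return proof["witness"] == output and proof["witness"] in SAFE_SET

out, prf = generate()
print(verify(out, prf))  # True
```

The asymmetry is the point: generation may be arbitrarily expensive, but checking a proof stays cheap.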


2. Safety becomes a protocol, not a property

Traditional view:

  • Safety is a property of the model

Emerging view:

  • Safety is a protocol between generator and verifier

This has profound system design implications:

  • AI outputs must be structured for verification
  • Verification becomes lightweight checking, not deep reasoning
  • Interfaces between components matter as much as models themselves
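A protocol-style interface might look like the following sketch, where the generator must emit output in a structure the checker can validate with shallow field and constraint checks. The schema, field names, and speed limit are assumptions for illustration.

```python
# Sketch of "safety as a protocol": verification is lightweight structural
# checking of the output, not deep reasoning about the model. Schema and
# the 50 km/h constraint are illustrative assumptions.
REQUIRED_FIELDS = {"action": str, "max_speed_kmh": (int, float)}

def check_structured(output: dict) -> bool:
    # Shallow checks only: required fields exist, types match, constraint holds.
    for field, typ in REQUIRED_FIELDS.items():
        if field not in output or not isinstance(output[field], typ):
            return False
    return 0 <= output["max_speed_kmh"] <= 50

print(check_structured({"action": "turn_left", "max_speed_kmh": 30}))  # True
print(check_structured({"action": "turn_left"}))                       # False
```

Because the verifier only inspects structure, it stays fast and auditable regardless of how the generator produced the output.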

3. Complexity becomes a governance concern

This is the subtle but important business implication.

If verification fails beyond a complexity threshold, then:

  • Highly complex behaviors are inherently harder to certify
  • Regulation cannot rely on “complete verification” as a standard

Instead, governance will likely evolve toward:

  • Acceptable proof formats
  • Certification pipelines
  • Auditability of instance-level evidence

In other words, compliance becomes procedural, not absolute.


4. The uncomfortable conclusion

There will always exist:

  • Safe behaviors we cannot certify
  • A certifiable set that is only a strict subset of what is actually safe

This gap is not a bug.

It is structural.


Conclusion — The shift no one can avoid

The paper’s contribution is not a new verification algorithm.

It’s a reframing of the problem itself.

You cannot build a verifier that proves all safe behaviors.

So the industry must stop trying.

Instead, the path forward is clear—if slightly inconvenient:

  • Move from global guarantees to local proofs
  • Design systems where outputs are inherently verifiable
  • Treat safety as an interaction, not a static property

It’s less elegant than universal correctness.

But unlike universal correctness, it actually exists.


Cognaptus: Automate the Present, Incubate the Future.