Opening — Why this matters now

Chain‑of‑thought (CoT) reasoning has quietly become one of the most consequential features of modern large language models. When models “think step‑by‑step” in natural language, they often solve harder problems, behave more reliably, and — perhaps most importantly — expose their reasoning to human inspection.

But a deeper question lurks beneath this phenomenon: is chain‑of‑thought merely helpful, or fundamentally necessary for certain kinds of reasoning?

A recent study introduces a formal framework called opaque serial depth, offering a precise way to measure how much reasoning a model can perform without revealing intermediate steps. The results suggest something subtle but powerful: for many architectures, complex reasoning cannot remain hidden inside the model. At some point, it must surface as interpretable tokens.

In other words, thinking out loud may not just be good manners for an AI — it might be an architectural requirement.


Background — The limits of silent reasoning

Large language models perform enormous amounts of parallel computation during a forward pass. Attention layers simultaneously process all tokens, and feed‑forward layers transform high‑dimensional representations in bulk.

However, parallel computation is not the same as sequential reasoning.

Many cognitive tasks — planning, long arithmetic chains, multi‑step deduction — require sequential processing where each step depends on the previous one. In computational complexity theory, this dependency is captured by the concept of circuit depth.

Circuit depth measures the longest chain of operations required to compute a function. Even if millions of operations happen in parallel, the computation cannot finish faster than the deepest chain.
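The idea is easy to make concrete: represent a circuit as a DAG of gates and take the longest input‑to‑output chain. A minimal sketch (the circuit and gate names here are illustrative, not from the paper):

```python
# Toy illustration: circuit depth = longest chain of dependent operations.
# The circuit is a DAG mapping each gate to the gates it depends on.
from functools import lru_cache

circuit = {
    "x1": [], "x2": [], "x3": [], "x4": [],  # inputs (depth 0)
    "a": ["x1", "x2"],   # a = x1 AND x2
    "b": ["x3", "x4"],   # b = x3 AND x4
    "c": ["a", "b"],     # c = a XOR b
    "out": ["c"],        # output gate
}

@lru_cache(maxsize=None)
def depth(gate: str) -> int:
    deps = circuit[gate]
    return 0 if not deps else 1 + max(depth(d) for d in deps)

print(depth("out"))  # -> 3
```

Note that the circuit contains seven gates but has depth 3: `a` and `b` run in parallel and do not lengthen the chain, which is exactly the parallel-versus-serial distinction above.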

This insight translates directly to LLMs:

| Type of reasoning | Computational property |
|---|---|
| Pattern recognition | Parallel computation |
| Logical deduction | Serial computation |
| Planning | Deep serial computation |

The crucial question becomes: how much serial reasoning can a neural network perform internally before it must externalize intermediate steps?

That is precisely what opaque serial depth attempts to quantify.


Analysis — Defining Opaque Serial Depth

Opaque serial depth measures the maximum amount of reasoning a neural network can perform between interpretable checkpoints.

For language models, interpretable checkpoints are typically:

  • Input tokens
  • Output tokens
  • Chain‑of‑thought tokens

Everything else — activations, residual streams, attention values — remains opaque.

Intuition

If reasoning requires more sequential computation than the model can perform internally, the model must produce additional tokens representing intermediate steps.

Thus:


Hard reasoning task
  ↓
Requires deep serial computation
  ↓
Model exceeds internal serial capacity
  ↓
Chain‑of‑thought tokens appear

In other words, chain‑of‑thought becomes a computational escape valve.
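The escape-valve intuition can be sketched as a toy model (my own illustration, not a formula from the paper): if a task needs more serial steps than one forward pass provides, the remainder must surface as tokens, each of which buys another forward pass of serial depth.

```python
import math

def cot_tokens_needed(task_depth: int, internal_depth: int,
                      depth_per_token: int) -> int:
    """Toy model: serial steps beyond the forward pass's capacity
    must be externalized as chain-of-thought tokens."""
    overflow = task_depth - internal_depth
    if overflow <= 0:
        return 0  # fits entirely in opaque computation
    return math.ceil(overflow / depth_per_token)

# Easy task: fits silently. Hard task: forces visible reasoning.
print(cot_tokens_needed(3_000, 4_490, 4_490))   # -> 0
print(cot_tokens_needed(50_000, 4_490, 4_490))  # -> 11
```

The constant 4,490 is the approximate serial depth of Gemma 3 1B reported later in this article; the task depths are arbitrary.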

Circuit depth formulation

The paper formalizes this using the minimum depth of a Boolean circuit computing the same function as the neural network.

Mathematically:

$$ \mathrm{Depth}(f_\theta) = \min_{C \in \mathrm{poly}(S)} \max_{P \in C} \mathrm{Length}(P) $$

Where:

  • $f_\theta$ is the function computed by the neural network
  • $C$ ranges over circuits of size polynomial in the network's size $S$ that compute the same function
  • $P$ is a path through the circuit, and $\mathrm{Length}(P)$ is the number of gates along it

Opaque serial depth then measures the maximum circuit depth between consecutive interpretable nodes.
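A standard textbook instance of the definition (my example, not one from the paper): the parity of $n$ bits can be computed by a balanced binary tree of XOR gates, so

$$ \mathrm{Depth}(\mathrm{PARITY}_n) \le \lceil \log_2 n \rceil, $$

even though the circuit contains $n - 1$ gates in total; depth counts the longest chain, not the gate count.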


Findings — How deep are modern models?

Applying the framework to real architectures produces surprisingly concrete numbers.

Serial depth estimates for Gemma 3 models

| Model | Serial depth formula | Approx. depth (max context) |
|---|---|---|
| Gemma 3 1B | 4370 + 8 log₂ T | 4,490 |
| Gemma 3 4B | 6036 + 10 log₂ T | 6,206 |
| Gemma 3 12B | 8482 + 16 log₂ T | 8,754 |
| Gemma 3 27B | 11322 + 20 log₂ T | 11,662 |
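The approximate depths can be reproduced directly from the formulas, assuming a maximum context of 32K tokens for the 1B model and 128K for the larger models (the values consistent with the table):

```python
import math

# Per model: (base depth, multiplier on log2(T), assumed max context T)
gemma3 = {
    "1B":  (4370, 8, 32_768),
    "4B":  (6036, 10, 131_072),
    "12B": (8482, 16, 131_072),
    "27B": (11322, 20, 131_072),
}

for name, (base, factor, T) in gemma3.items():
    depth = base + factor * math.log2(T)
    print(f"Gemma 3 {name}: {depth:,.0f}")
# Gemma 3 1B: 4,490
# Gemma 3 4B: 6,206
# Gemma 3 12B: 8,754
# Gemma 3 27B: 11,662
```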

The interesting result is not the magnitude itself, but the scaling behavior.

For Transformers, opaque serial depth grows roughly as:

$$ O(L (\log T + \log D)) $$

Where:

  • $L$ = number of layers
  • $T$ = sequence length
  • $D$ = hidden dimension

This means internal reasoning grows slowly — logarithmically — with sequence length.

But chain‑of‑thought allows reasoning to scale linearly with tokens, effectively extending the model’s reasoning capacity far beyond its internal limits.
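The gap between the two regimes is easy to see numerically. With illustrative constants for $L$ and $D$ (my choices, not the paper's), internal depth barely moves as the context grows, while the chain‑of‑thought budget grows with every token:

```python
import math

L, D = 34, 2560  # illustrative layer count and hidden dimension

def internal_depth(T: int) -> float:
    # O(L * (log T + log D)): opaque serial depth in one forward pass
    return L * (math.log2(T) + math.log2(D))

for T in (1_024, 8_192, 65_536, 524_288):
    print(f"T = {T:>7}: internal depth ~ {internal_depth(T):5.0f}, "
          f"CoT token budget = {T}")
```

Multiplying the context length by 512 raises internal depth by well under a factor of two, while the number of tokens available for externalized reasoning grows 512×.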

Architecture matters

Different architectures dramatically change opaque serial depth:

| Architecture | Opaque serial depth |
|---|---|
| Transformer | O(L(log T + log D)) |
| RNN | O((L + T) log D) |
| Continuous latent CoT | O(LT(log T + log D)) |
| Persistent black‑box memory | Unbounded |

This highlights a major design trade‑off.

Architectures that increase internal reasoning capacity may simultaneously reduce transparency.


Implications — Transparency vs capability

The concept of opaque serial depth has immediate implications for AI governance and system design.

1. Chain‑of‑thought as a safety feature

If difficult reasoning must pass through interpretable tokens, monitoring those tokens becomes a powerful safety mechanism.

This explains why many alignment proposals emphasize chain‑of‑thought monitoring.

2. Architectural transparency trade‑offs

Some emerging architectures threaten this transparency:

| Architecture change | Risk |
|---|---|
| Continuous latent reasoning | Hidden reasoning steps |
| Persistent memory vectors | Unlimited opaque computation |
| Recurrent latent loops | Reduced interpretability |

Ironically, improving reasoning capability may simultaneously weaken interpretability.

3. A measurable governance metric

Opaque serial depth provides a quantitative metric for evaluating AI architectures:

| Goal | Desired property |
|---|---|
| Capability | High serial depth |
| Transparency | Low opaque serial depth |

Architectures that balance these properties may become the preferred design for safe AI systems.


Conclusion — When AI must think aloud

The key insight of this research is deceptively simple.

Language models cannot hide arbitrarily deep reasoning inside their weights. At some point, if the reasoning becomes complex enough, the architecture forces the model to externalize its thoughts.

Chain‑of‑thought is therefore not merely a training trick or prompt engineering artifact. It may represent a structural property of how Transformers reason.

And that has profound consequences.

If we design AI systems carefully, we may be able to build machines that reason powerfully while still explaining themselves along the way.

A rare case where transparency is not a constraint on intelligence — but a side effect of it.

Cognaptus: Automate the Present, Incubate the Future.