Opening — Why this matters now
LLMs are dazzling right up until they trip over something embarrassingly simple. This paradox isn’t just a meme; it’s a commercial, regulatory, and engineering liability. As enterprises rush toward AI-driven automation, they confront an uncomfortable fact: models that solve Olympiad problems yet stumble on first-grade logic steps are not trustworthy cognitive workers.
A recent paper, Cognitive Foundations for Reasoning and Their Manifestation in LLMs (Kargupta et al., 2025), confronts the issue head‑on. Instead of asking whether models can reason, it asks which kinds of reasoning they can perform and why inconsistencies occur. The result is a taxonomy of 28 cognitive elements and an empirical study of how today’s top models actually use them.
Spoiler: the gaps aren’t random—they’re structural.
Background — Context and prior art
For over a decade, cognitive scientists have cataloged the mental building blocks humans rely on for reasoning: working memory, abstraction, causal inference, meta‑cognition, and so on. Earlier AI research teased parallels but rarely formalized them in a way that mapped cleanly to LLM behavior.
Industry evaluations tend to toggle between superficial benchmarks (math problems, puzzles) and opaque intuition (“it feels smarter”). What’s missing is a systematic bridge connecting human cognition and LLM performance.
The paper attempts exactly that through:
- A taxonomy of 28 cognitive elements, grouped across memory, abstraction, causal reasoning, meta‑reasoning, and more.
- A behavioral annotation framework to label which cognitive elements appear in an LLM’s thought chain.
- A human‑LLM comparison on the same reasoning tasks.
This is the closest thing we have to a “reasoning genome” for LLMs.
Analysis — What the paper does
1. Builds a cognitive taxonomy for LLM reasoning
The authors synthesize decades of cognitive science into a concrete, operational list of 28 elements. These include:
- Computational constraints (e.g., working memory limits)
- Representation structures (e.g., schemas, analogies)
- Meta‑cognition (e.g., monitoring uncertainty, self‑correction)
- Causal reasoning (e.g., counterfactuals)
- Search strategies (e.g., backward chaining)
Crucially, they treat these not as philosophical abstractions but as observable behaviors in generated reasoning traces.
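To make this concrete, here is a minimal sketch of what a machine-readable slice of such a taxonomy could look like. The element names, categories, and descriptions below are illustrative stand-ins drawn from the groupings above, not the paper's actual 28-element list.

```python
# Minimal sketch: a machine-readable slice of a cognitive-element taxonomy.
# Names and descriptions are illustrative; the paper defines 28 elements.
from dataclasses import dataclass

@dataclass(frozen=True)
class CognitiveElement:
    name: str         # e.g. "counterfactual_reasoning"
    category: str     # e.g. "causal_reasoning"
    description: str  # behavioral cue an annotator looks for in a trace

TAXONOMY = [
    CognitiveElement("working_memory_limit", "computational_constraints",
                     "keeps only a bounded set of intermediate results active"),
    CognitiveElement("schema_use", "representation_structures",
                     "maps the problem onto a familiar template or analogy"),
    CognitiveElement("uncertainty_monitoring", "meta_cognition",
                     "explicitly flags what is unknown or low-confidence"),
    CognitiveElement("counterfactual_reasoning", "causal_reasoning",
                     "asks what would change if a premise were different"),
    CognitiveElement("backward_chaining", "search_strategies",
                     "works from the goal state back toward the givens"),
]

# Group element names by category for quick lookup during annotation.
ELEMENTS_BY_CATEGORY = {}
for el in TAXONOMY:
    ELEMENTS_BY_CATEGORY.setdefault(el.category, []).append(el.name)
```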
2. Annotates LLM reasoning with human‑inspired labels
The team gives LLMs and human participants the same tasks, then manually codes the resulting reasoning text to identify which cognitive elements are present.
This gives us a structured behavioral profile for models vs. humans.
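The paper's coding is done by human annotators; the sketch below only illustrates the bookkeeping step, assuming annotations arrive as (text span, element) pairs. Everything in it, including the example traces, is hypothetical.

```python
# Sketch: turn per-span annotations of one reasoning trace into a behavioral
# profile (relative frequency of each cognitive element). The annotation
# format is an assumption, not the paper's actual schema.
from collections import Counter

def behavioral_profile(annotations):
    """annotations: list of (text_span, element_name) pairs for one trace."""
    counts = Counter(element for _, element in annotations)
    total = sum(counts.values()) or 1
    return {element: n / total for element, n in counts.items()}

llm_trace = [
    ("First, recall the standard formula for ...", "schema_use"),
    ("Plugging in the numbers gives ...",          "schema_use"),
    ("So the answer is 42.",                       "pattern_completion"),
]
human_trace = [
    ("I'm not sure the formula applies here; check the premise first.",
     "uncertainty_monitoring"),
    ("If the premise were false, the conclusion would not follow.",
     "counterfactual_reasoning"),
    ("Working backward from the target quantity ...", "backward_chaining"),
]

print(behavioral_profile(llm_trace))
print(behavioral_profile(human_trace))
```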
3. Extracts reasoning “structures” rather than just scores
Instead of measuring correctness alone, they look at how LLMs compose cognitive elements:
- Do they chain them coherently?
- Do they display cognitive shortcuts or brittle heuristics?
- Do they use high‑level abstractions inconsistently?
The findings: models often rely on brittle, surface‑level heuristics—patterned but not principled.
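One simple, if crude, way to operationalize "composition" is to look at the order of elements within a trace and the transitions between them. This is an illustrative proxy, not the paper's actual structural analysis.

```python
# Sketch: examine how cognitive elements are composed, not just which occur.
# Sequences and bigram transitions are a crude stand-in for the paper's
# structural analysis of reasoning traces.
from collections import Counter

def element_sequence(annotations):
    """Ordered list of elements as they appear in one annotated trace."""
    return [element for _, element in annotations]

def transition_counts(sequences):
    """Count element -> element transitions across many traces."""
    transitions = Counter()
    for seq in sequences:
        transitions.update(zip(seq, seq[1:]))
    return transitions

# A chain like uncertainty_monitoring -> counterfactual_reasoning suggests a
# principled check; repeated schema_use -> pattern_completion with no
# monitoring in between looks more like a surface heuristic.
seqs = [
    ["schema_use", "schema_use", "pattern_completion"],
    ["uncertainty_monitoring", "counterfactual_reasoning", "backward_chaining"],
]
print(transition_counts(seqs))
```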
Findings — Results with visualization
The paper’s analyses show clear divergences between human and LLM reasoning.
Distribution of Cognitive Elements
LLMs tend to overuse pattern‑completion and underuse meta‑cognition and causal reasoning.
| Cognitive element category | Humans (frequency of use) | LLMs (frequency of use) |
|---|---|---|
| Meta‑cognition | High | Low |
| Causal reasoning | High | Moderate |
| Schema/analogy use | Moderate | High |
| Working memory operations | High | Low |
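If you wanted to act on a qualitative comparison like the one above, a rough first step is to map the ordinal labels to scores and rank the gaps. The numeric mapping below is arbitrary and purely illustrative.

```python
# Sketch: turn the qualitative High/Moderate/Low comparison above into a
# rough per-category gap score. The numeric mapping is arbitrary.
SCORE = {"Low": 1, "Moderate": 2, "High": 3}

table = {
    "Meta-cognition":            ("High", "Low"),
    "Causal reasoning":          ("High", "Moderate"),
    "Schema/analogy use":        ("Moderate", "High"),
    "Working memory operations": ("High", "Low"),
}

gaps = {cat: SCORE[human] - SCORE[llm] for cat, (human, llm) in table.items()}
for cat, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{cat}: human-vs-LLM gap = {gap:+d}")
```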
Behavioral Structures
LLMs exhibit “reasoning silhouettes”—structured but hollow compositions. They mimic the shape of human reasoning while missing critical inner operations.
Human‑LLM Comparison
In tasks requiring:
- Self‑monitoring → Humans excel; LLMs guess.
- Abstraction transfer → LLMs sometimes outperform humans (thanks to broad training), but fail when required to integrate multiple elements coherently.
- Causal logic → Humans apply consistent causal models; LLMs oscillate depending on surface cues.
LLMs are competent performers, not consistent reasoners.
Implications — Why this matters for business and automation
1. AI agents need meta‑cognition before autonomy
If a model can’t reliably track what it knows or doesn’t know, delegating critical tasks (compliance, auditing, trading, autonomous decision loops) becomes hazardous.
2. Enterprises should shift from “accuracy” to cognitive coverage
The taxonomy offers a blueprint for evaluating:
- Does your model use the right reasoning strategies?
- Where are the systematic blind spots?
- Which cognitive elements must be scaffolded or engineered externally?
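What might "cognitive coverage" look like as a metric? A hedged sketch: the fraction of elements a task family requires that actually show up in the model's annotated traces. The required-element list for audit-style tasks below is hypothetical.

```python
# Sketch: a "cognitive coverage" check, i.e. which required elements actually
# appear in a model's traces for a task family. Required-element lists are
# hypothetical; observed elements would come from trace annotation.
def cognitive_coverage(required, observed):
    """required: set of element names a task family demands.
       observed: set of element names seen in the model's traces."""
    covered = required & observed
    missing = required - observed
    return len(covered) / len(required), sorted(missing)

required_for_audit_tasks = {
    "uncertainty_monitoring", "counterfactual_reasoning",
    "backward_chaining", "self_correction",
}
observed_in_traces = {"schema_use", "backward_chaining", "pattern_completion"}

score, blind_spots = cognitive_coverage(required_for_audit_tasks, observed_in_traces)
print(f"coverage = {score:.0%}, blind spots = {blind_spots}")
```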
3. Safety, governance, and assurance must target reasoning structures
Regulators and assurance teams need deeper diagnostics than benchmark scores. Many failures trace back to cognitive‑structural gaps, especially in meta‑reasoning and causal inference, that aggregate accuracy numbers do not reveal.
4. Crafting LLM chains and agents becomes more scientific
This taxonomy could guide:
- Prompt engineering
- Chain‑of‑thought templates
- Agent architecture
- Curriculum design
A future LLM agent might explicitly track its active cognitive elements, much like a human tracks mental steps.
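A minimal sketch of that idea, under heavy assumptions: an agent loop that asks the model to tag each step with the cognitive element it is exercising and refuses to act until the meta-cognitive elements have appeared. The tag protocol is invented for illustration, and `llm` stands in for any chat-completion callable.

```python
# Sketch: an agent step-loop that tracks which cognitive elements have been
# exercised and gates action on a meta-cognitive check. `llm` is a stand-in
# for any text-generation callable; the tag protocol is invented here.
REQUIRED_BEFORE_ACTING = {"uncertainty_monitoring", "self_correction"}

def reasoned_step(llm, task, history):
    prompt = (
        f"Task: {task}\n"
        f"Previous steps: {history}\n"
        "Produce ONE next reasoning step. Prefix it with the cognitive "
        "element it uses, e.g. [uncertainty_monitoring] I am unsure whether ..."
    )
    step = llm(prompt)  # e.g. "[self_correction] Recheck the earlier sum ..."
    tag = step.split("]")[0].lstrip("[").strip() if "]" in step else "untagged"
    return tag, step

def run_agent(llm, task, max_steps=8):
    history, used = [], set()
    for _ in range(max_steps):
        tag, step = reasoned_step(llm, task, history)
        history.append(step)
        used.add(tag)
        if REQUIRED_BEFORE_ACTING <= used:  # meta-cognition observed: safe to act
            return history
    return history + ["[halt] required meta-cognitive elements never appeared"]
```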
Opportunities and Challenges
- Opportunity: Build models that reason through structured cognitive element sequencing.
- Challenge: Many cognitive elements (like working memory operations) require architectural innovation, not just training data.
- Opportunity: Transparent reasoning structures can transform risk assessment in finance, legal tech, and autonomous systems.
- Challenge: Benchmarks must evolve to test reasoning composition, not just task success.
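As a toy example of what a composition-aware benchmark check could test beyond final-answer accuracy: did the required elements appear, and in a sensible order? The required sequence here is hypothetical.

```python
# Sketch: a composition-aware check for a benchmark item. Did the trace not
# only reach the right answer but exercise the required elements in order?
def composed_in_order(trace_elements, required_order):
    """True if required_order appears as an ordered subsequence of the trace."""
    it = iter(trace_elements)
    return all(req in it for req in required_order)  # `in` consumes the iterator

trace = ["schema_use", "uncertainty_monitoring", "counterfactual_reasoning",
         "self_correction", "pattern_completion"]
required = ["uncertainty_monitoring", "counterfactual_reasoning", "self_correction"]

print(composed_in_order(trace, required))  # True: elements occur in that order
```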
Conclusion
This paper doesn’t claim LLMs can’t reason. It argues they reason differently, with structural weak points that matter for safety, reliability, and enterprise deployment.
Understanding these cognitive foundations is the first step toward building systems that don’t just sound smart but consistently think smart.
Cognaptus: Automate the Present, Incubate the Future.