Opening — Why this matters now
The past year has crowned a new class of AI tools: “Deep Research” agents. They browse, summarize, and produce long-form reports with suspicious confidence. For a while, that was enough.
But cracks are showing.
Ask these systems anything requiring actual data reasoning—market structure shifts, policy impacts, or cross-domain comparisons—and they begin to hallucinate sophistication. The problem isn’t intelligence. It’s foundations.
Most AI research agents are still glorified web readers.
The paper *Towards Knowledgeable Deep Research* introduces a more uncomfortable idea: real research requires structured knowledge—tables, numbers, relationships—not just text. And more importantly, it requires systems that can think with data, not just quote it.
Background — Context and prior art
Deep Research (DR) agents emerged as the natural evolution of LLM capabilities:
| Capability Layer | Traditional LLM | Deep Research Agent |
|---|---|---|
| Information Access | Static knowledge | Web search + retrieval |
| Reasoning | Single-step | Multi-step workflows |
| Output | Short answers | Long-form reports |
However, as the paper points out, nearly all existing DR systems share a structural limitation:
They rely heavily on unstructured web content and lack meaningful interaction with structured data.
This leads to three predictable failures:
- Weak quantitative reasoning — numbers are cited, not analyzed
- Shallow conclusions — synthesis without computation
- Illusion of rigor — reports look analytical but lack data grounding
In other words, they resemble interns who read everything—but never opened Excel.
Analysis — What the paper actually does
1. Redefining the problem: Knowledgeable Deep Research (KDR)
The authors introduce a stricter formulation:
A research agent must reason over both structured (tables) and unstructured (text) knowledge to generate grounded reports.
Formally, the task expands from:
- “Find and summarize information”
To:
- “Retrieve, compute, validate, and synthesize across heterogeneous knowledge sources”
This is not incremental. It is architectural.
2. The HKA Framework: Divide and specialize
The proposed system—Hybrid Knowledge Analysis (HKA)—is a multi-agent pipeline with explicit functional separation:
| Component | Role | Failure it Fixes |
|---|---|---|
| Planner | Decomposes tasks | Avoids chaotic reasoning |
| Unstructured Analyzer | Handles web/text | Maintains context richness |
| Structured Analyzer | Handles tables + computation | Enables real analysis |
| Writer | Synthesizes outputs | Prevents fragmentation |
The key innovation is not the multi-agent setup (we’ve seen that before), but the Structured Knowledge Analyzer (SKA).
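The role separation in the table above can be sketched as a minimal pipeline. Everything here is an illustrative stand-in—the class names, the naive task split, the placeholder findings—not the paper’s actual interfaces:

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    question: str
    needs_tables: bool

class Planner:
    def decompose(self, question: str) -> list[Subtask]:
        # A real planner would call an LLM; here we split naively into
        # one structured and one unstructured subtask.
        return [Subtask(question, needs_tables=True),
                Subtask(question, needs_tables=False)]

class StructuredAnalyzer:
    def analyze(self, task: Subtask) -> str:
        return f"[table-derived finding for: {task.question}]"

class UnstructuredAnalyzer:
    def analyze(self, task: Subtask) -> str:
        return f"[text-derived context for: {task.question}]"

class Writer:
    def synthesize(self, findings: list[str]) -> str:
        return "\n".join(findings)

def run_pipeline(question: str) -> str:
    planner, ska, uka, writer = Planner(), StructuredAnalyzer(), UnstructuredAnalyzer(), Writer()
    findings = [
        (ska if t.needs_tables else uka).analyze(t)
        for t in planner.decompose(question)
    ]
    return writer.synthesize(findings)

report = run_pipeline("How did EV market share shift in 2023?")
```

The point of the separation: the Writer never touches raw tables or raw web text, so each analyzer can fail, retry, or be swapped out independently.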
3. The real breakthrough: treating data as executable
Instead of stuffing tables into prompts (which is inefficient and brittle), the system:
- Converts tables into structured objects
- Generates Python code dynamically
- Executes computations
- Uses vision-language models to interpret outputs
This creates a pipeline where:
Data → Code → Execution → Insight → Narrative
Notably, the system includes retry mechanisms that reduce execution failure rates from 31.7% to 0.51%, and visual analysis errors from 55.5% to 1.7%.
That’s not optimization. That’s operational viability.
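To make the Data → Code → Execution → Insight step concrete, here is a stdlib-only sketch (a real analyzer would likely generate pandas code); the table, column names, and figures are invented for illustration:

```python
import csv
import io

# A small table arrives as text, is parsed into structured records,
# a computation runs over it, and the result becomes a narrative claim.
csv_text = """region,revenue_2022,revenue_2023
North,120,150
South,200,180
West,90,135
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
for r in rows:
    prev, curr = float(r["revenue_2022"]), float(r["revenue_2023"])
    r["growth_pct"] = (curr - prev) / prev * 100  # year-over-year growth

top = max(rows, key=lambda r: r["growth_pct"])
insight = f"{top['region']} grew fastest at {top['growth_pct']:.1f}%."
# insight → "West grew fastest at 50.0%."
```

The claim at the end is computed, not quoted—exactly the distinction the paper draws between citing numbers and analyzing them.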
4. Evaluation: KDR-Bench (where most papers get lazy)
Instead of generic benchmarks, the authors build a domain-diverse dataset:
| Metric Category | What It Measures | Why It Matters |
|---|---|---|
| General-purpose | Coherence, depth, readability | Surface quality |
| Knowledge-centric | Use of correct data & conclusions | Actual reasoning |
| Vision-enhanced | Use of figures and layout | Multimodal intelligence |
The dataset includes:
- 9 domains
- 41 expert-level questions
- 1,252 structured tables
This is unusually grounded for an LLM paper—and refreshingly difficult.
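A knowledge-centric metric like key point coverage can be approximated crudely as the fraction of expert key points recoverable from a report. The matching rule and example strings below are invented for illustration; the benchmark’s actual judges are far more sophisticated:

```python
def key_point_coverage(report: str, key_points: list[str]) -> float:
    # Naive substring matching; a real judge would use an LLM or
    # semantic similarity rather than literal containment.
    hits = sum(1 for kp in key_points if kp.lower() in report.lower())
    return hits / len(key_points)

report = "EV market share rose 4 points; subsidies drove adoption."
points = ["market share rose", "subsidies drove adoption", "supply constraints"]
coverage = key_point_coverage(report, points)  # 2 of 3 points covered
```

Even this toy version shows why the metric bites: fluent prose that never lands the expert’s key points scores low, regardless of how readable it is.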
Findings — What actually works (and what doesn’t)
1. Structured reasoning is not optional
| System Type | Performance Trend |
|---|---|
| Web-only agents | High readability, low depth |
| Table-only agents | Higher depth, weaker synthesis |
| Hybrid (naive) | Marginal improvement |
| HKA (structured integration) | Strong across all metrics |
The key insight:
Simply adding data access is not enough. You need structured reasoning pipelines.
2. HKA vs industry systems
From the benchmark results:
| Metric | Best Baseline (Gemini DR) | HKA |
|---|---|---|
| General Score | 50.2 | 48.4 |
| Key Point Coverage | 58.3 | 61.7 |
| Supportiveness | ~20–21 (others) | 27.8 |
Interpretation:
- Gemini still wins on general fluency (unsurprising)
- HKA dominates where it matters: data-backed reasoning
A subtle but critical distinction.
3. Vision matters more than expected
When evaluated with multimodal judges:
- HKA outperforms Gemini in overall win rate
- Generates ~5.75 figures per report vs ~2.0 baseline in some domains
Translation: the ability to show reasoning (not just describe it) is now a competitive edge.
4. Ablation reveals the uncomfortable truth
Removing either:
- Structured analyzer → major drop
- Unstructured analyzer → also drops
Conclusion:
Real research is inherently hybrid. Any single-mode system is incomplete.
Implications — What this means for AI builders
1. “Search + summarize” is a dead-end product
If your AI tool:
- Retrieves documents
- Summarizes them
- Calls it “research”
…it’s already obsolete.
The next generation must:
- Execute computations
- Validate outputs
- Generate evidence (charts, tables)
2. Code generation becomes a core reasoning layer
This paper quietly confirms a trend:
The most reliable way for LLMs to reason about data is to write and execute code.
Expect future architectures to standardize:
- Code-first reasoning
- Data schema abstraction
- Execution-aware feedback loops
3. Evaluation frameworks will redefine competition
Benchmarks like KDR-Bench expose a gap:
- Current systems optimize for textual plausibility
- Future systems will optimize for data correctness
This shift will reshape leaderboards—and product claims.
4. Multimodal outputs are not UI polish—they’re cognition
Figures, tables, and layouts are not presentation layers anymore.
They are part of the reasoning process.
That’s a conceptual shift most companies haven’t priced in.
Conclusion — The quiet transition from language to knowledge
The industry has spent two years teaching machines to speak.
Now it has to teach them to think.
This paper makes a simple but disruptive argument:
Intelligence is not the ability to generate text—it’s the ability to reason over structured reality.
And that requires something most LLM systems still avoid:
Discipline.
Not more parameters. Not better prompts.
Just better thinking.
Cognaptus: Automate the Present, Incubate the Future.