
Forget Me Not: How IterResearch Rebuilt Long-Horizon Thinking for AI Agents

Opening: Why this matters now
The AI world has become obsessed with “long-horizon” reasoning: the ability of agents to sustain coherent thought over hundreds or even thousands of interactions. Yet most large language model (LLM) agents, despite their size, collapse under their own memory. The context window fills, noise piles up, and coherence suffocates. Alibaba’s IterResearch tackles this problem not by extending memory, but by redesigning it. ...
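
To make the “redesign, don’t extend” idea concrete, here is a minimal sketch of round-based context reconstruction, where each round rewrites a compact report instead of appending to an ever-growing transcript. Everything below (function names, prompt strings, the `FINAL:` convention) is a hypothetical illustration, not IterResearch’s actual implementation.

```python
# Hypothetical sketch: instead of accumulating every observation in one
# transcript, each round starts from a compact, rewritten workspace.
def run_agent(task: str, llm, tools, max_rounds: int = 20) -> str:
    report = ""  # evolving synthesis; the only state carried across rounds
    for _ in range(max_rounds):
        # The prompt holds the task plus the distilled report -- never the
        # raw history of earlier rounds, so context size stays bounded.
        prompt = f"Task: {task}\nReport so far:\n{report}\nDecide next action."
        decision = llm(prompt)
        if decision.startswith("FINAL:"):
            return decision.removeprefix("FINAL:").strip()
        observation = tools.execute(decision)
        # Reconstruction step: fold the new observation into a fresh report,
        # discarding noise rather than accumulating it.
        report = llm(
            f"Rewrite this report to integrate the new finding, keeping it "
            f"concise.\nReport:\n{report}\nNew finding:\n{observation}"
        )
    return report
```

The design choice to carry only the rewritten report is what keeps the prompt size roughly constant per round, no matter how long the task runs.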

November 11, 2025 · 4 min · Zelina

Deep Queries, Fast Answers: Why ‘Deep Research’ Wants to Be Your New Analytics Runtime

TL;DR
Deep Research agents are great at planning over messy data but bad at disciplined execution. Semantic-operator systems are the opposite: they execute efficiently but lack dynamic, cross-file reasoning. The Palimpzest prototype bridges the two with Context, compute/search operators, and materialized context reuse: a credible blueprint for an AI-native analytics runtime over unstructured data.

The Business Problem: Unstructured Data ≠ SQL
Most companies still funnel PDFs, emails, HTML, and CSVs into brittle ETL or costly human review. Classic OLAP/SaaS BI stacks excel at structured aggregates, but stumble when a question spans dozens of noisy files (e.g., “What’s the 2024 vs 2001 identity-theft ratio?”) or requires nuanced judgments (e.g., “Which Enron emails contain firsthand discussion of Raptor?”). Two current approaches each miss: ...
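
To ground the operator side of the argument, here is a minimal sketch of semantic operators with materialized context reuse. The names (`sem_filter`, `sem_extract`, `materialize`) and the on-disk cache are hypothetical illustrations of the pattern, not Palimpzest’s actual API.

```python
# Illustrative sketch (hypothetical names): declarative LLM-backed operators
# over unstructured documents, with materialized intermediate results that
# later queries can reuse instead of recomputing.
import hashlib
import json
import os

CACHE_DIR = ".context_cache"

def materialize(key: str, compute):
    """Reuse a previously computed context if one exists on disk."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, hashlib.sha256(key.encode()).hexdigest())
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute()
    with open(path, "w") as f:
        json.dump(result, f)
    return result

def sem_filter(docs, predicate: str, llm):
    """Keep documents the LLM judges to satisfy the predicate."""
    key = "filter:" + predicate + "".join(d["id"] for d in docs)
    return materialize(key, lambda: [
        d for d in docs
        if llm(f"Does this document satisfy: {predicate}?\n{d['text']}") == "yes"
    ])

def sem_extract(docs, field: str, llm):
    """Map each document to a structured field extracted by the LLM."""
    key = "extract:" + field + "".join(d["id"] for d in docs)
    return materialize(key, lambda: [
        {"id": d["id"], field: llm(f"Extract {field} from:\n{d['text']}")}
        for d in docs
    ])

# Usage: a cross-file question composed from disciplined operators.
# emails = load_docs("enron/")                  # docs as {"id", "text"} dicts
# raptor = sem_filter(emails, "firsthand discussion of Raptor", llm)
# senders = sem_extract(raptor, "sender", llm)
```

The point of the cache key over document IDs is that a second query touching the same filtered subset hits the materialized context rather than paying the LLM cost again.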

September 6, 2025 · 5 min · Zelina

Atom by Atom, Better Research: How Fine-Grained Rewards Make Agentic Search Smarter

If you’ve ever watched a web agent swing from elegant reasoning to face-plants on basic facts, you’ve met the limits of outcome-only training. Atom-Searcher proposes a simple but radical fix: stop treating the whole reasoning trace as one monolith. Instead, break it down into Atomic Thoughts (the minimal, functional units of reasoning) and supervise them directly with a Reasoning Reward Model (RRM). Then blend those process-level rewards with the final answer score using a decaying curriculum. The result? More stable training, deeper search behavior, and better generalization across in- and out-of-domain QA. ...
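
To make the reward blend concrete, here is a minimal sketch of mixing per-atomic-thought RRM scores with the final answer score under a decaying weight. The decay schedule and constants are assumptions for illustration, not the paper’s exact formulation.

```python
# Hypothetical sketch of the decaying-curriculum reward blend: fine-grained
# process rewards dominate early in training, the outcome reward dominates
# later. alpha0 and decay are illustrative assumptions.
def blended_reward(atomic_scores: list[float],
                   answer_score: float,
                   step: int,
                   alpha0: float = 0.5,
                   decay: float = 0.999) -> float:
    """Combine per-atomic-thought RRM scores with the final answer score."""
    alpha = alpha0 * (decay ** step)  # process weight shrinks over training
    process = sum(atomic_scores) / max(len(atomic_scores), 1)
    return alpha * process + (1.0 - alpha) * answer_score

# Early on, sound atomic thoughts earn reward even when the answer is wrong;
# by late training, the outcome score is nearly all that matters.
print(blended_reward([0.8, 0.6, 0.9], answer_score=0.0, step=0))     # ~0.38
print(blended_reward([0.8, 0.6, 0.9], answer_score=0.0, step=5000))  # ~0.003
```

The decay is what makes this a curriculum: dense process supervision stabilizes early exploration, then gradually hands control back to the sparse outcome signal.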

August 19, 2025 · 5 min · Zelina