TL;DR
Deep Research agents are great at planning over messy data but bad at disciplined execution. Semantic-operator systems are the opposite: they execute efficiently but lack dynamic, cross-file reasoning. The Palimpzest prototype bridges the two with Context, compute/search operators, and materialized context reuse—a credible blueprint for an AI‑native analytics runtime over unstructured data.
The Business Problem: Unstructured Data ≠ SQL
Most companies still funnel PDFs, emails, HTML, and CSVs into brittle ETL or costly human review. Classic OLAP/SaaS BI stacks excel at structured aggregates, but stumble when a question spans dozens of noisy files (e.g., “What’s the 2024 vs 2001 identity‑theft ratio?”) or requires nuanced judgments (e.g., “Which Enron emails contain firsthand discussion of Raptor?”). Two current approaches each fall short:
- Deep Research agents write plans and code, but often “shortcut” execution (stop early, regex their way through), sacrificing recall and accuracy.
- Semantic-operator engines (map/filter/join/aggregate with natural‑language specs) execute at scale with cost models, but they iterate record by record, struggle with cross‑document logic, and rarely achieve interactive latencies.
Net effect: costs balloon, latency spikes, and answers drift from correct to merely plausible.
The Core Idea: Make Agents Speak “Database”—and Databases Think “Agent”
Palimpzest’s prototype introduces three pieces that together behave like a runtime:
- Context — a richer dataset handle
- Iterator semantics plus pluggable index/top‑k methods (e.g., vector search, key lookup), dataset‑specific tools, and a natural‑language description of what’s inside.
- This lets agents choose how to access data (iterate vs search) and why (guided by the description), rather than blindly streaming every file.
- compute and search operators — agentic, but optimizer‑aware
- Each operator is physically implemented with a code‑capable agent equipped with a tool that writes an optimized semantic‑operator program on demand.
- The agent can still explore, plan, and run Python, but when heavy lifting is needed, it calls into an optimized, cost‑aware pipeline (model choice, rewrites, physical ops) instead of hand‑rolled loops.
- Materialized Contexts — reuse the hard‑won work
- Execution traces become new Contexts whose descriptions are embedded and cached. Future queries can retrieve similar Contexts as starting points—akin to materialized views in OLAP, but for AI workflows over unstructured data.
Together, these pieces aim for the sweet spot: agent‑level flexibility with database‑grade optimization and reuse. The sketch below makes the mechanics concrete.
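It is a minimal, self‑contained Python sketch of what a Context handle and a materialized‑Context registry could look like; every name in it (Context, ContextRegistry, searchers, embed) is an illustrative assumption for exposition, not the Palimpzest API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

# Illustrative sketch only: these names are assumptions for exposition, not the Palimpzest API.

@dataclass
class Context:
    """A dataset handle: iterable records, pluggable search methods, tools, and an NL description."""
    description: str                               # natural-language summary of what's inside
    records: Iterable[dict]                        # underlying files / rows / chunks
    searchers: dict = field(default_factory=dict)  # name -> callable(query, k) -> list of records
    tools: dict = field(default_factory=dict)      # dataset-specific helpers (e.g., HTML-to-text cleaner)

    def __iter__(self) -> Iterator[dict]:
        # Iterator semantics: an agent can still stream every record when an exhaustive pass is needed.
        return iter(self.records)

    def search(self, method: str, query: str, k: int = 10) -> List[dict]:
        # Pluggable access paths (vector top-k, key lookup, ...) the agent can choose instead of iterating.
        return self.searchers[method](query, k)


class ContextRegistry:
    """Materialized-Context reuse: cache Contexts and retrieve them by description similarity."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.entries: List[Tuple[List[float], Context]] = []

    def add(self, ctx: Context) -> None:
        self.entries.append((self.embed(ctx.description), ctx))

    def most_similar(self, question: str) -> Optional[Context]:
        if not self.entries:
            return None
        q = self.embed(question)

        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
            return dot / norm if norm else 0.0

        return max(self.entries, key=lambda entry: cosine(entry[0], q))[1]
```

In this sketch, a planning agent reads ctx.description to decide whether to iterate or call ctx.search(...), and each completed execution is registered so the next similar question can start from its outputs rather than from raw files.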
Why This Matters (and Where It Helps First)
1) Document intelligence at enterprise scale. Compliance, legal, and ops teams constantly ask “needle‑in‑haystack” questions across evolving corpora. The compute/search operators let you filter exhaustively with LLMs (high recall) while keeping cost/latency sane via optimizer‑picked models and non‑redundant passes.
2) Interactive analytics over data lakes. When an analyst asks a follow‑up (“Actually, compare 2024 vs 2001 and 2010”), the system can reuse Contexts rather than re‑crawl, turning multi‑minute reprocessing into incremental deltas.
3) Governance and repeatability. Descriptive Contexts act like explainable artifacts: what was read, which tools ran, what was cached. That’s audit‑friendly and reduces the “agent did something magical” problem.
A Tale of Two Queries (and What We Learn)
KramaBench—identity‑theft ratio, 2024 vs 2001.
- Naïve semantic programs grind through every file and can still be logically brittle (year disambiguation across multiple files).
- Pure agents often stop early or pick wrong files.
- Palimpzest’s compute operator iterates: explore → write an optimized semantic program → compute the ratio in Python (a sketch of this flow follows below). Outcome: near‑zero error with reasonable cost.
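A hedged sketch of that explore‑then‑delegate flow, assuming a Context like the one sketched earlier plus two runtime‑supplied callables (a program writer and an executor). The names are illustrative stand‑ins, not the actual Palimpzest interfaces.

```python
# Hypothetical sketch of the compute-operator flow on the identity-theft question.
# `ctx`, `write_semantic_program`, and `run_program` are illustrative stand-ins.

def identity_theft_ratio(ctx, write_semantic_program, run_program) -> float:
    """Explore -> emit one optimized semantic program -> finish the arithmetic in Python."""
    # 1) Explore: probe the Context instead of streaming every file.
    candidates = ctx.search("vector", "identity theft report counts by year", k=5)

    # 2) Delegate the heavy lifting: one optimizer-planned program extracts
    #    (year, identity_theft_reports) tuples from the candidate files.
    program = write_semantic_program(
        inputs=candidates,
        spec="extract (year, number of identity-theft reports) from each file",
    )
    rows = run_program(program)  # cost-aware execution: model choice, rewrites, physical ops

    # 3) Deterministic tail: disambiguate years and compute the ratio in plain Python.
    by_year = {int(r["year"]): float(r["identity_theft_reports"]) for r in rows}
    return by_year[2024] / by_year[2001]
```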
Enron Raptor emails—firsthand discussion only.
- Pure agents: high precision, low recall (manual spot‑checks, regex filters).
- Agent + unoptimized semantic tools: great quality, bloated cost/latency (redundant passes).
- compute: writes one optimized Palimpzest program, maintaining quality while cutting cost/time via the optimizer (sketched below).
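For comparison, the single optimized program for the Enron question might look like the sketch below. Here sem_filter and optimizer are again illustrative stand‑ins supplied by the runtime, not the real Palimpzest API, and the dollar budget is an invented parameter.

```python
# Hypothetical sketch of the single optimized program the compute agent might emit
# for the Enron question. `sem_filter` and `optimizer` are illustrative stand-ins.

def raptor_firsthand_emails(emails_ctx, sem_filter, optimizer):
    # One exhaustive semantic filter over the whole corpus (high recall), instead of
    # an agent hand-rolling regexes and spot checks (high precision, low recall).
    predicate = (
        "The email contains firsthand discussion of the Raptor structures: "
        "the author is directly involved, not forwarding news or speculation."
    )
    plan = sem_filter(emails_ctx, predicate)          # logical plan: a single LLM-backed filter
    return optimizer.execute(plan, budget_usd=25.0)   # cost-aware model selection, no redundant passes
```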
Takeaway: The runtime’s edge is not that agents became smarter—it’s that planning remains agentic while execution becomes optimized and reusable.
Architecture Sketch You Can Map to Your Stack
| Layer | What it does | Your likely component today | Upgrade path |
|---|---|---|---|
| Interface | NL questions, dashboards, notebooks | BI + notebooks | Add an agentic query surface (chat/notebook cell) |
| Planning | Agent scoping, tool orchestration | Ad‑hoc Python + prompts | Introduce compute/search operators wired to your tools |
| Optimized Execution | Cost‑based pipelines over docs | Batch scripts, ETL jobs | Adopt semantic operators with a cost model and model selection |
| Data Access | Iteration + indexes + tools | Object store + vector DB | Wrap your lake as a Context with embed/top‑k + utility tools |
| Reuse | Materialized Contexts (cached, searchable) | Ad‑hoc caches | Index Context descriptions; retrieve them for similar new asks |
Adoption Playbook (90 Days)
- Wrap a pilot corpus (e.g., a folder of reports, emails) as a Context with: (a) a crisp description, (b) a vector index, (c) 1–2 helpful tools (e.g., schema sniffer, HTML→text cleaner).
- Stand up semantic operators with optimizer picks for models (cheap vs accurate) and enable query rewrites (split/merge filters; pre‑aggregate when possible).
- Expose compute/search via a notebook cell or chat—start with 2–3 recurring questions. Cache all intermediate Contexts.
- Measure: F1/recall on labeled tasks, $/GB processed, and p95 latency (a minimal harness is sketched after this list). Iterate on model choices and rewrite strategies.
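For the measurement step, a small framework‑agnostic harness along these lines is enough to start, assuming each run logs true/false positives and negatives against your labels, LLM spend, bytes processed, and wall‑clock latency; the field names are illustrative.

```python
import statistics

def evaluate(runs: list[dict]) -> dict:
    """Pilot metrics from the playbook: precision/recall/F1, $ per GB processed, p95 latency.
    Each run dict is assumed to carry: tp, fp, fn, cost_usd, bytes, latency_s."""
    tp = sum(r["tp"] for r in runs)
    fp = sum(r["fp"] for r in runs)
    fn = sum(r["fn"] for r in runs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    gb = sum(r["bytes"] for r in runs) / 1e9
    usd_per_gb = sum(r["cost_usd"] for r in runs) / gb if gb else 0.0

    latencies = sorted(r["latency_s"] for r in runs)
    p95 = statistics.quantiles(latencies, n=20)[18] if len(latencies) >= 2 else (latencies[0] if latencies else 0.0)

    return {"precision": precision, "recall": recall, "f1": f1,
            "usd_per_gb": usd_per_gb, "p95_latency_s": p95}
```

Track these per question and per model choice so comparisons between optimizer settings rest on the same numbers.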
Risks & Watch‑outs
- Operator explosion: Too many tiny compute/search steps can thrash the LLM meter. Use rewrites and Context reuse to batch work.
- Stale Contexts: Cached descriptions drift as the underlying data changes. Define invalidation and refresh policies for cached Contexts.
- Hidden coupling: Dataset‑specific tools (e.g., filename heuristics) can leak assumptions. Keep tools small and document their contracts.
What This Signals for the AI‑Native Data Stack
The pragmatic path isn’t “replace SQL with agents,” but let agents draft plans while a cost‑based engine executes. Over time, enterprises will accumulate a graph of Contexts—living, explainable artifacts that make subsequent questions cheaper, faster, and more accurate. That’s the closest thing we’ve seen to a durable moat for AI analytics on unstructured data.
Cognaptus: Automate the Present, Incubate the Future