Retrieval Systems

Grounded and Confused: Why RAG Systems Still Fail in the Enterprise If you’ve been following the RAG (retrieval-augmented generation) hype train, you might believe we’ve cracked enterprise search. Salesforce’s new benchmark—HERB (Heterogeneous Enterprise RAG Benchmark)—throws cold water on that optimism. It exposes how even the most powerful agentic RAG systems, armed with top-tier LLMs, crumble when facing the chaotic, multi-format, and noisy reality of business data. Deep Search ≠ Deep Reasoning Most current RAG benchmarks focus on shallow linkages—documents tied together via entity overlap or topic clusters. HERB rejects this toy model. It defines Deep Search as not just multi-hop reasoning, but searching across unstructured and structured formats, like Slack threads, meeting transcripts, GitHub PRs, and internal URLs. It’s what real enterprise users do daily, and it’s messy. ...