Knowledge Graphs

Ground Control to Synthetic Data: Why Enterprise LLMs Need a Source of Truth

TL;DR for operators: Synthetic data is having its predictable enterprise moment: everyone wants more of it, faster, cheaper, and preferably without involving humans who ask inconvenient questions like “is this correct?” The two papers here are useful because they push against that lazy version of the story. StateGen, from PayPal AI, focuses on generating multi-turn training conversations for tool-augmented LLM agents, using an authoritative world-state object, tool simulation, persona variation, and multi-axis judging.[1] CYQUARK focuses on generating Text-To-Cypher fine-tuning data from a target property graph and schema, expanding query expressivity while filtering natural-language paraphrases for logical fidelity.[2] ...

Flush Before You Trust: The Locality Trick Behind Incremental Sheaf Cohomology

TL;DR for operators: Most business systems do not fail because they lack another dashboard. They fail because the dashboard is reading from a structure that changed three minutes ago, and nobody knows which part of the structure is now stale. Delightful. The paper behind this article proposes an incremental algorithm for maintaining first sheaf cohomology, $H^1$, on evolving 1-dimensional cellular complexes, essentially graph-like structures decorated with local vector spaces and consistency maps.[1] In plainer operational language, it is about tracking whether a changing network of constraints still holds together without rebuilding the whole mathematical object after every edit. ...

Graph Work, Not Graph Worship: RAGA Turns RAG Into an Auditable Knowledge Operation

TL;DR for operators: RAGA is not another “add a graph and accuracy goes up” paper. That would be too convenient, and therefore suspicious. The useful idea is more operational: treat retrieval-augmented generation as a knowledge management process, not a pile of embeddings with a polite chatbot on top. The paper proposes RAGA, short for Reading-And-Graph-building-Agent, an autonomous system that reads documents, searches existing graph knowledge, verifies whether new entities or relations should be added, and then constructs or updates a knowledge graph with source-linked provenance.[1] Its core loop is Read–Search–Verify–Construct, implemented as a ReAct-style tool-calling agent rather than a one-shot extraction pipeline. ...

Curved Space, Straighter Retrieval: Why Graph RAG Needs Geometry

Retrieval looks simple until the wrong thing keeps showing up. A company builds a graph model over products, papers, suppliers, users, or transactions. The model performs reasonably well inside familiar territory. Then the data shifts. New products appear. A new research domain enters the citation graph. A social platform changes user behavior. The model’s internal knowledge, frozen inside parameters, starts behaving like yesterday’s org chart: technically structured, operationally stale. ...

Query the Receipt, Not the Vibe: DualGraph and the RAG Catalog Problem

A product catalog is not a paragraph with a search box. Catalogs look deceptively friendly to RAG systems. A product page has descriptions, feature bullets, specification tables, prices, variants, categories, and marketing copy. Feed those pages into a vector database, ask an LLM a question, and the system should answer. This is the comforting story. It is also where many enterprise RAG demos begin their quiet decline into customer-support theater. ...

WorldDB Memory Wars — Why Agent Memory Needs Structure, Not More Tokens

Memory is cheap until it has to remember correctly. A chatbot can remember a paragraph for a few minutes. An enterprise agent is asked to remember a customer’s old address, current address, account owner, exception approval, product issue, refund promise, and the reason the promise changed last month. Then it must answer without mixing the past with the present. This is where “just add more context” begins to look less like strategy and more like buying a bigger drawer for unsorted receipts. ...

CQ or Consequences: What This LLM Benchmark Reveals About AI Requirements Work

Requirements work has a reputation problem. It is rarely the part of an AI project that receives the keynote slide, the demo video, or the executive applause. Nobody opens a budget meeting by saying, “What we really need is a better way to ask the system what it must know.” They should, but apparently civilization still has limits. ...

CQ, AI & The Question of Questions

Questions look cheap. That is why they are dangerous. In most enterprise AI projects, the visible work arrives late: dashboards, RAG demos, knowledge graphs, compliance assistants, workflow copilots, and executive slides with arrows pointing to a “semantic layer.” The invisible work arrives earlier and is less glamorous: deciding what the system must actually know, answer, retrieve, distinguish, reject, and explain. ...

Graph RAG, No Smoke: Why Explainable AI in Manufacturing Needs a Memory

Factory AI has an old communication problem. The model can say, “this screw-placement attempt is likely to fail.” The operator then asks the obvious follow-up: “Because of what?” A dashboard answers with a probability. A SHAP plot answers with colored bars. A feature-importance chart answers with something that looks scientific enough to intimidate the meeting room into silence. None of these answers necessarily tells the worker, engineer, or manager what is connected to what: the screw geometry, the robot arm, the training dataset, the preprocessing step, the model, the task, and the explanation artifact. ...

Epistemic Infrastructure: Why Your AI Knows Less Than It Thinks

Documents are rarely wrong in the same way. A project proposal can be relevant but obsolete. A meeting note can be accurate but non-binding. A market-size estimate can be useful but contradicted by later due diligence. A regulatory question can be unanswered and still more important than a polished paragraph that sounds certain. This is the small, boring, expensive problem hiding inside many enterprise AI deployments: the system finds the right files, then treats unlike things as if they had the same authority. ...