Data Lineage

Logs are where teams go after the dashboard has already failed. A pipeline stalls. A model run produces nonsense. A compute job quietly burns budget on the wrong node. Someone opens three dashboards, two notebooks, and one ancient SQL snippet named final_debug_v3_really_final.sql. Then the archaeology begins.

The paper “LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation Methodology” proposes a more interesting answer: do not ask an LLM to “understand the workflow” in the abstract. Give it live provenance metadata, a compact schema, query guidelines, and tools that execute structured queries on its behalf.[1] In other words, stop treating the model as a psychic dashboard. Treat it as a controlled interface to workflow exhaust. ...
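To make the "controlled interface" idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the schema fields, the sample records, the query format, and the names build_context and run_structured_query are all assumptions made for illustration. The point it tries to show is the division of labor: the model sees only a compact schema and some guidelines, it proposes a structured query as JSON, and a tool validates and executes that query against the provenance records.

```python
import json

# Hypothetical compact schema handed to the model alongside query guidelines.
SCHEMA = {
    "task_id": "str",
    "activity": "str",
    "status": "str",
    "node": "str",
    "duration_s": "float",
}

# Illustrative provenance records; a live system would collect these from the workflow.
RECORDS = [
    {"task_id": "t1", "activity": "preprocess", "status": "FINISHED", "node": "node-a", "duration_s": 12.0},
    {"task_id": "t2", "activity": "train", "status": "FAILED", "node": "node-b", "duration_s": 340.5},
    {"task_id": "t3", "activity": "train", "status": "FINISHED", "node": "node-a", "duration_s": 298.1},
]

# Only these comparison operators are accepted from the model.
OPS = {
    "eq": lambda a, b: a == b,
    "ne": lambda a, b: a != b,
    "gt": lambda a, b: a > b,
    "lt": lambda a, b: a < b,
}


def build_context(schema, guidelines):
    """Render the schema and guidelines as prompt text for the model."""
    fields = "\n".join(f"- {name}: {kind}" for name, kind in schema.items())
    rules = "\n".join(f"- {g}" for g in guidelines)
    return f"Fields:\n{fields}\nGuidelines:\n{rules}"


def run_structured_query(query, records=RECORDS, schema=SCHEMA):
    """Validate a model-proposed query against the schema, then execute it."""
    filters = query.get("filters", [])
    select = query.get("select") or list(schema)
    # Reject any field the schema does not declare.
    for name in [f["field"] for f in filters] + list(select):
        if name not in schema:
            raise ValueError(f"unknown field: {name}")
    # Reject any operator outside the allow-list.
    for f in filters:
        if f["op"] not in OPS:
            raise ValueError(f"unsupported operator: {f['op']}")
    matched = [
        r for r in records
        if all(OPS[f["op"]](r[f["field"]], f["value"]) for f in filters)
    ]
    return [{name: r[name] for name in select} for r in matched]


# Stand-in for the JSON a model might emit when asked "which tasks failed, and where?"
model_output = (
    '{"filters": [{"field": "status", "op": "eq", "value": "FAILED"}], '
    '"select": ["task_id", "node"]}'
)
result = run_structured_query(json.loads(model_output))
print(result)
```

In this sketch the model never touches the records directly. Anything it asks for must pass the schema and operator checks first, so a hallucinated field name fails loudly with a ValueError instead of returning a plausible but wrong answer.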