Data-Quality

Edge Control: Why Synthetic Graphs Need a Repair Pass

TL;DR for operators: Synthetic graph data is easy to make look plausible and hard to make structurally right. A graph can have the right number of nodes, a sensible average edge count, and a respectable generative model behind it, while still getting the relational geometry wrong. In graph domains, that is not a cosmetic flaw. The edges are the thing. ...
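A minimal sketch of the point above, not taken from the post itself: two hypothetical toy graphs with the same node count and the same average degree that still differ in structure. The edge lists and the helper names (`build_adjacency`, `summarize`, and so on) are illustrative assumptions.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def average_degree(adj):
    """Mean number of neighbours per node."""
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)


def triangle_count(adj):
    """Count each triangle once by enforcing u < v < w."""
    count = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                for w in adj[u] & adj[v]:
                    if w > v:
                        count += 1
    return count


def component_count(adj):
    """Count connected components with an iterative depth-first search."""
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node] - seen)
    return components


def summarize(edges):
    adj = build_adjacency(edges)
    return {
        "nodes": len(adj),
        "avg_degree": average_degree(adj),
        "triangles": triangle_count(adj),
        "components": component_count(adj),
    }


# Hypothetical toy graphs: one 6-node ring versus two disjoint triangles.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

ring_summary = summarize(ring)
triangles_summary = summarize(two_triangles)
print(ring_summary)
print(triangles_summary)
```

Both graphs report six nodes and an average degree of 2.0, so a check that stops at those summary statistics would treat them as equivalent. The triangle and component counts are where they diverge.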

Mind the Representation Gap: Why Enterprise AI Fails Before It Thinks

Enterprise AI has developed a charming habit: whenever a system fails, someone suggests using a larger model. The chatbot misread a customer complaint? Bigger model. The autonomous system struggled with a new sensor configuration? Bigger model. The video classifier understood the objects but missed the actual message? Bigger model, possibly with a more expensive logo. ...

Synthetic Data, Real Receipts: Why LLM Pipelines Need an Auditor

Opening: Why this matters now. Synthetic data has become one of AI’s favorite escape routes. Real data is expensive, legally awkward, slow to collect, unevenly labeled, and sometimes simply unavailable. LLMs offer a tempting alternative: generate the missing examples, fill the long tail, create evaluation suites, simulate edge cases, and keep the training pipeline moving. Convenient. Elegant. Also mildly dangerous, which is usually where the interesting part begins. ...

Seeing Is Judging: Why LLMs Are Better Critics Than Creators in Time-Series Reasoning

A dashboard says revenue demand has “stabilized.” A monitoring agent says a sensor spike is “temporary.” A trading assistant says volatility has “fallen after the regime shift.” The sentence is smooth. The chart is nearby. The user is tired. That is usually enough for a bad explanation to survive. This is the quiet problem behind AI-assisted analytics: not whether a language model can write a plausible story about time-series data, but whether the story is faithful to the numbers. A recent paper, LLM-as-a-Judge for Time Series Explanations, studies exactly this gap by asking models to play two different roles: narrator and critic.¹ ...

Redundancy Overload Is Optional: Finding the FDs That Actually Matter

Tables have a talent for pretending to be tidy. A customer table may have fifty columns. A transaction table may have a hundred. A log table may contain derived fields, timestamps, status codes, copied identifiers, normalized labels, and a few columns that nobody remembers creating but everybody is afraid to delete. Then a data profiling tool arrives, dutifully discovers functional dependencies, and returns several hundred thousand “valid” relationships. ...
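As a rough illustration of why dependency discovery returns so many results, here is a minimal sketch of a brute-force check for single-column functional dependencies. It is not the method from the post. The rows, column names, and the `fd_holds` helper are hypothetical.

```python
from itertools import permutations


def fd_holds(rows, lhs, rhs):
    """Return True if every combination of lhs values maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical customer rows, invented for illustration only.
rows = [
    {"customer_id": 1, "zip": "10001", "city": "New York", "status": "active"},
    {"customer_id": 2, "zip": "10001", "city": "New York", "status": "closed"},
    {"customer_id": 3, "zip": "94105", "city": "San Francisco", "status": "active"},
]

columns = list(rows[0])

# Test every ordered pair of distinct columns as a candidate dependency a -> b.
valid = [(a, b) for a, b in permutations(columns, 2) if fd_holds(rows, [a], [b])]

for a, b in valid:
    print(f"{a} -> {b}")
```

Even with four columns and three rows, five of the twelve single-column candidates hold, and three of those exist only because `customer_id` is unique. Checking multi-column left-hand sides on a wide table multiplies the candidates much further, which is how a count in the hundreds of thousands becomes plausible.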

When Benchmarks Rot: Why Static ‘Gold Labels’ Are a Clinical Liability

Clinical AI has a paperwork problem. Not the usual paperwork problem, where doctors drown in documentation and everyone promises that software will save them. The more interesting problem sits one layer below: the paperwork used to judge the software may itself be wrong. That is the uncomfortable center of Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight, a paper that audits MedCalc-Bench, a benchmark for testing whether language models can compute medical risk scores from patient narratives.¹ The paper’s target is not a toy dataset. MedCalc-Bench covers 55 medical calculators and includes 10,053 training instances plus 1,047 test instances. Its labels were produced through an LLM-assisted pipeline: GPT-3.5 matched patient contexts to calculator questions, GPT-4 extracted clinical features, and Python scripts aggregated those features into final scores. ...

Trust Issues: Why Neural Networks Need Their Own Internal Affairs Department

Accuracy is a comforting number. That is precisely the problem. A neural network can score well on a test set and still be operationally suspicious. The labels may be corrupted. The input may be degraded. A small patch may have quietly hijacked part of the model’s learned behavior. The model may be confident, calibrated enough for a dashboard, and still untrustworthy in the one place where the business actually needs it to behave. ...

When Noisy Data Talks Back: The Fragile Art of Learning Under Infinite Contamination

Bad data is not one problem. It is at least three problems wearing the same cheap trench coat. There is bad data that appears once and disappears. There is bad data that keeps appearing, but becomes rarer as the corpus grows. And there is bad data that settles in at a stable rate, like a permanent tenant with poor hygiene and legal representation. Business discussions about AI training data often compress these into one vague category called “noise”. Convenient, yes. Informative, no. ...

Touch Intelligence: How DigiData Trains Agents to Think with Their Fingers

Phones are where automation goes to embarrass itself. A desktop workflow can often be forced into a neat sequence: open tab, click menu, submit form, pretend the enterprise software was designed by someone who likes people. Mobile apps are less polite. They hide features behind drawers, gestures, modals, permissions, scrolling lists, bottom sheets, dark-pattern-ish confirmations, and the occasional button that looks decorative until it suddenly matters. A human user handles this with a mixture of visual attention, memory, muscle habit, and mild resentment. A mobile control agent has to do it with pixels, UI trees, and a policy that decides where the next finger should land. ...

Dirty Data, Clean Machines: How LLM Agents Rewire Predictive Maintenance

Workshop logs are not glamorous. They are where predictive-maintenance dreams go to meet misspelled component names, missing codes, wrong vehicle identifiers, and dates that imply a truck was both under repair and happily accumulating kilometres. Industrial AI, as ever, is less a matter of elegant algorithms than of persuading messy operational records to stop lying. ...