Timeline Triage: How LLMs Learn to Read Between Clinical Lines

Opening — Why This Matters Now
Clinical AI has entered its bureaucratic phase. Health systems want automation, not epiphanies: accurate records, structured events, timelines that behave. Yet the unstructured clinical note remains stubbornly chaotic — a space where abbreviations proliferate like antibodies and time itself is relative. The paper “UW‑BioNLP at ChemoTimelines 2025” offers a clean window into this chaos. The authors attempt something deceptively simple: reconstruct chemotherapy timelines from raw oncology notes using LLMs. The simplicity is a trap; the work is a masterclass in how modern models stumble, self-correct, hallucinate, and ultimately converge into something usefully structured. ...

December 7, 2025 · 5 min · Zelina

Trees That Think Faster: Adaptive Compression for the Long-Context Era

Opening — Why this matters now
Large Language Models keep extending their context windows, yet the economics of doing so remain brutally simple: quadratic attention doesn’t scale with human ambition. Businesses want agents that remember weeks of emails, thousands of documents, and years of interactions. Hardware budgets disagree. Enter a new wave of research attempting to compress context without destroying its soul. Many approaches flatten, prune, or otherwise squeeze text into generic latent mush. Predictably, performance collapses in tasks that require nuance, positional precision, or long‑range logic. ...

December 7, 2025 · 4 min · Zelina

When Motion Lies: Why Video LLMs Keep Misreading Physics

Why This Matters Now
The AI industry has spent years teaching models to see—objects, scenes, actions, the usual suspects. But the world doesn’t merely look a certain way; it moves according to rules. As video‑native applications surge (autonomous monitoring, industrial automation, robotics, compliance analysis), the expectations for machine perception are shifting from recognizing what is visible to inferring what is happening. ...

December 7, 2025 · 4 min · Zelina

Benchmarks Are From Mars, Workflows Are From Venus: Why AI Research Co‑Pilots Keep Failing in the Wild

Opening — Why this matters now
Benchmarks are having a moment. Every few weeks, a new leaderboard appears claiming to measure a model’s research capability—from literature recall to CRISPR planning. And yet, inside real laboratories, scientists quietly report a different truth: systems that ace these benchmarks often become surprisingly helpless when asked to collaborate across days, adapt to constraints, or simply remember that the budget shrank since yesterday. ...

December 6, 2025 · 5 min · Zelina

Context Is King: How Ontologies Turn Agentic AI from Guesswork to Governance

Opening — Why this matters now
Agentic AI has slipped quietly from the research lab into the enterprise boardroom. The pitch is irresistible: autonomous systems that can monitor operations, make decisions, and even justify their actions. But the reality is less cinematic. Most agentic AI systems still operate on a foundation that is—politely—semantic improv. LLMs “understand” enterprise contexts only in the sense that a tourist “understands” a city by reading the brochure. ...

December 6, 2025 · 4 min · Zelina

Lost in Translation: When Multilingual LLMs Miss the Medical Plot

Opening — Why This Matters Now
Multilingual LLMs have become everyone’s favorite hammer—and unsurprisingly, everything is starting to look like a nail. Hospitals, in particular, are eager to automate the unglamorous work of parsing Electronic Health Records (EHRs). But as the paper “Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case” reminds us, this hammer still slips dangerously when the text shifts away from English. ...

December 6, 2025 · 4 min · Zelina

Order in the Court: Why XIL Doesn’t Panic Over Human Bias

Opening — Why This Matters Now
Interactive AI is entering boardrooms faster than corporate compliance teams can draft new slide decks. Many firms now deploy explanation-based interfaces—systems that don’t just make predictions but reveal why they made them. The assumption is seductive: give humans explanations, get better oversight. But psychology rarely cooperates. Order effects—our tendency to weigh early or recent information more heavily—threaten to distort user trust and training signals in these systems. ...

December 6, 2025 · 4 min · Zelina

Packing a Punch: How Model‑Based AI Outperformed Decades of Sphere‑Packing Theory

Opening — Why this matters now
AI’s recent victories in mathematics—AlphaGeometry, DeepSeek‑Prover, AlphaEvolve—have leaned on a familiar formula: overwhelming compute, evolutionary thrashing, and enough sampling to make Monte Carlo blush. Effective, yes. Elegant? Hardly. Sphere packing, however, does not care for this style of progress. Each evaluation in the three‑point SDP framework can require days, not milliseconds. There is no room for “just try another million candidates.” Any system operating here must think, not flail. ...

December 6, 2025 · 5 min · Zelina

STRIDE Gets a Plus-One: How ASTRIDE Rewrites Threat Modeling for the Agentic Era

Opening — Why this matters now
Agentic AI is no longer a research toy but the skeleton key of modern automation pipelines. As enterprises rush to stitch together LLM-driven planners, tool callers, and multimodal agents, one truth becomes painfully clear: our security frameworks were built for software, not for software that thinks. STRIDE, the trusted stalwart of threat modeling, was never meant to grapple with prompt injections, hallucinated tool invocations, or inter-agent influence loops. ...

December 6, 2025 · 4 min · Zelina

Worlds Within Reach: How SIMA 2 Turns Virtual Environments into Training Grounds for Generalist Agents

Opening — Why this matters now
The AI industry has spent the past two years shouting about “agentic systems,” but most real agents still behave like gifted interns: competent in narrow conditions, confused everywhere else. SIMA 2, from Google DeepMind, tries to push past this ceiling. Instead of worshipping model size, SIMA 2 doubles down on something far more mundane—and far more difficult: training an embodied, generalist agent across many virtual worlds simultaneously. ...

December 6, 2025 · 5 min · Zelina