
FormuLLA: When LLMs Stop Talking and Start Formulating

Opening — Why this matters now

Pharmaceutical 3D printing has promised personalization for over a decade. In practice, it has mostly delivered spreadsheets, failed filaments, and a great deal of human patience. The bottleneck has never been imagination—it has been formulation. Every new drug–excipient combination still demands expensive trial-and-error, even as printers themselves have matured. ...

January 6, 2026 · 4 min · Zelina

Think Before You Sink: Streaming Hallucinations in Long Reasoning

Opening — Why this matters now

Large language models have learned to think out loud. Chain-of-thought (CoT) reasoning has become the default solution for math, planning, and multi-step decision tasks. The industry applauded: more transparency, better answers, apparent interpretability. Then reality intervened. Despite elegant reasoning traces, models still reach incorrect conclusions—sometimes confidently, sometimes catastrophically. Worse, the mistakes are no longer obvious. They creep in quietly, spread across steps, and survive superficial self-corrections. What we call “hallucination” has grown up. And our detection methods have not. ...

January 6, 2026 · 4 min · Zelina

Causality Remembers: Teaching Social Media Defenses to Learn from the Past

Opening — Why this matters now

Social media coordination detection is stuck in an awkward adolescence. Platforms know coordinated inauthentic behavior exists, regulators know it scales faster than moderation teams, and researchers know correlation-heavy detectors are brittle. Yet most deployed systems still behave as if yesterday’s parameters will work tomorrow. This paper introduces Adaptive Causal Coordination Detection (ACCD)—not as another accuracy tweak, but as a structural correction. Instead of freezing assumptions into static thresholds and embeddings, ACCD treats coordination detection as a learning system with memory. And that subtle shift matters more than the headline F1 score. ...

January 5, 2026 · 4 min · Zelina

Crossing the Line: Teaching Pedestrian Models to Reason, Not Memorize

Opening — Why this matters now

Pedestrian fatalities are rising, mid-block crossings dominate risk exposure, and yet most models tasked with predicting pedestrian behavior remain stubbornly local. They perform well—until they don’t. Move them to a new street, a wider arterial, or a different land-use mix, and accuracy quietly collapses. This is not a data problem. It’s a reasoning problem. ...

January 5, 2026 · 4 min · Zelina

When Models Remember Too Much: The Quiet Economics of Memorization

Opening — Why this matters now

Large Language Models (LLMs) are often praised for what they generalize. Yet, beneath the surface, a less glamorous behavior quietly persists: they remember—sometimes too well. In an era where models are trained on ever-larger corpora under increasing regulatory scrutiny, understanding when memorization occurs, why it happens, and how it can be isolated is no longer an academic indulgence. It is an operational concern. ...

January 5, 2026 · 3 min · Zelina

Trust No One, Train Together: Zero-Trust Federated Learning Grows Teeth

Opening — Why this matters now

Critical infrastructure is no longer attacked by teenagers in hoodies. It is probed, poisoned, and patiently undermined by adversaries who understand distributed systems better than most defenders. From water treatment plants to national energy grids, Industrial IoT (IIoT) has become a strategic attack surface. Federated Learning (FL) was supposed to help—privacy-preserving, collaborative, decentralized. Instead, it quietly introduced a new problem: you are now trusting hundreds or thousands of autonomous agents not to lie. ...

January 4, 2026 · 4 min · Zelina

When Fairness Fails in Groups: From Lone Counterexamples to Discrimination Clusters

Opening — Why this matters now

Most algorithmic fairness debates still behave as if discrimination is a rounding error: rare, isolated, and best handled by catching a few bad counterexamples. Regulators ask whether a discriminatory case exists. Engineers ask whether any unfair input pair can be found. Auditors tick the box once a model is declared “2-fair.” ...

January 4, 2026 · 4 min · Zelina

AI Writes the Rules: When Formal Logic Teaches Language Discipline

Opening — Why this matters now

Natural language is where most software failures quietly begin. Requirements are written in good faith, read with confidence, and implemented with subtle misunderstandings that only surface once systems are deployed, audited, or—worse—regulated. The uncomfortable truth is that natural language is flexible where engineering systems demand rigidity. This paper tackles that gap head-on, proposing a method where formal logic leads and language follows. Instead of writing requirements first and wrestling semantics into place later, the authors invert the workflow: start from formal specification patterns, then systematically generate a controlled natural language (CNL) using an AI assistant. Precision first. Fluency second. ...

January 3, 2026 · 4 min · Zelina

Gated, Not Gagged: Fixing Reward Hacking in Diffusion RL

Opening — Why this matters now

Reinforcement learning has become the fashionable finishing school for large generative models. Pre-training gives diffusion models fluency; RL is supposed to give them manners. Unfortunately, in vision, those manners are often learned from a deeply unreliable tutor: proxy rewards. The result is familiar and embarrassing. Models learn to win the metric rather than satisfy human intent—rendering unreadable noise that scores well on OCR, or grotesquely saturated images that charm an aesthetic scorer but repel humans. This phenomenon—reward hacking—is not a bug in implementation. It is a structural failure in how we regularize learning. ...

January 3, 2026 · 4 min · Zelina

When Three Examples Beat a Thousand GPUs

Opening — Why this matters now

Neural Architecture Search (NAS) has always had an image problem. It promises automation, but delivers GPU invoices large enough to frighten CFOs and PhD supervisors alike. As computer vision benchmarks diversify and budgets tighten, the question is no longer whether we can automate architecture design, but whether we can do so without burning weeks of compute on redundant experiments. ...

January 3, 2026 · 4 min · Zelina