When Models Know But Won’t Act: The Interpretability Illusion
Opening — Why this matters now

There is a quiet assumption baked into most AI governance frameworks: if we can see what a model is thinking, we can fix it when it goes wrong. It’s a comforting idea. Regulators like it. Engineers build tooling around it. Consultants sell it. Unfortunately, this paper demonstrates something far less convenient: models can know the right answer internally—and still fail to act on it. ...