
Reading Between the Lines: How AI Learned to Interpret the Law

Opening — Why this matters now Legal interpretation used to belong to humans in black robes, law libraries, and late-night arguments about commas. Now it increasingly happens in chat windows. As large language models (LLMs) enter legal practice—drafting contracts, summarizing judgments, and proposing interpretations—the question is no longer whether AI will assist legal reasoning. It already does. The real question is whether machines can interpret law in any meaningful sense. ...

March 6, 2026 · 6 min · Zelina

The Judge Is Not Always Right: Stress‑Testing LLM Judges

Opening — Why this matters now The modern AI ecosystem quietly relies on a strange idea: we use one AI to judge another. From model leaderboards to safety benchmarks, LLM‑as‑a‑judge systems increasingly replace human reviewers. They score answers, rank models, and sometimes decide which system appears “better.” The practice scales beautifully. It is also, as recent research suggests, slightly terrifying. ...

March 6, 2026 · 6 min · Zelina

Bending the Beam, Not the Brain: What RL with Perfect Rewards Still Can’t Teach LLMs

Opening — Why this matters now Large language models are increasingly asked to do more than summarize emails or draft marketing copy. In engineering, finance, science, and infrastructure planning, AI systems are expected to reason — not merely imitate patterns. The prevailing assumption in many AI labs has been straightforward: if we train models with reinforcement learning and give them perfectly verifiable rewards, they will gradually learn the underlying rules of a domain. ...

March 5, 2026 · 4 min · Zelina

Double Helix, Double Checks: Why Agentic AI Needs Governance Before It Writes Your Code

Opening — Why this matters now Agentic AI is having a moment. Autonomous systems that plan, execute, and iterate on complex tasks are rapidly moving from research demos into real engineering workflows. But there is a quiet problem hiding beneath the excitement: reliability. When large language models (LLMs) are asked to perform long-horizon engineering tasks—like refactoring a production codebase—they tend to behave less like disciplined engineers and more like extremely confident interns. They forget earlier decisions, ignore instructions, improvise architectures, and occasionally rewrite rules they were explicitly told not to touch. ...

March 5, 2026 · 5 min · Zelina

From Prompt Chains to Algebra: Why Agentics 2.0 Treats AI Workflows Like Math

Opening — Why this matters now The first generation of “AI agents” felt impressive but fragile. Prompt chains broke silently. Multi‑agent conversations wandered off task. Systems worked in demos yet collapsed in production. Enterprises quickly discovered a sobering truth: language models are good at generating text, but enterprise systems need something closer to software engineering discipline. ...

March 5, 2026 · 5 min · Zelina

Memory Isn’t Personal: Why LLMs Still Forget What You Like

Opening — Why this matters now AI assistants are rapidly moving from tools to companions. People now ask language models not only for facts, but for advice tailored to their habits, tastes, and goals. If a user tells an assistant they dislike crowded tourist attractions, the assistant should remember that the next time travel planning comes up. If someone prefers indie films over blockbusters, recommendations should evolve accordingly. ...

March 5, 2026 · 5 min · Zelina

Small Model, Big Eyes: Why Microsoft’s Phi‑4 Vision Model Is a Warning Shot to Giant Multimodal AI

Opening — Why this matters now For the past three years, the playbook for building AI systems has been painfully simple: make them bigger. More parameters. More tokens. More GPUs. Electricity bills large enough to fund a small island nation. Then along comes Phi‑4‑reasoning‑vision‑15B, a compact multimodal reasoning model from Microsoft Research, quietly suggesting that scale may not be the only path forward. ...

March 5, 2026 · 6 min · Zelina

The Ambiguity Advantage: When AI Becomes Your Most Honest (and Sometimes Too Polite) Manager

Opening — Why this matters now Generative AI has quietly entered the executive suite. From strategy memos to operational planning, large language models are increasingly used as decision-support partners. They summarize markets, propose strategies, and generate detailed implementation plans in seconds. In theory, this should expand managerial intelligence. In practice, however, something subtler happens. ...

March 5, 2026 · 5 min · Zelina

When AI Agents Read the Manual: Why τ-Knowledge Exposes the Limits of LLM Reasoning

Opening — Why this matters now The current generation of AI optimism assumes a simple trajectory: larger models, better reasoning, more autonomous agents. Yet anyone who has actually deployed an LLM-powered system in a real business workflow knows a frustrating truth: the model often fails not because it lacks intelligence, but because it cannot navigate messy operational knowledge. ...

March 5, 2026 · 5 min · Zelina

Agents in the Lab: When Bayesian Adversaries Keep AI Scientists Honest

Opening — Why this matters now AI has recently discovered a strange new hobby: pretending to be a scientist. Large Language Models can now generate hypotheses, write simulation code, analyze datasets, and even draft papers. In principle, this promises a dramatic acceleration of scientific discovery. In practice, however, LLMs have a small but persistent flaw: they occasionally hallucinate. In research workflows, a hallucination is not merely embarrassing—it can propagate through experiments, code, and analysis pipelines. ...

March 4, 2026 · 4 min · Zelina