Machine Ethics

Identity Crisis: How a Trivial Trick Teaches LLMs to Think Backwards

Opening — Why this matters now Large language models can write poetry, solve Olympiad-level math problems, and simulate entire businesses—yet they reliably fail at a task that feels almost insulting in its simplicity: if Alice’s husband is Bob, they struggle to answer who is Bob’s wife? This failure mode, known as the reversal curse, has become something of an embarrassment for autoregressive models. More troublingly, a growing body of literature has argued that the curse is fundamental: a baked-in limitation of left-to-right next-token prediction. If true, this would place a hard ceiling on what today’s LLM architectures can ever reliably reason about. ...

When Language Learns to Doubt Itself: Self-Contradiction as an Upgrade Path for Multimodal AI

Opening — Why this matters now Multimodal large language models (MLLMs) can describe, caption, and reason about images with impressive fluency. Yet beneath the polished surface lies a persistent flaw: they often say the right thing without truly understanding it. This mismatch—known as the generation–understanding gap—has become a quiet bottleneck as MLLMs move from demos into decision‑support systems, compliance tools, and autonomous agents. ...

Agentic Systems Need Architecture, Not Vibes

Opening — Why this matters now Agentic AI has officially entered its awkward adolescence. It can plan, call tools, collaborate, and occasionally impress investors—but it also hallucinates, forgets, loops endlessly, and collapses under modest real‑world complexity. The problem is no longer model capability. It’s architecture. Today’s agent systems are mostly stitched together through intuition, blog wisdom, and prompt folklore. Powerful, yes—but brittle. What’s missing is not another clever prompt trick, but an engineering discipline. ...

Metric Time Without the Clock: Making ASP Scale Again

Opening — Why this matters now Temporal reasoning has always been the Achilles’ heel of symbolic AI. The moment time becomes quantitative—minutes, deadlines, durations—logic programs tend to balloon, grounders panic, and scalability quietly exits the room. This paper lands squarely in that discomfort zone and does something refreshingly unglamorous: it makes time boring again. And boring, in this case, is good for business. ...

CAR-bench: When Agents Don’t Know What They Don’t Know

Opening — Why this matters now LLM agents are no longer toys. They book flights, write emails, control vehicles, and increasingly operate in environments where getting it mostly right is not good enough. In real-world deployments, the failure mode that matters most is not ignorance—it is false confidence. Agents act when they should hesitate, fabricate when they should refuse, and choose when they should ask. ...

When Models Listen but Stop Thinking: Teaching Audio Models to Reason Like They Read

Opening — Why this matters now Audio-first interfaces are everywhere. Voice assistants, call-center bots, in-car copilots, and accessibility tools all rely on large audio-language models (LALMs) that promise to hear and think at the same time. Yet in practice, something awkward happens: the same model that reasons fluently when reading text suddenly becomes hesitant, shallow, or just wrong when listening to speech. ...

Prompt Wars: When Pedagogy Beats Cleverness

Opening — Why this matters now Educational AI has entered its prompt era. Models are powerful, APIs are cheap, and everyone—from edtech startups to university labs—is tweaking prompts like seasoning soup. The problem? Most of this tweaking is still artisanal. Intuition-heavy. Barely documented. And almost never evaluated with the same rigor we expect from the learning science it claims to support. ...

DISARM, but Make It Agentic: When Frameworks Start Doing the Work

Opening — Why this matters now Foreign Information Manipulation and Interference (FIMI) has quietly evolved from a niche security concern into a persistent, high‑tempo operational problem. Social media platforms now host influence campaigns that are faster, cheaper, and increasingly AI‑augmented. Meanwhile, defenders are expected to produce timely, explainable, and interoperable assessments—often across national and institutional boundaries. ...

Lost Without a Map: Why Intelligence Is Really About Navigation

Opening — Why this matters now AI discourse is increasingly stuck in a sterile debate: how smart are large models, really? The paper you just uploaded cuts through that noise with a sharper question—what even counts as intelligence? At a time when transformers simulate reasoning, cells coordinate without brains, and agents act across virtual worlds, clinging to neuron‑centric or task‑centric definitions of intelligence is no longer just outdated—it is operationally misleading. ...

Deep GraphRAG: Teaching Retrieval to Think in Layers

Opening — Why this matters now Retrieval-Augmented Generation has reached an awkward adolescence. Vector search is fast, scalable, and confidently wrong when questions require structure, multi-hop reasoning, or global context. GraphRAG promised salvation by injecting topology into retrieval — and promptly ran into its own identity crisis: global search is thorough but slow, local search is precise but blind, and most systems oscillate between the two without ever resolving the tension. ...