Persona Non Grata: When LLMs Forget They're AI
A behavioral audit shows why professional personas can suppress AI self-disclosure, why bigger models do not solve it, and how enterprises should test trust before deploying expert-like agents.
SpatialBench shows why multimodal models that recognize scenes can still fail at the harder work of spatial abstraction, causality, and planning.
A close reading of an 8-puzzle study showing why fluent reasoning traces are still a poor substitute for explicit state management, validators, and real planning systems.
How FRAGMENTA reframes small-data drug lead optimization as a feedback-loop problem, not merely a bigger-model problem.
A mechanism-first reading of how GPT-style Transformers can be adapted from text tokens to continuous mobility trajectories without pretending the tutorial is a benchmark race.
A Chinese pharmacist licensure benchmark shows why LLM deployment in professional education should be mapped by task category, not model leaderboard score.
A mechanism-first reading of a VLM factuality paper showing why multimodal systems need explicit verification paths, not just larger perception models.
A mechanism-first reading of PaTAS, a Subjective Logic framework that treats neural-network trust as something propagated through data, parameters, and inference paths, not guessed from accuracy.
A mechanism-first reading of how REACT and SemaLens use LLMs and VLMs to make safety-critical AI systems more inspectable without pretending that AI can certify itself.
A mechanism-first look at how copyright-detection pipelines turn LLM memorization into an operational audit signal, without pretending it is courtroom proof.