AI Assurance

Rules of Attraction: How LLMs Learn to Judge Better Than We Do

Rubrics are supposed to make judgment boring. That is their charm. A good rubric tells a teacher why one essay deserves a 5 instead of a 3, tells a compliance reviewer why one response is acceptable and another is risky, and tells an internal QA team why a generated summary is useful rather than merely confident. In business, boring judgment is valuable. It scales. It can be audited. It survives employee turnover. It does not wake up one morning and decide that “clarity” now means “vibes with a semicolon.” ...

When AI Reviews AI: Turning Foundation Models into Safety Inspectors

Inspection is not glamorous. It is not the robot demo, not the dashboard, not the moment a prototype obediently follows a traffic cone across a test track. Inspection is the slow, expensive discipline of asking whether the thing that worked once will behave acceptably when the weather changes, the path bends, the sensor gets confused, or the requirement was written by a tired engineer using the phrase “successfully complete” as if English were a formal language. ...

Benchmarks Without Borders: Inside the Moduli Space of AI Psychometrics

Procurement Has a Benchmark Problem
Procurement teams love benchmark tables. They are clean, sortable, and emotionally comforting. Vendor A beats Vendor B by 3.7 points on a reasoning suite; Vendor C wins on code generation; Vendor D claims better tool use under “realistic agent workflows,” a phrase that usually means someone added a browser, a calculator, and optimism. ...

Drift Happens: Why AI Needs a Memory for People, Not Just Patterns

Reminders are supposed to be boring. Take medication. Drink water. Attend an appointment. Confirm the task is done. The whole point of a reminder system is that it sits quietly in the background, nudging daily life along without demanding a board meeting. But in dementia care, the reply to a reminder can become more important than the reminder itself. A person who once replied warmly may become brief and flat. Someone who usually answers the question may begin drifting around it. The change may not arrive as a dramatic failure. It may arrive as a slope. ...

Truth Machines: VeriCoT and the Next Frontier of AI Self-Verification

The machine said the right answer. Annoyingly, that is not the same thing as being right. Audit a model-generated legal memo, clinical explanation, or compliance answer and the same awkward question appears: did the system reason correctly, or did it simply land on the right sentence after a scenic tour through nonsense? ...