Right Answer, Wrong Audit: When Reasoning Models Grade the Destination, Not the Route

A reviewer sees the final number. It is correct. Then the quiet failure begins. The reviewer stops asking whether the argument actually works. The missing step becomes “implicit.” The shuffled logic becomes “not ideal, but acceptable.” The circular explanation becomes “verbose but essentially correct.” The answer has done something worse than persuade. It has anesthetized the audit. ...

June 7, 2026 · 19 min · Zelina

RAG’s Receipt Problem: When Correct Answers Don’t Prove Retrieval

Retrieval-augmented generation has become the respectable outfit enterprise AI wears when it wants to look grounded. Add a document store, retrieve a few passages, attach citations, and the answer suddenly appears more disciplined than a free-floating chatbot. That appearance is useful. It is not proof. ...

May 30, 2026 · 16 min · Zelina

Auditing the Illusion of Forgetting: When Unlearning Isn’t Enough

Deletion requests sound simple until the model answers politely. A user asks for data to be removed. A publisher demands that copyrighted passages stop being reproduced. A compliance team wants evidence that a fine-tuned model no longer carries traces of a forbidden dataset. The model is run through an unlearning method, the surface tests improve, the dashboard turns less red, and everyone enjoys the brief spiritual comfort of a green checkmark. ...

January 22, 2026 · 17 min · Zelina

When Fairness Fails in Groups: From Lone Counterexamples to Discrimination Clusters

Imagine two fairness bugs. In the first, changing a protected attribute while holding everything else constant shifts a model’s output enough to trigger one unfair decision. In the second, the same underlying applicant profile can fracture into nineteen meaningfully different score bands as protected attributes change. A conventional pairwise fairness test records both as violations. One counterexample each. Very tidy. Also not especially useful. ...

January 4, 2026 · 17 min · Zelina

The Ethics of Not Knowing: When Uncertainty Becomes an Obligation

Uncertainty is the most convenient word in governance. A model is uncertain, so the system waits. A committee is uncertain, so the decision is deferred. A risk officer is uncertain, so the memo gets another paragraph of decorative caution and nobody quite owns the next step. Very mature. Very responsible. Also, sometimes, very useful for avoiding responsibility while looking intellectually respectable. ...

December 20, 2025 · 17 min · Zelina

When Tokens Remember: Graphing the Ghosts in LLM Reasoning

Audit is easy when the answer is a single lookup. A customer asks, “What is your refund policy?” The model quotes the policy paragraph. We check whether the quoted paragraph came from the right source. Very civilized. Everyone goes home early. But real enterprise LLM work is rarely that tidy. A compliance assistant reads a contract, extracts obligations, compares them with internal policy, reasons through exceptions, and writes a recommendation. A research assistant reads multiple sources, builds an intermediate summary, then answers a question from that summary. A support agent reads a user history, infers the likely issue, then proposes the next action. In these cases, the final sentence may depend on prompt evidence and on earlier generated text. ...

December 18, 2025 · 16 min · Zelina

Who Owns Your Words? Copyright, LLMs, and the Quiet Arms Race Over Training Data

The new copyright question is not “did the model copy me?” but “how would I know?” A writer uploads a chapter. A publisher uploads a manuscript. A compliance team uploads a protected document. The question is simple enough to ask in one sentence: did this material end up inside a large language model’s training data? ...

November 26, 2025 · 17 min · Zelina

What LLMs Remember—and Why: Unpacking the Entropy-Memorization Law

TL;DR for operators: Memorization audits usually start with the wrong question: “Which individual text snippets look memorized?” This paper suggests a better first diagnostic: group many snippets by how closely the model reproduces them, then measure the entropy of the token distribution inside each group.1 The result is an empirical pattern the authors call Entropy–Memorization Linearity. In plain English: when training examples are pooled by edit-distance score, their set-level entropy forms a strong linear relationship with how closely the model reproduces them. Since the paper’s “memorization score” is an edit distance, lower score means stronger verbatim reproduction; higher score means the generated continuation is farther from the ground truth. ...

July 13, 2025 · 15 min · Zelina
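
For operators who want to try the diagnostic from the Entropy–Memorization excerpt above, here is a minimal sketch of the pooling step. It is not the paper's implementation. The function names (`edit_distance`, `shannon_entropy`, `entropy_by_score`) are hypothetical, the edit distance is plain Levenshtein over token lists, and pooling the ground-truth tokens (rather than the generated ones) inside each score group is an assumption made for illustration.

```python
import math
from collections import Counter, defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def shannon_entropy(tokens):
    """Entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_by_score(pairs):
    """Group (ground_truth, generated) token pairs by edit-distance score,
    then compute one entropy value per group.

    Assumption: the pooled distribution is built from ground-truth tokens.
    """
    groups = defaultdict(list)
    for truth, generated in pairs:
        score = edit_distance(truth, generated)  # lower = closer reproduction
        groups[score].extend(truth)
    return {score: shannon_entropy(toks) for score, toks in sorted(groups.items())}
```

Plotting the returned entropy values against their scores is one way to check whether a roughly linear pattern shows up on your own data. The sketch makes no claim about what slope or fit quality to expect.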