Safety by Design, Rewritten: When Data Defines the Boundary

Boundaries are usually drawn before deployment. A product team defines where a system is allowed to operate, safety engineers translate that into requirements, regulators ask whether the evidence matches the claim, and everyone pretends the world politely fits inside the diagram. Charming. Occasionally even useful. ...

January 30, 2026 · 16 min · Zelina

When Models Know They’re Wrong: Catching Jailbreaks Mid-Sentence

Guardrails usually fail quietly. A user sends a malicious prompt. The model begins answering. The safety policy that looked firm in the demo environment starts behaving like office wallpaper: present, decorative, and not especially involved. By the time a post-hoc filter reads the final answer, the model has already produced the thing it should not have produced. The system may keep the response from reaching the user, but the real lesson is less flattering: the model crossed the line before the defense noticed. ...

January 16, 2026 · 16 min · Zelina