Causality Remembers: Teaching Social Media Defenses to Learn from the Past

Moderation teams do not usually lose because they see nothing. They lose because they see too much: thousands of accounts posting near the same topic, near the same time, with enough similarity to look suspicious and enough difference to remain deniable. Some are campaign assets. Some are enthusiastic humans. Some are bots. Some are people who simply saw the same trending story and behaved like everyone else, which is annoying for both democracy and data science. ...

January 5, 2026 · 17 min · Zelina

When Guardrails Learn from the Shadows

Labels are expensive. Safety labels are worse. A normal classification project asks annotators to decide whether a customer complaint is urgent, whether a product photo contains a defect, or whether a support ticket belongs to billing. Annoying, yes. Existentially unpleasant, usually no. LLM safety moderation is different. The training examples may include malicious requests, jailbreak attempts, harmful advice, unsafe responses, and edge cases where intent is deliberately hidden under polite phrasing. The annotator must not only read the text but understand what the user is trying to make the model do. In other words, the expensive part is not clicking “safe” or “unsafe.” The expensive part is detecting intent when the user has carefully wrapped it in bubble wrap. ...

December 26, 2025 · 16 min · Zelina

Quantum Bridges: Crossing the Label Gap with ILQSSL and IPQSSL

TL;DR for operators

Labels are expensive. That is the clean business problem behind this paper. In healthcare, credit review, fraud triage, and scientific classification, organisations often have many observations and too few trusted labels. Semi-supervised learning tries to stretch those scarce labels across the structure of the data rather than pretending every missing label is merely a procurement problem with a nicer dashboard. ...

August 9, 2025 · 15 min · Zelina