Cover image

Fine-Tuned, Fine Print: Why Post-Training Teaches Models What to Trust

Enterprise AI has entered its “sure, but can it use the evidence?” phase. That is progress, technically. It is also where many deployment stories begin to get expensive. The first generation of business LLM adoption was satisfied if a model could produce a fluent answer. The next generation asks something more demanding: can the model use retrieved documents, compliance policies, tool outputs, customer records, analyst notes, and human feedback in the right way? ...

June 10, 2026 · 17 min · Zelina
Cover image

Preference Laundering: How RLHF Can Turn Better Answers Into Bigger Biases

Feedback sounds clean. A user tries two model answers. One is more helpful, safer, more complete, and less obviously stupid. The other is worse. The annotator picks the better one. The reward model learns from that preference. The policy is optimized. Everyone goes home believing that the system has become more aligned. ...

June 5, 2026 · 18 min · Zelina
Cover image

Time to Prefer: Why Binary RLHF Feedback Leaves Reward Models Guessing

Time to Prefer: Why Binary RLHF Feedback Leaves Reward Models Guessing Thumbs-up feedback looks efficient. It is clean, cheap, easy to store, and friendly to dashboards. One output wins, another output loses, and the reward model learns what humans supposedly want. A tidy little morality market, with all the nuance of a vending machine. ...

June 5, 2026 · 17 min · Zelina
Cover image

Think Less, Align Better: The New Economics of AI Reasoning

Opening — Why this matters now Enterprise AI is entering its mildly awkward teenage phase: everyone wants intelligence, nobody wants the invoice. For the last two years, much of the AI conversation has revolved around more: more context, more reasoning tokens, more chain-of-thought, more human feedback, more evaluators, more synthetic data, more agents, more dashboards to explain why the agents broke the dashboards. The operating assumption was simple enough: if the model thinks more, explains more, or trains on more feedback, it should perform better. ...

May 9, 2026 · 19 min · Zelina
Cover image

The Reward Is in the Room: Why AI Automation Needs Better Judgment, Not Just Bigger Models

Opening — Why this matters now AI adoption has entered its second, less glamorous phase. The first phase was easy to explain: make the model generate things. Emails, reports, code, dashboards, summaries, customer replies, compliance drafts, market notes, training content. Give the machine a prompt, admire the fluent output, and pretend the future has arrived because the paragraphs are well-spaced. ...

May 7, 2026 · 16 min · Zelina
Cover image

Alignment Isn’t Free: When Safety Objectives Start Competing

Customer support is where alignment theories go to become invoices. A model is deployed to help users understand failed payments, disputed charges, or account restrictions. Product wants it to be useful. Legal wants it to avoid regulated advice. Trust and safety wants it to refuse suspicious requests. Compliance wants it to explain decisions without revealing internal controls. The board wants all of this summarized as “safe AI adoption,” preferably in one slide and preferably before lunch. ...

December 28, 2025 · 14 min · Zelina
Cover image

Active Minds, Efficient Machines: The Bayesian Shortcut in RLHF

TL;DR for operators Labels are the awkward invoice behind modern alignment. RLHF looks elegant in diagrams: generate outputs, ask humans which one is better, train a reward model, optimise the policy, repeat until everyone pretends the reward model is civilisation. In practice, most preference comparisons are not equally useful. Some are obvious. Some are redundant. Some teach the model almost nothing except that annotator budgets have a sense of humour. ...

November 8, 2025 · 14 min · Zelina
Cover image

The Bullshit Dilemma: Why Smarter AI Isn't Always More Truthful

TL;DR for operators Most AI quality programmes still treat truthfulness as a factual accuracy problem: did the model get the answer right, cite the source, or hallucinate a feature that does not exist? That is necessary. It is not sufficient. The paper behind this article argues for a nastier category: “machine bullshit,” meaning model output produced with indifference to truth rather than simple ignorance or random hallucination.1 The key point is not that models become stupid. It is that, under some incentives, their outward claims stop tracking what they appear to know. ...

July 11, 2025 · 17 min · Zelina

DeepSeek-R1

An open-source reasoning model achieving state-of-the-art performance in math, code, and logic tasks.

2 min