Zero Degrees, Still Feverish: Why Deterministic AI Needs a Thermometer

Opening: Why this matters now. The comforting myth of enterprise AI is that setting an LLM’s temperature to zero makes it deterministic. A nice little checkbox. A procedural sedative. Press it, and the machine behaves. The paper “Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models” is useful because it attacks that myth directly. Its central claim is not that LLMs are chaotic by nature. That would be dramatic, and therefore probably a conference keynote. The claim is sharper: even when a model is asked to decode at $T = 0$, the surrounding inference environment can introduce enough tiny numerical variation to produce divergent outputs.¹ ...

April 29, 2026 · 11 min · Zelina

Compress, Then Confess: Why Order Beats Method in AI Model Efficiency

A deployment team has a large model, a smaller device, and a familiar problem: the model is too heavy for the place where the business actually wants to use it. So the team reaches for the standard efficiency drawer. Prune some weights. Quantize the remaining values. Maybe add a light adapter to recover accuracy. Push the result to edge hardware, a mobile app, or a cheaper inference server. Then explain to management why the model became faster but also slightly less intelligent. The usual ritual. ...

March 21, 2026 · 20 min · Zelina

Break-Even the Machine: Strategic Thinking in the Age of High-Cost AI

TL;DR for operators: The real AI cost question is not “Which model is cheapest?” It is “Which workflow delivers acceptable outcomes at the lowest verified total cost?” Token price is only the most visible line item. The less photogenic costs are retries, review, integration, monitoring, compliance, vendor lock-in, and the small corporate tragedy known as “we saved money on inference and spent it all on fixing nonsense.” ...

March 27, 2025 · 13 min · Zelina

Blind Trust, Fragile Brains: Why LoRA and Prompts Need a Confidence-Aware Backbone

TL;DR for operators: LoRA and prompts are attractive because they make model adaptation feel almost too easy: add a few examples, attach a small adapter, nudge the model into a domain, and call it customised. The uncomfortable part is that adaptation changes not only what a model says, but how confidently it says it. A compliance assistant that becomes slightly more domain-specific but far more overconfident has not been improved. It has been promoted beyond its competence, a classic corporate move. ...

March 25, 2025 · 14 min · Zelina