
From Prompts to Proofs: When Language Becomes an SMT Theory

Policy is where language stops being poetry and starts becoming liability. A content moderation policy, a warranty clause, a procurement rule, a safety instruction, a legal test: all of them look like ordinary prose until someone asks the system to apply them consistently. Then the prose turns into a machine with hidden gears. Some gears are logical: this condition and that condition, this exception unless that threshold is met. Other gears are semantic: whether a message is threatening, whether a disclosure is meaningful, whether a clause covers a warranty period. Humans navigate this mixture badly but socially. LLMs navigate it fluently but not always reliably. Solvers navigate it reliably but only after the world has been turned into formal symbols. Which is, inconveniently, not how most business documents arrive. ...

February 23, 2026 · 17 min · Zelina

Drafts, Then Do Better: Teaching LLMs to Outgrow Their Own Reasoning

Most office work has a draft problem. A junior analyst writes a first version of a financial memo. A lawyer marks up an argument. A consultant turns messy meeting notes into a client-ready recommendation. The first attempt is rarely useless. It is usually half-right, locally clever, and globally flawed. The expensive part is not starting from zero. The expensive part is learning how to improve a decent draft without being hypnotized by it. ...

February 10, 2026 · 16 min · Zelina

Algorithmic Context Is the New Heuristic

Warehouse. That is a better place to start than “large language models for combinatorial optimization,” because the business problem is not philosophical. A warehouse has stacks, access directions, priorities, robots, blocked items, and deadlines. Someone has to decide which unit load moves first, which move creates future trouble, and how to search through the possible rearrangements without melting the compute budget. ...

February 2, 2026 · 15 min · Zelina

Greedy, but Not Blind: Teaching Optimization to Listen

Budget meetings have a familiar rhythm. Someone brings the spreadsheet. Someone brings the map. Someone else brings the sentence that ruins the spreadsheet: “This district looks inefficient on paper, but the roads are worse than the data says.” Classical optimization knows what to do with numbers. It does not naturally know what to do with that sentence. In public health planning, infrastructure rollout, retail site selection, and ESG investment, those sentences are often where the real institutional knowledge lives. Unfortunately, once the sentence enters the room, the algorithm usually leaves through the back door. Or worse, the organization pretends the sentence has been “encoded” into a weight, because apparently all human judgment becomes rigorous once it is multiplied by 0.37. ...

January 19, 2026 · 14 min · Zelina

Explaining the Explainers: Why Faithful XAI for LLMs Finally Needs a Benchmark

Hiring. A candidate writes a personal statement. A screening model gives a score. A manager asks the AI system why. The explanation says work experience mattered most, education came next, and demographic variables barely moved the decision. Everyone relaxes, because the explanation sounds reasonable. That is the dangerous part. A reasonable explanation is not necessarily a faithful explanation. A counterfactual edit that looks plausible is not necessarily a causal counterfactual. And a model that appears insensitive to demographic concepts may not be “fair”; it may simply have learned, or been aligned, to suppress visible sensitivity in the narrow setting being tested. ...

January 17, 2026 · 15 min · Zelina

When LLMs Stop Talking and Start Driving

Factory trouble usually begins in language. Not elegant language. Not the polished language of annual reports and transformation roadmaps. The useful trouble is buried in work orders, technician notes, supplier messages, inspection records, customer complaints, meeting minutes, and logs written by people who had better things to do than produce clean training data. ...

January 11, 2026 · 18 min · Zelina

Model Cannibalism: When LLMs Learn From Their Own Echo

Feedback is usually sold as the civilized part of AI deployment. Users interact with the model. The product team collects prompts, outputs, ratings, usage logs, corrections, maybe a few thumbs-up signals. The model is fine-tuned. The next version is better. Everybody nods. A dashboard is opened. Someone says “continuous improvement.” The room relaxes. ...

January 9, 2026 · 19 min · Zelina

When Three Examples Beat a Thousand GPUs

A GPU bill is usually treated as a hardware problem. Buy faster accelerators, shorten training runs, negotiate a better cloud contract. Less often asked is whether the expensive part of the pipeline began with a badly calibrated prompt. An LLM generating neural-network architectures can create thousands of candidates before training begins. If the prompt provides too little context, the model may repeatedly produce shallow variations of the same familiar design. Add more examples, and it may combine useful ideas across architectural families. Add still more, and the output can become worse, incomplete, or invalid. ...

January 3, 2026 · 15 min · Zelina

Deployed, Retrained, Repeated: When LLMs Learn From Being Used

Acceptance is a reward, even when nobody writes reward = 1. Imagine an enterprise deploys an AI agent to generate code, reconcile invoices, or prepare operational plans. Some outputs pass automated checks and enter production. Others fail, disappear into logs, and are never seen again. Months later, the accepted outputs are collected and used to fine-tune the next model. ...

January 1, 2026 · 18 min · Zelina

Forgetting That Never Happened: The Shallow Alignment Trap

Forgetfulness is an expensive diagnosis. When an internal AI system performs well on last month’s support taxonomy, then underperforms after being fine-tuned on this month’s compliance cases, the obvious story is simple: the model forgot. That story usually triggers an equally obvious response: replay old data, retrain more broadly, freeze more parameters, or panic politely in a meeting while calling it “model lifecycle management.” ...

December 27, 2025 · 17 min · Zelina