Preference Signals, Not Preference Theater

Businesses are currently learning an expensive lesson: user behavior is not the same thing as user preference. A person clicks because the button was large. A driver brakes because the situation was unclear. A customer accepts a chatbot answer because the refund is small and arguing is tedious. A manager approves a workflow because the dashboard made the alternative invisible. The log file looks objective. It is also quietly contaminated by habit, uncertainty, exploration, friction, fatigue, and the occasional human desire to end the meeting before lunch. ...

June 3, 2026 · 15 min · Zelina

Do the Math, Not the Mime: Why LLM Reasoning Needs a Verification Pipeline

A spreadsheet error rarely announces itself with dramatic music. It usually arrives politely. A pricing model gives a clean answer. A compliance calculator writes a confident explanation. A financial assistant produces a neat derivation with enough intermediate steps to look reassuring. The result is formatted, fluent, and possibly wrong. That is the uncomfortable business lesson behind “Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges,” a 2026 survey of roughly 120 studies on LLM mathematical reasoning.[1] The paper is not introducing one new benchmark, one heroic model, or one more leaderboard trophy to place on the already overcrowded mantelpiece. Its useful contribution is more structural: it connects datasets, representations, training methods, tool use, verifiers, and evaluation metrics into one reasoning pipeline. ...

May 31, 2026 · 14 min · Zelina

If Logic, Then Trouble: Why LLMs Still Miss Human Conditionals

Contract. A supplier writes, “If payment is received by Friday, the discount applies.” Most business readers do not treat this as a detached logic puzzle. They hear a practical rule: pay by Friday, get the discount; miss Friday, probably no discount. The phrase carries intent, relevance, and a small but important threat wrapped in polite operational language. ...

May 31, 2026 · 17 min · Zelina

If Logic Were Enough: Why LLMs Still Miss the Point of Conditionals

A promise is rarely just a logical operator. “If you mow the lawn, I’ll give you 50 dollars” does not sound like a philosophical exercise in truth tables. It sounds like a deal. Most people hear it as: no mowing, no money. By contrast, “If you’re hungry, there’s pizza in the oven” does not mean the pizza appears only under the metaphysical condition of your hunger. It means the pizza is there, and your hunger merely explains why I am telling you. ...

May 29, 2026 · 16 min · Zelina

Think Less, Align Better: The New Economics of AI Reasoning

Opening: Why this matters now. Enterprise AI is entering its mildly awkward teenage phase: everyone wants intelligence, nobody wants the invoice. For the last two years, much of the AI conversation has revolved around more: more context, more reasoning tokens, more chain-of-thought, more human feedback, more evaluators, more synthetic data, more agents, more dashboards to explain why the agents broke the dashboards. The operating assumption was simple enough: if the model thinks more, explains more, or trains on more feedback, it should perform better. ...

May 9, 2026 · 19 min · Zelina

Mind the Reward Gap: Why Business AI Needs More Than Pretty Answers

Opening: Why this matters now. Business AI has entered its awkward teenage years. The first phase was easy to admire: models could draft, summarize, classify, recommend, and explain. Then companies started asking the rude adult questions: Can we trust the answer? Did it make the right trade-off? Can it improve from outcomes? What happens when the reward signal is wrong? ...

May 2, 2026 · 17 min · Zelina

Reasonable Doubts: Why AI Reasoning Is Not a Solo Act

Opening: Why this matters now. AI reasoning has become the software industry’s favorite magic word. Every product now claims to “reason,” usually after adding a longer prompt, a larger model, and a pricing page with the emotional warmth of a hospital bill. But three recent arXiv papers point to a more useful conclusion: reasoning is not a single capability that lives inside one heroic model. It is becoming a system architecture. ...

May 2, 2026 · 16 min · Zelina

Model Citizens: Why Agentic AI Needs Laws, Not Just Loops

Opening: Why this matters now. The current agentic AI conversation has a charmingly reckless habit: attach a large language model to tools, add a planner, sprinkle in memory, and call the result an autonomous system. This is not entirely wrong. It is merely incomplete in the way a paper airplane is technically aviation. ...

April 27, 2026 · 13 min · Zelina

From Words to Workflows: Why AI Still Struggles to Think Like an Operations Research Analyst

A warehouse manager does not ask for “a constraint optimization problem.” She asks whether tomorrow’s orders can be shipped without overtime. A university administrator does not request “a mixed-integer formulation.” He asks whether lectures can be scheduled without room conflicts. A retail planner does not want “a MiniZinc model.” She wants to know which stores should receive scarce inventory before the promotion starts. ...

April 15, 2026 · 15 min · Zelina

CivBench: When AI Stops Guessing and Starts Planning

Scoreboards are comforting. They reduce a messy contest into one neat line: winner, loser, maybe a score. Executives like them, product teams like them, investors like them, and benchmark dashboards absolutely adore them. Strategy, unfortunately, is rude enough not to fit inside that line. A company can make the right decisions and still lose because the market turns. A trading agent can survive a bad regime by managing exposure well, then look mediocre because the final return is not spectacular. A planning system can stumble into success after making terrible intermediate choices. Outcome-only evaluation is clean, but cleanliness is not the same as truth. It is often just a good-looking loss of information. ...

April 11, 2026 · 17 min · Zelina