Curiosity Under Constraint: Engineering Agency, Not Just Intelligence

A good assistant is not always the one that answers fastest. Sometimes it should ask for another file. Sometimes it should stop reading and act. Sometimes it should think privately for a few more steps. Sometimes it should say nothing, because another paragraph of “reasoning” would merely burn tokens while impressing nobody except the invoice. ...

March 2, 2026 · 16 min · Zelina

LemmaBench: When AI Finally Meets Real Mathematics

Most AI math benchmarks still feel like exam rooms. The model receives a problem. It produces an answer. We score the answer. Everyone argues about whether the problem was hard enough, whether the model saw something similar during training, and whether the leaderboard means anything outside the leaderboard. Very productive. Almost as peaceful as a faculty meeting. ...

March 2, 2026 · 17 min · Zelina

Beyond the Linear Ceiling: Why Non-Linearity Is the Next Frontier in PEFT

More Rank Is Not Always More Capacity

Fine-tuning teams love a simple knob. If the model underperforms, increase rank. If the adapter looks too small, increase rank. If the downstream task is hard, increase rank again and call it strategy. This is comforting because rank is measurable, budgetable, and easy to explain in a meeting. Unfortunately, reality has its usual habit of being less cooperative. ...

March 1, 2026 · 16 min · Zelina

Thoughts in Motion: From Static Prompts to Self-Optimizing Reasoning Graphs

A workflow looks harmless until it starts waiting on itself. One LLM call asks for a plan. Another evaluates the plan. A third revises the result. A fourth retrieves evidence. Somewhere in the middle, three subtasks could have run at the same time, two repeated calls could have been reused, and one prompt should probably have been tuned before anyone proudly called the system “agentic.” Instead, the whole thing runs as a neat little chain: expensive, slow, and quietly brittle. Very elegant, in the way a traffic jam is elegant if viewed from a drone. ...

February 19, 2026 · 15 min · Zelina

It Takes Two to Think: Why AI’s Future May Be Social Before It’s Smart

Conversation is usually treated as the interface layer of AI. The user asks. The model answers. The chatbot smiles politely, perhaps too politely, and everyone pretends that a slightly longer prompt is the same thing as a better thinking system. This is convenient, measurable, and occasionally profitable. It is also probably too shallow. ...

February 17, 2026 · 16 min · Zelina

Consistency Is Not a Coincidence: When LLM Agents Disagree With Themselves

A support ticket arrives. The agent reads the same customer history, sees the same policy document, and has access to the same tools. On Monday, it searches for the refund rule, retrieves the correct clause, and gives a clean answer. On Tuesday, with the same input, it searches for a different phrase, retrieves a less relevant document, wanders through two extra steps, and ends with a confident answer that is only approximately useful. ...

February 14, 2026 · 16 min · Zelina

When Aligned Models Compete: Nash Equilibria as the New Alignment Layer

Attention is a strange boss. It does not simply reward the best content, the most balanced opinion, or the most socially useful answer. It rewards whatever survives the rules of the environment. That distinction matters once AI systems stop being isolated chatbots and start behaving like a population: autonomous accounts, synthetic creators, enterprise agents, customer-facing bots, negotiation assistants, research agents, and ranking-aware content machines. Each one may be aligned in the usual single-model sense. Each one may pass safety checks. Each one may avoid obvious toxicity. Then they are released into the same market for attention, engagement, approval, conversion, or influence. ...

February 9, 2026 · 16 min · Zelina

Thinking in Panels: Why Comics Might Beat Video for Multimodal Reasoning

A dashboard screenshot is often too little. A video walkthrough is often too much. Somewhere between the two sits a strangely old-fashioned interface: panels, captions, arrows, speech bubbles, and a sequence that tells the machine what happened before what. Yes, comics. That sounds unserious only if we think comics are a decoration layer: something added after the reasoning is complete to make the output friendlier. The paper "Thinking with Comics: Enhancing Multimodal Reasoning through Structured Visual Storytelling" makes a more interesting claim: comics can act as the reasoning medium itself, not merely the illustration of reasoning after the fact.1 ...

February 3, 2026 · 17 min · Zelina

Learning to Discover at Test Time: When Search Learns Back

A leaderboard usually treats an AI model like a very fast intern: give it a problem, let it try many times, keep the best answer, and politely ignore the graveyard of failed attempts. That is useful. It is also a little strange. A human engineer does not merely try 25,600 variations of a GPU kernel while keeping the same brain. After the first few failures, she learns which bottlenecks matter. After a lucky partial success, she changes how she thinks about the problem. After enough attempts, the search process is no longer just sampling. It has become learning. ...

January 24, 2026 · 18 min · Zelina

One Agent Is a Bottleneck: When Genomics QA Finally Went Multi-Agent

Databases are where elegant AI demos go to develop a limp. A model can sound fluent about biology, medicine, finance, or law. Then someone asks a question that requires the latest record from a specialized database, a second lookup from another source, a formatted API call, a large HTML response, and a final answer that does not forget the original question halfway through. Suddenly the “AI assistant” becomes a very expensive intern copying URLs into the wrong field. ...

January 16, 2026 · 15 min · Zelina