Prompt Wars: When Pedagogy Beats Cleverness

A prompt review meeting usually sounds more scientific than it is. One person likes the “coach” version. Another prefers the “Socratic” version because it sounds more educational. Someone says the prompt should mention metacognition. Someone else adds “be concise,” because apparently every prompt eventually becomes a corporate email with anxiety issues. Then the team ships the one that feels best. ...

January 23, 2026 · 15 min · Zelina

When Models Read Too Much: Context Windows, Capacity, and the Illusion of Infinite Attention

The demo is familiar now. Someone drops a whole contract, a whole policy manual, a whole code repository, or a month of chat history into a model and asks one neat question. The model answers fluently. The room relaxes. The slide says “1M-token context.” Procurement starts smiling. This is where the trouble begins. ...

January 18, 2026 · 14 min · Zelina

When Prompts Learn Themselves: The Death of Task Cues

A database column named CURRENT_BAL_AMT is annoying. A column named gbstk is worse. Somewhere inside an enterprise data warehouse, these names are perfectly normal. Somewhere outside the original engineering team, they are tiny locked doors. The usual solution is not glamorous. Someone asks a data engineer. The data engineer asks an older data engineer. A wiki page is found, partly wrong, last updated during an earlier economic cycle. Eventually, “current balance amount” or “overall processing status of sales document” appears in a data catalog, a semantic layer, a search index, or a text-to-SQL system. Humanity advances by one abbreviation. ...

January 7, 2026 · 17 min · Zelina

Think Wide, Then Think Hard: Forcing LLMs to Be Creative (On Purpose)

Imagine a brainstorming meeting in which every new idea must immediately pass legal review, fit the quarterly budget, use the existing technology stack, satisfy six executives, and arrive formatted as a PowerPoint slide. The meeting will probably produce something feasible. It will also produce the same three ideas everyone proposed last quarter. ...

December 30, 2025 · 15 min · Zelina

When Small Models Learn From Their Mistakes: Arithmetic Reasoning Without Fine-Tuning

Numbers are where language models usually stop sounding impressive. Ask a model to summarize a financial report and it may produce a fluent paragraph with just enough confidence to make everyone in the meeting relax. Ask it to calculate a percentage change from a table, preserve the correct scale, and return a verifiable number, and the poetry ends. Suddenly the model must select the right values, understand the wording, apply the right operation, avoid sign mistakes, avoid scale mistakes, and not hallucinate a formula because the word “change” appeared nearby. ...

December 16, 2025 · 18 min · Zelina

When Agents Loop: Geometry, Drift, and the Hidden Physics of LLM Behavior

Agents are rarely dangerous because they answer once. They become interesting, and occasionally annoying, when they loop. A customer-support agent drafts a reply, critiques it, revises it, checks policy, rewrites the tone, and sends the result back into another reasoning step. A research agent summarizes papers, updates its plan, searches again, and revises its own assumptions. A coding agent edits a file, reads the error, patches the patch, and keeps going until either the tests pass or the repository looks like an archaeological site. ...

December 14, 2025 · 17 min · Zelina

You Know It When You See It—But Can the Model?

Review queue. Someone has to decide whether an image is “unsafe,” “misleading,” “healthy,” “premium,” “clickbait,” “brand-safe,” or “not really our vibe.” The label sounds simple until the first borderline case appears. A salad with too much cream. A gaming ad that hints at easy money but never quite says it. A before-and-after photo where the “achievement” is visible only if one is feeling generous. ...

December 12, 2025 · 15 min · Zelina

Anchors Aweigh? Why Small LLMs Refuse to Flip Their Own Semantics

A label looks harmless until you ask it to lie. Tell a model that a glowing movie review should be labeled POS, and few-shot prompting behaves like a useful intern: it studies the examples, picks up the pattern, and usually gets better. Tell the same model that a glowing review should now be labeled NEG, and the intern becomes less useful. It does not smoothly learn your private code. It does not politely invert its semantic universe. It mostly produces a muddle. ...

November 30, 2025 · 15 min · Zelina

Tile by Tile: Why LLMs Still Can't Plan Their Way Out of a 3×3 Box

A board game should not embarrass a frontier model. That is the uncomfortable charm of the 8-puzzle. It has no hidden information, no vague user intent, no messy database schema, no ambiguous policy exception, and no client saying “just make it pop.” It is a 3×3 grid with eight tiles and one blank space. Slide adjacent tiles into the blank. Reach the goal state. Done. ...

November 27, 2025 · 15 min · Zelina

Prompted and Confused: When LLMs Forget the Assignment

A requirements document walks into a model. It says: assign resources, respect capacity, avoid conflicts, minimise waste. The model nods politely, emits a tidy block of MiniZinc, and everyone is briefly tempted to believe the future has arrived. Then someone changes the story from cars to knapsacks, or adds one stray sentence about maximising something, and the same system quietly forgets the assignment. ...

November 20, 2025 · 14 min · Zelina