
When Fine-Tuning Bites Back: The Hidden Safety Drift in Vision-Language Agents

Customization sounds harmless. A company takes a capable vision-language model, adds a lightweight adapter, fine-tunes it on a narrow internal dataset, and calls the result “domain-specialized.” The dashboard still has green boxes. The model still answers normal text questions. The update is cheap, fast, and reversible in theory. Everyone goes home with the comfortable feeling that parameter-efficient fine-tuning is basically a productivity tool with a nerdy name. ...

February 21, 2026 · 17 min · Zelina

Merge Without a Mess: Adaptive Model Fusion in the Age of LLM Sprawl

Models pile up quietly. A customer-support model here. A finance QA model there. A legal drafting variant that nobody wants to delete because it passed last quarter’s evaluation. A sales assistant fine-tuned on a dataset that may or may not still represent how the company sells. Then come LoRA adapters, instruction-tuned checkpoints, safety-tuned variants, regional versions, and a few “temporary” experiments that become permanent because nobody enjoys breaking production on a Friday. ...

February 14, 2026 · 13 min · Zelina

When AI Forgets on Purpose: Why Memorization Is the Real Bottleneck

Fine-tuning is supposed to be the polite part of AI customization. A company uploads domain data. A provider adapts an aligned model. The final model still refuses harmful requests, still answers useful questions, and ideally becomes more competent at the client’s narrow task. Everyone nods. The demo works. The governance slide says “safety preserved.” The slide, as usual, is doing a lot of unpaid labor. ...

February 7, 2026 · 15 min · Zelina

Identity Crisis: How a Trivial Trick Teaches LLMs to Think Backwards

Facts are rude. They rarely arrive in the direction your software needs them. A customer database may know that Alice reports to Bob, while the compliance officer asks, “Who reports to Bob?” A product catalog may store that SKU-17 belongs to Category X, while the chatbot receives, “Show me all products in Category X.” A medical knowledge base may encode one directional relation, while the user asks for the inverse. Humans treat these as the same fact seen from opposite ends. Language models, being very expensive autocomplete machines with a talent for plausible theater, do not always share our confidence. ...

February 3, 2026 · 18 min · Zelina

Small Models, Big Mouths: Why Game AI Doesn’t Need Giant Brains

Game AI has a very ordinary problem: it has to work while the player is waiting. Not eventually. Not after a cloud round trip. Not after an impressive model has finished contemplating the metaphysics of medieval tavern gossip. In a game, intelligence has to fit inside latency budgets, memory budgets, design constraints, and the deeply unromantic fact that many players expect single-player games to work offline. ...

February 3, 2026 · 17 min · Zelina

Model Cannibalism: When LLMs Learn From Their Own Echo

Feedback is usually sold as the civilized part of AI deployment. Users interact with the model. The product team collects prompts, outputs, ratings, usage logs, corrections, maybe a few thumbs-up signals. The model is fine-tuned. The next version is better. Everybody nods. A dashboard is opened. Someone says “continuous improvement.” The room relaxes. ...

January 9, 2026 · 19 min · Zelina

Forgetting That Never Happened: The Shallow Alignment Trap

Forgetfulness is an expensive diagnosis. When an internal AI system performs well on last month’s support taxonomy, then underperforms after being fine-tuned on this month’s compliance cases, the obvious story is simple: the model forgot. That story usually triggers an equally obvious response: replay old data, retrain more broadly, freeze more parameters, or panic politely in a meeting while calling it “model lifecycle management.” ...

December 27, 2025 · 17 min · Zelina

Timeline Triage: How LLMs Learn to Read Between Clinical Lines

Hospital notes are not databases that forgot to wear a spreadsheet costume. They are fragments of care: treatment names, planned cycles, delayed doses, discontinued regimens, relative dates, typos, abbreviations, and the occasional phrase that looks obvious until two clinicians disagree about what it actually means. For oncology, that mess matters. A chemotherapy timeline is not just a historical summary; it is the skeleton of a patient’s treatment journey. Get the timeline wrong, and downstream systems may misunderstand what was given, when it started, when it ended, and whether a patient fits a registry, audit, research cohort, or trial-matching rule. ...

December 7, 2025 · 16 min · Zelina

Fine-Tuning Without Fine-Tuning: How Fints Reinvents Personalization at Inference Time

Memory is a useful product feature until it becomes a junk drawer. That is the quiet problem behind many “personalized” AI systems. A user has a history. The system retrieves some of it. The model receives a longer prompt. The output becomes, in theory, more personal. In practice, the assistant often behaves like someone who read your old emails in a hurry and decided this was the same as knowing you. ...

November 5, 2025 · 16 min · Zelina

The Esperanto of AI Agents: How the Agent Data Protocol Unifies a Fragmented Ecosystem

Every engineering team has met this problem: the useful data exists, but it lives in thirteen different shapes, three different tool conventions, two incompatible logs, and one heroic spreadsheet that nobody dares to open. AI agents have the same disease, only with more acronyms. The paper behind the Agent Data Protocol, or ADP, argues that large-scale supervised fine-tuning of AI agents has been held back less by a lack of data than by a lack of shared representation.1 Agent datasets already exist for coding, software engineering, web browsing, API use, operating-system interaction, and general tool use. The difficulty is that each one tends to encode actions, observations, tool calls, web states, messages, and execution feedback in its own local dialect. Naturally, every dataset is special. How convenient for nobody. ...

November 2, 2025 · 12 min · Zelina