The Model Is Not the Medical System

TL;DR for operators: Health AI does not fail only because the model is weak. It fails because the model learned the wrong context, explained the wrong thing, protected the wrong boundary, retrieved the wrong evidence, or performed beautifully in the one language where the evaluation happened to be convenient. Two recent arXiv papers make that point from opposite ends of the same operational chain. One builds an explainable, privacy-aware framework for detecting career-related depression and anxiety among university students, using structured student data, facial-behavior features, multimodal fusion, label smoothing, federated learning, and attribution methods.[1] The other builds MMed-Bench-IR, a multilingual medical information retrieval benchmark designed to test cross-lingual medical alignment, concept discrimination, and evidence retrieval across six languages and three tasks.[2] ...

June 27, 2026 · 17 min · Zelina

Same Meaning, Different Machine

TL;DR for operators: AI systems do not merely fail by giving the wrong answer. They also fail by changing the kind of action they take when the meaning has not changed, or by spreading an update into places where it was never supposed to go. That is the shared lesson from two recent papers that, at first glance, live in different neighborhoods. One studies code-mixed hate moderation and shows that clean-English-tuned workflows can route the same underlying content differently when it appears as Tamil-English code-mix.[1] The other studies multimodal knowledge editing and proposes a method for updating model knowledge so corrections generalize to related queries without disturbing visually or semantically nearby but unrelated facts.[2] ...

June 24, 2026 · 19 min · Zelina

The Model Spoke Your Language. Its Reasoning Did Not.

TL;DR for operators: AdaMame is a paper about a very practical failure: a model can answer a user in one language while doing its reasoning in another. That is not just inelegant. It is a product, trust, and governance problem wearing a linguistics hat.[1] The paper’s useful move is to stop treating multilingual reasoning as a translation issue. The authors train for language fidelity directly. First, they run supervised fine-tuning on 30,000 naturally occurring reasoning traces across five languages. Then they run reinforcement learning with AdaMame-GRPO, a GRPO variant that gives extra reward when a correct rollout reasons in the query language. The extra reward grows during training, so the model first explores useful reasoning languages and later converges toward the user’s language. ...

June 23, 2026 · 19 min · Zelina

Local Fluency Is Not Local Fairness: IndoBias and the Indonesian Bias Problem

TL;DR for operators: IndoBias is a useful paper because it attacks a lazy assumption: that a model becomes fairer in a country once it becomes more fluent in that country’s language. Charming idea. Unfortunately, culture is not a plugin. The paper introduces a two-track benchmark for bias in Indonesian and three local languages: Javanese, Sundanese, and Makasar. The first track, IndoBias-Pairs, uses 544 contrastive stereotype pairs per language to test whether a model assigns higher likelihood to stereotypical statements than to counter-stereotypical ones. The second track, IndoBias-QA, uses generation-based prompts across 336 demographic groups to examine stereotype polarity at broader coverage, including groups that may not have widely agreed stereotype pairs. ...

June 19, 2026 · 20 min · Zelina

If Logic, Then Trouble: Why LLMs Still Miss Human Conditionals

Contract. A supplier writes, “If payment is received by Friday, the discount applies.” Most business readers do not treat this as a detached logic puzzle. They hear a practical rule: pay by Friday, get the discount; miss Friday, probably no discount. The phrase carries intent, relevance, and a small but important threat wrapped in polite operational language. ...

May 31, 2026 · 17 min · Zelina

The Tower of Babble Gets a Router

Opening: Why this matters now. Enterprise AI has a language problem. Not a charming one, like mispronouncing a French menu item with confidence. A structural one. Most companies do not operate in one clean English-speaking universe. Customer support conversations arrive in English, Tagalog, Spanish, Arabic, Thai, Vietnamese, Hindi, Indonesian, Turkish, and whatever dialectal mixture the internet felt like producing that morning. Compliance teams need summaries that preserve local meaning. E-commerce platforms need product search that understands regional idioms. Banks need customer explanations that do not flatten culture into machine-translated oatmeal. ...

May 1, 2026 · 16 min · Zelina

Protocol Over Prompts: When Structure Becomes Strategy in AI Communication

Prompts are now office furniture. Everyone has them. Everyone complains about them. Nobody is quite sure who owns the standard version. One team keeps a Notion page of “best prompts.” Another hides theirs in a spreadsheet. A third tells new staff to “just ask clearly,” which is not a method, but it does have the administrative elegance of doing nothing. ...

April 1, 2026 · 16 min · Zelina

Lost in Translation (Literally): Why ASR Still Breaks in the Age of Voice Agents

Voice is supposed to be the easy interface. No menus. No forms. No training session. A user speaks, the agent understands, and some neat piece of software magic happens in the background. That is the sales pitch. It is also mostly true in a demo room, which is a place where microphones behave, users speak politely, and nobody’s child interrupts from the back seat. ...

March 27, 2026 · 15 min · Zelina

Zero Hallucination, Zero Trust? The Strange Economics of Citation-Grounded LLMs

A receipt is useful because it tells you what was bought, where, and when. It does not prove the product was good. It does not prove the cashier understood economics. It certainly does not prove the shop was honest. Citations in enterprise AI have a similar problem. A support chatbot that says “according to [1]” looks more trustworthy than one that simply improvises. A compliance assistant that appends source markers feels less reckless than one that delivers uncited confidence. A multilingual knowledge assistant that can cite sources in English and Hindi looks like a serious operational system rather than a demo with subtitles. ...

March 22, 2026 · 17 min · Zelina

Lost in Translation: When Safety Contracts Collapse Across 2.1 Billion Voices

A chatbot walks into a multilingual market. Imagine a bank, hospital, telecom platform, or public-service chatbot being rolled out across South Asia. The model has passed English safety tests. It refuses harmful requests in structured evaluation. Its vendor dashboard looks reassuring. The compliance team exhales. Then users arrive. They do not all write in English. They do not all use one script. They mix Hindi and English, write Urdu in Latin letters, switch between native script and romanization, and ask ordinary questions wrapped in messy instructions. In other words, they behave like real users, which is always inconvenient for benchmark design. ...

February 21, 2026 · 14 min · Zelina