MoE Money, MoE Problems: Expert Capacity Finally Gets a Manager

TL;DR for operators: Mixture-of-Experts models are supposed to give businesses the best of both worlds: lots of parameters for capability, few active parameters for cost. Lovely on the slide. Messier in the server room. Two recent papers make the same larger point from opposite sides of the MoE machinery. SoftMoE attacks the compute-allocation problem: why should every token, in every layer, use the same fixed number of experts just because the architecture designer had to choose a value for top-$k$?[1] Tied Expert Layers attacks the memory problem: why should every layer store its own expert FFNs when many of those expert weights may be redundant across nearby layers?[2] ...

June 22, 2026 · 15 min · Zelina

MoE Than a Cost Trick: How Sparse Experts Became an Architecture Stack

The old business pitch for Mixture-of-Experts was satisfyingly simple: activate fewer parameters, spend less compute, keep more capacity on the shelf. It sounded like cloud cost optimization with a PhD. Useful, but not exactly poetic. The newer story is more interesting. Three recent arXiv papers (DOT-MoE, DAG-MoE, and LoopMoE) suggest that MoE is no longer just a sparsity trick. It is becoming an architecture stack for conditional computation: first decide how experts are formed, then how selected experts interact, and finally how sparse expert systems can be reused over iterative depth.[1][2][3] ...

June 7, 2026 · 13 min · Zelina

Pocket Experts: MobileMoE and the Memory Math of On-Device AI

Phones have memory. They also have batteries, thermal limits, app sandboxes, operating-system overhead, impatient users, and the charming habit of becoming hand warmers when developers pretend they are cloud GPUs with a smaller logo. That is the business problem behind MobileMoE, a paper that studies whether Mixture-of-Experts language models can work in the sub-billion-active-parameter regime for on-device deployment.[1] The usual MoE story belongs to giant models: add many experts, activate a few, keep per-token compute low, and let the cloud hardware worry about the rest. MobileMoE asks a less fashionable but more commercially useful question: can the same sparse principle survive inside the memory and latency budget of a smartphone? ...

June 6, 2026 · 14 min · Zelina

Expert Witness: How MoE Translation Models Can Lose Weight Without Losing the Plot

Translation is one of those AI workloads where scale is both a blessing and a tax. A large language model can translate with impressive robustness, follow instructions, preserve formatting, and handle messy inputs better than many older systems. Then the bill arrives. The model is not only carrying translation ability; it is also carrying mathematical reasoning, factual memory, coding patterns, roleplay habits, tool-use affordances, and several other things that are not exactly required to turn German into English. ...

June 4, 2026 · 17 min · Zelina

Pooling Resources: UniPool and the MoE Budget Nobody Wanted to Audit

Opening: Why this matters now. AI infrastructure has entered its spreadsheet era. Not the glamorous spreadsheet, where revenue projections grow diagonally upward and nobody asks where the assumptions came from. The other spreadsheet: the one where compute cost, memory footprint, inference latency, training instability, and model quality all insist on appearing in the same row. ...

May 9, 2026 · 16 min · Zelina

Place Your Experts, Not Your Bets

Opening: Why this matters now. The fashionable version of AI strategy still sounds suspiciously like a gym membership pitch: bigger model, more parameters, more GPUs, more everything. The operational version is less glamorous and much more important: where the computation happens, which parts of the model are actually used, how predictable demand is, and whether the system can turn those facts into lower latency, lower cost, or better decisions. ...

May 7, 2026 · 13 min · Zelina

The Tower of Babble Gets a Router

Opening: Why this matters now. Enterprise AI has a language problem. Not a charming one, like mispronouncing a French menu item with confidence. A structural one. Most companies do not operate in one clean English-speaking universe. Customer support conversations arrive in English, Tagalog, Spanish, Arabic, Thai, Vietnamese, Hindi, Indonesian, Turkish, and whatever dialectal mixture the internet felt like producing that morning. Compliance teams need summaries that preserve local meaning. E-commerce platforms need product search that understands regional idioms. Banks need customer explanations that do not flatten culture into machine-translated oatmeal. ...

May 1, 2026 · 16 min · Zelina

When Prompts Hire Specialists: Why pMoE Changes Visual Adaptation Economics

Inspection cameras, pathology scanners, product catalog systems, and retail shelf analytics all create the same inconvenient problem: the image may look simple, but the knowledge needed to interpret it rarely comes from one source. A model trained on broad natural images may recognize general objects well. A contrastive model may separate fine visual categories better. A medical encoder may notice domain-specific patterns that a general model treats as visual noise. A segmentation-oriented model may understand spatial boundaries better than a classifier. Asking one backbone to cover all of this is elegant in a slide deck and occasionally foolish in production. Nature, sadly, did not optimize itself for clean model procurement. ...

March 1, 2026 · 16 min · Zelina

Don’t Forget How to Feel: Teaching Motion Models Empathy Without Amnesia

Avatars are easy to make expressive once. That is the boring version of the problem. Give a motion model enough examples of sad walking, angry gesturing, or excited dancing, and it can learn the broad association between text and motion. The harder problem starts later, after the product has already shipped. A game studio adds a new combat animation pack. A VR training company expands from office scenarios to emergency response. A digital-human platform moves from daily-life gestures into sports, performance, musical instruments, and acrobatics. Suddenly “sad” is no longer just a lowered head during walking. It must become a lowered head while jogging, a constrained body during performance, or a professional movement pattern inside a sport. ...

December 23, 2025 · 15 min · Zelina

MoE Money, MoE Problems? FinCast Bets Big on Foundation Models for Markets

TL;DR for operators: FinCast is a finance-specific time-series foundation model that tries to do for market forecasting what large pretrained models did for language: absorb enough diverse data that new tasks require less bespoke engineering.[1] The paper reports strong evidence on forecasting accuracy. In a zero-shot benchmark of 3,632 financial time series and more than 4.38 million scalar time points, FinCast beats general-purpose time-series foundation models on average, with roughly 20% lower MSE and 10% lower MAE. In supervised stock benchmarks, even the zero-shot version beats the listed supervised baselines; lightweight fine-tuning widens the gap further. ...

August 30, 2025 · 16 min · Zelina