
LoRA, But Make It Legible: How CARLoS Turns Chaos into Retrieval Signal

LoRA marketplaces have a familiar business problem hiding inside an unfamiliar technical wrapper: the shelf labels are terrible. A creator uploads an adapter with a catchy name, a handful of sample images, maybe a description, maybe not. A user searches for “vibrant colors,” “pencil sketch,” “cyberpunk lighting,” or “kimono inspired.” The platform returns whatever its text search thinks is nearby. Sometimes that works. Often it does the digital equivalent of recommending a “Coloring Book” LoRA when the user wanted a graphite sketch. Charming, in the same way a vending machine full of unlabeled cans is charming. ...

December 10, 2025 · 17 min · Zelina

No Prompt Left Behind: How Shopee’s CompassMax Reinvents RL for Giant MoE Models

Rollouts are expensive little creatures. They consume GPU time, produce long reasoning traces, wait for reward computation, and then—if the reward signal is flat—contribute exactly nothing to learning. The GPU was busy. The training dashboard looked serious. The model learned no usable distinction. Very productive, in the same way a meeting with twelve people and no decision is productive. ...

December 9, 2025 · 18 min · Zelina

Noise Without Borders: How Single-Pair Guidance Rewrites Diffusion Synthesis

Camera noise is annoying in the same way logistics is annoying: nobody wants to talk about it until the system fails. A phone camera, a factory inspection camera, a medical imaging sensor, or a night-time security device does not merely capture a clean scene plus a cute little sprinkle of Gaussian noise. Real image noise is shaped by sensors, ISO settings, shutter speed, color processing, demosaicing, compression, and whatever private magic lives inside the image signal processing pipeline. In research papers, that pipeline is often politely summarized as “real-world noise.” In deployment, it is the reason a denoising model that looked excellent in the lab starts behaving like it has never seen darkness before. ...

December 7, 2025 · 15 min · Zelina

Pruned but Not Muted: How Frequency-Aware Token Reduction Saves Vision Transformers

Images are expensive. Not emotionally, although some product managers do try. They are expensive because modern visual models turn an image into a sequence of tokens, then let those tokens attend to one another. In a Vision Transformer, more tokens usually mean more detail, but also more attention cost. The obvious response is to reduce the number of tokens. ...

November 29, 2025 · 16 min · Zelina

One-Shot, No Drama: Why Training-Free Federated VLMs Might Actually Work

Deployment is where elegant AI systems go to discover invoices, weak networks, compliance teams, and client devices with the computing dignity of a hotel lobby printer. Federated vision–language models make that problem worse. In theory, they are attractive: keep local data local, let many clients collaborate, and adapt a powerful pre-trained model to distributed visual tasks. In practice, the standard recipe usually asks every client to participate in repeated training rounds, exchange updates, survive connectivity gaps, and somehow not turn the entire project into a GPU-themed charity event. ...

November 23, 2025 · 16 min · Zelina

One Pass to Rule Them All: YOFO and the Rise of Compositional Judging

Search is where nuance goes to die. A customer asks for a long evening dress, preferably not pink. A retrieval model sees “dress,” “evening,” perhaps “pink,” and returns something short, bright, and entirely wrong with the confidence of a clerk who has technically read the sentence but not understood the assignment. The business consequence is familiar: fewer conversions, more irrelevant recommendations, and yet another dashboard where “semantic relevance” looks respectable while customers quietly leave. ...

November 22, 2025 · 17 min · Zelina

RL, Recall, and the Rise of Agentic Memory: What Memory-R1 Means for AI Systems

A customer-support agent that remembers the wrong thing is often worse than one that remembers nothing. Nothing can be checked. Wrong memory arrives wearing the little hat of confidence. This is the uncomfortable problem behind long-term AI agents. Businesses want systems that remember customer preferences, project history, unresolved tickets, contractual context, previous exceptions, and the fact that the user did not, in fact, ask to restart the whole workflow from scratch. The usual engineering answer is to bolt on memory: save notes, retrieve similar snippets, stuff them into context, and hope the model behaves like a diligent assistant rather than a distracted intern with a filing cabinet. ...

November 21, 2025 · 15 min · Zelina

Heads Up: Why Sensitivity Matters in Many‑Shot Multimodal ICL

Long prompts are easy to understand. They are also expensive, slow, and—in multimodal systems—very quickly ridiculous. That is the practical tension behind many-shot multimodal in-context learning. In principle, giving a vision-language model more examples should help it recognise the task. In practice, every image costs tokens, every additional demonstration adds latency, and open-source large multimodal models do not generally enjoy infinite context windows. The business version of the problem is familiar: you want a model to adapt to a specialised workflow, but you do not want to fine-tune it every week, pay for swollen prompts forever, or discover that the “cheap” approach now requires a larger GPU. ...

November 15, 2025 · 15 min · Zelina

From DAGs to Swarms: The Quiet Revolution of Agentic Workflows

Queue. That is still the hidden operating model of much modern science. Queue for the instrument. Queue for the simulation. Queue for the data transfer. Queue for a human to inspect the result, change the parameters, approve the next run, and remind three systems with incompatible interfaces that they are supposed to be part of the same experiment. The glamour version is “AI for discovery.” The operational version is a researcher quietly becoming a logistics coordinator with a PhD. ...

September 19, 2025 · 17 min · Zelina

Rollouts, Not GPUs: Why AWorld’s 14.6× Speedup Rewires Agent Training

TL;DR for operators: AWorld’s useful lesson is not “buy more GPUs”. It is more specific, and therefore more operationally annoying: if an agent learns from interaction, the bottleneck becomes the rate at which it can safely attempt tasks, collect trajectories, score outcomes, and feed those traces back into training. The paper shows three things that matter for builders. First, more rollouts per task sharply raise success rates on GAIA validation: Claude 3.7 Sonnet rises from 47.9% pass@1 to a 76.4% peak, while GPT-4o rises from 27.3% to 65.5% as rollout count increases to 32. Second, AWorld’s distributed executor cuts rollout time for one training cycle from 7,695 seconds to 525 seconds, while training time stays fixed at 144 seconds. That is the paper’s 14.6× speedup, and it is the result that makes the training loop economically less ridiculous. Third, using that loop, Qwen3-32B-AWorld reaches 32.23% GAIA test pass@1, up from 21.59% for the base Qwen3-32B model, and improves xbench-DeepSearch from 12% to 32% without direct training on that benchmark. ...
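
For readers who want to check the arithmetic, here is a minimal sketch using only the timing figures quoted above. The variable and function names are illustrative, not taken from the paper, and the whole-cycle figure rests on an assumption the excerpt does not state: that rollout and training run one after the other within a cycle.

```python
# Timing figures quoted in the excerpt, in seconds per training cycle.
ROLLOUT_BEFORE_S = 7695   # rollout time before the distributed executor
ROLLOUT_AFTER_S = 525     # rollout time with the distributed executor
TRAINING_S = 144          # training time, reported as unchanged


def speedup(before_s: float, after_s: float) -> float:
    """Ratio of old duration to new duration."""
    return before_s / after_s


# Rollout-only speedup: the ratio the excerpt reports as 14.6x.
rollout_speedup = speedup(ROLLOUT_BEFORE_S, ROLLOUT_AFTER_S)

# Whole-cycle speedup, ASSUMING rollout and training are sequential
# (an assumption made for this sketch, not a claim from the paper).
cycle_speedup = speedup(
    ROLLOUT_BEFORE_S + TRAINING_S,
    ROLLOUT_AFTER_S + TRAINING_S,
)

# Share of each cycle spent on rollouts, under the same assumption.
rollout_share_before = ROLLOUT_BEFORE_S / (ROLLOUT_BEFORE_S + TRAINING_S)
rollout_share_after = ROLLOUT_AFTER_S / (ROLLOUT_AFTER_S + TRAINING_S)

print(f"rollout speedup: {rollout_speedup:.2f}x")
print(f"cycle speedup (sequential assumption): {cycle_speedup:.2f}x")
print(f"rollout share of cycle: {rollout_share_before:.1%} -> {rollout_share_after:.1%}")
```

Under that sequential assumption, the fixed 144 seconds of training caps the whole-cycle gain below the rollout-only ratio, and rollouts still account for most of each cycle even after the speedup. That is consistent with the excerpt's point that rollout throughput, not GPU count, is the bottleneck.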

August 31, 2025 · 15 min · Zelina