LoRA and Order: The Strange Case for One Well-Placed Adapter

Opening: Why this matters now. Enterprise AI is entering its less glamorous, more useful phase: not “Can we connect an LLM to everything?” but “Can we adapt it without making the GPU bill look like a small infrastructure project?” Fine-tuning still matters. Retrieval helps with knowledge access, prompt engineering helps with behavior shaping, and agent frameworks help with workflow orchestration. But many businesses eventually hit the same wall: the base model is close, yet not close enough. It needs domain style, task format, compliance habits, tool-use discipline, or workflow-specific judgment. That usually means some form of supervised fine-tuning. ...

May 9, 2026 · 15 min · Zelina

No Free Tokens: The New Economics of LLM Inference

Opening: Why this matters now. For the last few years, AI strategy has been narrated as a model-quality story: bigger models, better benchmarks, longer context windows, more agents, more demos, more adjectives. That story was useful. It was also incomplete. The less glamorous reality is now arriving with the invoice attached. LLM systems are not merely models. They are production services that consume GPU memory, scheduling capacity, engineering attention, and operational patience. Once a business moves from a prototype to repeated daily use, the question changes from “Can the model answer?” to “Can the system answer reliably, cheaply, and repeatedly when real users arrive at inconvenient times?” ...

May 7, 2026 · 16 min · Zelina

Rank and File: BoostLoRA’s Case for Smarter Fine-Tuning

Opening: Why this matters now. Enterprise AI is entering its less glamorous phase: not the demo, not the keynote, not the charming chatbot that answers three curated questions correctly, but the operational grind of making models behave reliably inside messy workflows. That grind usually runs into a familiar triangle. Full fine-tuning is powerful but expensive, operationally heavy, and often risky when the training set is narrow. Parameter-efficient fine-tuning, especially LoRA-style adaptation, is cheaper and easier to deploy, but the smallest adapters can hit a ceiling. Meanwhile, the business user does not care whether the adapter was elegant. They care whether the model stops making the same costly mistakes in invoicing, compliance review, customer support, code generation, or scientific triage. ...

May 4, 2026 · 13 min · Zelina

Rank and File: Why LoRA Adapters May Be Bigger Than They Need to Be

Opening: Why this matters now. Fine-tuning large models used to sound like a research luxury. Now it is a line item in the infrastructure budget. Enterprises do not want one general-purpose model behaving vaguely usefully for everyone. They want domain-specific behavior: a support adapter for insurance claims, a compliance adapter for legal review, a financial-document adapter for analyst workflows, perhaps a dozen regional variants, and then another dozen because someone discovered “brand tone” during a steering committee meeting. Naturally. ...

May 4, 2026 · 12 min · Zelina

From Tadpole to Titan: How DEVFT Grows LLMs Like a Brain

If federated fine-tuning feels like trying to teach calculus to a toddler on a flip phone, you’re not alone. While the privacy-preserving benefits of federated learning are clear, its Achilles’ heel has always been the immense cost of training large models like LLaMA2-13B across resource-starved edge devices. Now, a new method—DEVFT (Developmental Federated Tuning)—offers a compelling paradigm shift, not by upgrading the devices, but by downgrading the expectations. At least, at first. ...

August 4, 2025 · 3 min · Zelina

The LoRA Mirage: Why Lightweight Finetuning Isn't Lightweight on Privacy

When we talk about parameter-efficient fine-tuning, LoRA (Low-Rank Adaptation) is often celebrated as a silver bullet: cost-effective, memory-efficient, and—many assume—safe. After all, it modifies only a small fraction of model parameters, sideloaded as low-rank matrices, while leaving the massive pretrained model backbone untouched. The prevailing belief has been that such minimal intervention can’t possibly memorize or leak sensitive data. This belief is now decisively debunked by LoRA-Leak, a landmark framework introduced in a new paper by researchers from Tsinghua and HKUST. Their findings are a wake-up call for AI developers and policymakers alike: even LoRA-finetuned models are highly vulnerable to membership inference attacks (MIAs)—and ironically, the very presence of the frozen pretrained model amplifies this leakage risk. ...

July 25, 2025 · 4 min · Zelina