Fine-Tuning

The White Coat Is Not the Treatment

TL;DR for operators: Belmadani et al. study a question every serious enterprise LLM team eventually meets after the prototype stops looking magical: which adaptation bill is actually worth paying? In French medical question answering, they compare continual pretraining (CPT), supervised fine-tuning (SFT), and CPT followed by SFT across Gemma, Mistral, and Llama-family models, with general, instruction-tuned, and medical initializations. ...

Ground Control to Synthetic Data: Why Enterprise LLMs Need a Source of Truth

TL;DR for operators: Synthetic data is having its predictable enterprise moment: everyone wants more of it, faster, cheaper, and preferably without involving humans who ask inconvenient questions like “is this correct?” The two papers here are useful because they push against that lazy version of the story. StateGen, from PayPal AI, focuses on generating multi-turn training conversations for tool-augmented LLM agents, using an authoritative world-state object, tool simulation, persona variation, and multi-axis judging. CYQUARK focuses on generating Text-To-Cypher fine-tuning data from a target property graph and schema, expanding query expressivity while filtering natural-language paraphrases for logical fidelity. ...

LoRA’s Rank Excuse Has a Gradient Problem

TL;DR for operators: LoRA is usually sold as a rank-and-cost compromise: train a small low-rank adapter instead of updating the whole model, accept some performance gap, and enjoy the budget meeting. The paper behind SDS-LoRA argues that this explanation is incomplete. The gap is not only because the adapter is low-rank. It is also because standard LoRA can distort the training signal that flows into that adapter. ...
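For readers who want the arithmetic behind "small low-rank adapter", here is a minimal sketch of the usual LoRA parameter count for a single weight matrix. The 4096 x 4096 layer size and the rank of 8 are illustrative assumptions, not figures from the SDS-LoRA paper, and `lora_param_counts` is a hypothetical helper written for this example.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full_update_params, lora_params) for one weight matrix.

    A full update trains every entry of the d_out x d_in matrix.
    A LoRA adapter trains two thin factors instead:
    B (d_out x rank) and A (rank x d_in), with the update being B @ A.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Illustrative (assumed) sizes: one 4096 x 4096 projection, rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
print(f"full update: {full} trainable parameters")
print(f"LoRA adapter: {lora} trainable parameters")
print(f"adapter is {full // lora}x smaller")
```

This only covers the cost side of the compromise. It says nothing about the gradient distortion the paper is concerned with.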

The One-Weird-Trick Era of LLM Efficiency Is Over

TL;DR for operators: The useful lesson from Unifying Data, Memory, and Compute Efficiency in LLM Training: A Survey is not that one efficiency method is about to save everyone’s GPU bill. That would be charming, in the same way procurement decks are charming. The paper’s real contribution is to show why LLM efficiency has become a coupled operating problem: what data you train on changes the compute you spend; how you fit training into memory changes the optimization path; and when you stop, refresh, or reallocate compute depends on both. ...

Gradient Customs: AlphaToken Checks Which Tokens Are Allowed to Train

Fine-tuning looks deceptively democratic. Every response token gets its little vote in the gradient. The commas, the boilerplate, the obvious connective tissue, the wrong kind of certainty, the genuinely task-bearing step in the middle of the answer: all are invited to update the model. A charmingly egalitarian arrangement. Also a rather efficient way to teach a model to forget things it used to know. ...

No More Low-Rank Detours: GPart and the Geometry of Fine-Tuning

Adapters are supposed to make fine-tuning simple. A team takes a large pretrained model, freezes most of it, trains a small adapter for customer support, another for invoice extraction, another for compliance review, and so on. The pitch is attractive: less storage, less training cost, faster iteration, fewer excuses from the infrastructure team. Naturally, the adapter becomes the small and tidy object everyone wants to manage. ...

Rank and File: BoostLoRA’s Case for Smarter Fine-Tuning

Opening: Why this matters now. Enterprise AI is entering its less glamorous phase: not the demo, not the keynote, not the charming chatbot that answers three curated questions correctly, but the operational grind of making models behave reliably inside messy workflows. That grind usually runs into a familiar triangle. Full fine-tuning is powerful but expensive, operationally heavy, and often risky when the training set is narrow. Parameter-efficient fine-tuning, especially LoRA-style adaptation, is cheaper and easier to deploy, but the smallest adapters can hit a ceiling. Meanwhile, the business user does not care whether the adapter was elegant. They care whether the model stops making the same costly mistakes in invoicing, compliance review, customer support, code generation, or scientific triage. ...

Where to Go Deeper Beyond This Academy

A curated guide to textbooks, authors, websites, and papers for readers who want to study transformer internals, attention math, fine-tuning, GPU optimization, and benchmarking in more depth.

Turning Heads: Why AI Still Gets Lost When It Turns Around

A room is a cruelly simple test for artificial intelligence. Put a person inside it. Tell them they are facing an avocado. Ask them to turn right by 270 degrees, then left by 90 degrees. Give them a few observations along the way. After the final turn, ask what they can see. ...

Beyond the Linear Ceiling: Why Non-Linearity Is the Next Frontier in PEFT

More Rank Is Not Always More Capacity. Fine-tuning teams love a simple knob. If the model underperforms, increase rank. If the adapter looks too small, increase rank. If the downstream task is hard, increase rank again and call it strategy. This is comforting because rank is measurable, budgetable, and easy to explain in a meeting. Unfortunately, reality has its usual habit of being less cooperative. ...