LLM Fine-Tuning

LoRA Was Supposed to Fit on the Edge. The Activations Disagreed.

TL;DR for operators

LoRA does not magically make LLM fine-tuning fit on phones, laptops, or small edge boxes. It reduces the number of trainable parameters. The paper’s useful contribution is showing that this is only the opening move. The real memory bill arrives from activations, checkpoint boundaries, vocabulary-sized output computations, and tokens that are being processed even though they do not contribute to the loss. Apparently the memory allocator did not attend the product strategy meeting. ...
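To make "reduces the number of trainable parameters" concrete, here is a minimal arithmetic sketch. The layer size (4096 x 4096) and the rank (8) are illustrative assumptions, not figures from the paper, and the helper names are hypothetical. It counts trainable weights only; it says nothing about activation memory, which is the bill the teaser above is warning about.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for a LoRA update B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in),
    so the count is rank * (d_out + d_in).
    """
    return rank * (d_out + d_in)


# Assumed, illustrative sizes: one square projection layer and a small rank.
d_out = d_in = 4096
rank = 8

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(f"full fine-tuning: {full:,} trainable weights")   # 16,777,216
print(f"LoRA (rank {rank}):  {lora:,} trainable weights")  # 65,536
print(f"ratio: {lora / full:.2%}")                        # 0.39%
```

The trainable-weight count for this one layer drops by a factor of 256, but the forward pass still runs through the full frozen matrix, so the activations saved for backpropagation do not shrink by that factor.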

Rank and File: MatryoshkaLoRA Turns One Adapter into Many

The adapter budget problem is not just training cost

Budget is usually where fine-tuning conversations become less glamorous. A team wants a customized model. The engineer suggests LoRA because full fine-tuning is expensive. Everyone nods. Then the uncomfortable question arrives: which rank? A low rank is cheap but may underfit. A high rank may work better but costs more memory and inference compute. So the team trains several adapters, compares them, chooses one, and pretends the search process was a minor detail. It was not. It was the hidden invoice. ...

LoRA and Order: The Strange Case for One Well-Placed Adapter

Opening: Why this matters now

Enterprise AI is entering its less glamorous, more useful phase: not “Can we connect an LLM to everything?” but “Can we adapt it without making the GPU bill look like a small infrastructure project?” Fine-tuning still matters. Retrieval helps with knowledge access, prompt engineering helps with behavior shaping, and agent frameworks help with workflow orchestration. But many businesses eventually hit the same wall: the base model is close, yet not close enough. It needs domain style, task format, compliance habits, tool-use discipline, or workflow-specific judgment. That usually means some form of supervised fine-tuning. ...

Trex Marks the Spot: When AI Starts Training AI

Fine-tuning is supposed to be the practical part of AI work. You have a model. You have a task. You collect some data, choose a training recipe, run the job, look at the benchmark, and repeat until the result stops embarrassing everyone in the meeting. That tidy version is useful for slide decks. It is less useful for actual model development. ...

When the Answer Matters More Than the Thinking

Answer. In most business systems, that is the part users actually care about. The approval decision. The risk label. The final invoice category. The recommended next action. The tidy little field that decides whether the workflow moves forward or someone opens a Slack thread titled “Why did the AI say this?” Yet much of modern LLM fine-tuning treats that answer as just another slice of text. Worse, when supervised examples include long chain-of-thought explanations, the final answer may become the shortest and least dominant part of the training objective. The model learns to produce a convincing trail of reasoning, but the tiny destination at the end receives comparatively little optimization pressure. Very elegant. Also slightly absurd. ...
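A rough sketch of why the final answer can end up "least dominant": under a plain per-token cross-entropy objective, every supervised token position counts once, so a short answer after a long reasoning trace occupies a tiny share of the objective. The token counts and the upweighting factor below are illustrative assumptions, and the weighting helper is a hypothetical remedy shown for contrast, not the method of the paper behind the article.

```python
def answer_share(n_reasoning: int, n_answer: int) -> float:
    """Share of supervised token positions that belong to the final answer,
    assuming every token position is weighted equally."""
    total = n_reasoning + n_answer
    if total == 0:
        raise ValueError("need at least one supervised token")
    return n_answer / total


def weighted_answer_share(n_reasoning: int, n_answer: int, answer_weight: float) -> float:
    """Same share when each answer token is multiplied by answer_weight
    (a hypothetical per-token loss weight) and reasoning tokens keep weight 1."""
    weighted_answer = answer_weight * n_answer
    total = n_reasoning + weighted_answer
    if total == 0:
        raise ValueError("need a positive total weight")
    return weighted_answer / total


# Assumed example: a 400-token reasoning trace followed by a 4-token answer.
print(f"uniform weighting:   {answer_share(400, 4):.2%}")                  # 0.99%
print(f"answer weight of 10: {weighted_answer_share(400, 4, 10.0):.2%}")   # 9.09%
```

These are shares of weighted token positions, not of the actual loss value, which also depends on how hard each token is to predict. The point is only that sequence length alone tilts the objective toward the reasoning text.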

Clipped, Grouped, and Decoupled: Why RL Fine-Tuning Still Behaves Like a Negotiation With Chaos

Training a reasoning model sounds wonderfully modern until the model discovers that “being correct” and “looking correct enough to satisfy the reward” are not the same career path. That is the quiet problem behind reinforcement learning fine-tuning for large language models. The research conversation often treats methods like PPO, GRPO, and DAPO as a sequence of upgrades: first the classic algorithm, then the critic-free group method, then the decoupled-and-dynamically-sampled variant with a nicer acronym. Very tidy. Unfortunately, models do not read product positioning decks. ...

Active Minds, Efficient Machines: The Bayesian Shortcut in RLHF

TL;DR for operators

Labels are the awkward invoice behind modern alignment. RLHF looks elegant in diagrams: generate outputs, ask humans which one is better, train a reward model, optimise the policy, repeat until everyone pretends the reward model is civilisation. In practice, most preference comparisons are not equally useful. Some are obvious. Some are redundant. Some teach the model almost nothing except that annotator budgets have a sense of humour. ...

Mirror, Signal, Trade: How Self-Reflective Agent Teams Outperform in Backtests

TL;DR for operators

TradingGroup is best read as an operating design for financial agents, not as a permission slip to hand the treasury account to a chatbot with a brokerage API. The paper proposes a five-agent trading system that combines news sentiment, financial-report retrieval, technical forecasting, trading-style selection, and final trade decisions. Around that agent team, it adds two mechanisms that matter more than the agent labels themselves: self-reflection from logged outcomes, and dynamic risk management through stop-loss, take-profit, and position-sizing rules.[1] ...

From Tadpole to Titan: How DEVFT Grows LLMs Like a Brain

TL;DR for operators

Federated LLM fine-tuning sounds attractive until someone asks the rude operational question: who is actually paying for the compute, memory, and communication on the devices? The paper behind DevFT proposes a useful answer: do not fine-tune the full model end-to-end from the first round. Start with a compact submodel, train it federatively, transfer the learned LoRA parameters forward, then expand the model in stages until it reaches the full target size.[1] The authors call this Developmental Federated Tuning, and yes, the developmental psychology metaphor is a little enthusiastic. Fortunately, the mechanism is more interesting than the metaphor. ...

Jack of All Trades, Master of AGI? Rethinking the Future of Multi-Domain AI Agents

TL;DR for operators

Most companies do not have an “AI agent” problem. They have an agent zoo problem. One bot answers customer questions. Another writes code. Another searches documents. Another runs workflows. Another tries to sound friendly and occasionally performs the emotional equivalent of wearing a fake moustache. The paper behind this article argues that this fragmentation is not the end state. It proposes NGENT: a next-generation AI agent that integrates multiple specialist abilities into one broadly capable system.[1] ...