SFT | Cognaptus

The White Coat Is Not the Treatment

TL;DR for operators: Belmadani et al. study a question every serious enterprise LLM team eventually meets after the prototype stops looking magical: which adaptation bill is actually worth paying?[1] In French medical question answering, they compare continual pretraining (CPT), supervised fine-tuning (SFT), and CPT followed by SFT across Gemma, Mistral, and Llama-family models, with general, instruction-tuned, and medical initializations. ...

Plan, Don't Spam: The Goldilocks Rule for Test‑Time Compute

A busy agent is not necessarily a thinking agent. Anyone who has watched an LLM agent narrate every tiny move knows the feeling. It reviews the goal. It drafts a plan. It revises the plan. It reconsiders the revision. Then, with exquisite deliberation, it clicks the wrong button. The transcript looks intelligent; the behaviour looks like a consultant trapped in a revolving door. ...

Spin Doctors: Why RL Fine‑Tuning Mostly Rotates, Not Reinvents

TL;DR for operators: If your fine-tuned model gets better on the training task while quietly becoming worse outside it, the problem may not be that the model “lost intelligence”. It may have rotated its useful internal directions away from broadly generalizable behaviour. The paper behind this article studies SFT followed by PPO-style RL on two open LLMs using a controlled arithmetic benchmark, then inspects the weight matrices through singular-value decomposition.[1] The pattern is clean enough to be operationally interesting: OOD performance peaks early during SFT, falls as SFT continues, and can be substantially restored by RL when the SFT checkpoint is only moderately degraded. But if SFT pushes the model too far into a specialized regime, RL is no longer a reliable rescue crew. Apparently even reinforcement learning has limits. Who knew. ...
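
For readers who want a feel for what "rotating" means in SVD terms, here is a toy NumPy sketch. It is not the paper's procedure. The function name `rotation_vs_rescaling`, the choice of `k`, and the synthetic 8x8 matrix are all assumptions made for illustration. The idea is to decompose a weight matrix before and after an update, then compare two things: how much the top singular values changed (rescaling) and how far the top-k singular subspace turned (rotation, measured by principal angles).

```python
import numpy as np

def rotation_vs_rescaling(w_before, w_after, k=4):
    """Hypothetical helper: compare top-k singular structure of two matrices.

    Returns the principal angles (degrees) between the top-k left singular
    subspaces, and the relative change in the top-k singular values.
    """
    u0, s0, _ = np.linalg.svd(w_before, full_matrices=False)
    u1, s1, _ = np.linalg.svd(w_after, full_matrices=False)
    # Singular values of U0_k^T U1_k are the cosines of the principal angles.
    cosines = np.linalg.svd(u0[:, :k].T @ u1[:, :k], compute_uv=False)
    angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
    rel_sv_change = np.abs(s1[:k] - s0[:k]) / s0[:k]
    return angles, rel_sv_change

# Toy "before" weights: a diagonal matrix with singular values 8, 7, ..., 1.
w_before = np.diag(np.arange(8.0, 0.0, -1.0))

# Toy "update": a pure rotation by 10 degrees in the plane of axes 0 and 7.
theta = np.radians(10.0)
q = np.eye(8)
q[0, 0], q[0, 7] = np.cos(theta), -np.sin(theta)
q[7, 0], q[7, 7] = np.sin(theta), np.cos(theta)
w_after = q @ w_before

angles, rel_change = rotation_vs_rescaling(w_before, w_after, k=4)
print("largest principal angle (deg):", angles.max())
print("largest relative singular-value change:", rel_change.max())
```

In this constructed case the update is an orthogonal rotation, so the singular values stay put while the top-4 subspace tilts by about 10 degrees. On real checkpoints the two quantities would have to be measured layer by layer, and nothing here says which values the paper reports.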