FP8 | Cognaptus

If GPT-4 was the apex of pretraining, DeepSeek might be the blueprint for what comes next. Released in two families, DeepSeek-V3 and DeepSeek-R1, this Chinese open-source model series isn’t just catching up to frontier LLMs. It’s reshaping the paradigm. By leaning on reinforcement learning (RL) rather than relying primarily on traditional supervised fine-tuning, and by pairing it with memory-efficient innovations like Multi-head Latent Attention (MLA) and cost-efficient training techniques like FP8 mixed precision and fine-grained Mixture-of-Experts (MoE), DeepSeek models demonstrate how strategic architectural bets can outpace brute-force scale. ...