Provider: Mistral AI
License: Apache 2.0
Access: Open weights and commercial-friendly usage
Architecture: Sparse Mixture-of-Experts (MoE) Transformer
Active Parameters per Forward Pass: 12.9 billion (2 of 8 experts)
Total Parameters: 46.7 billion
Overview
Mixtral 8x7B Instruct v0.1 is Mistral AI's open-weight MoE model optimized for instruction following and chat. It delivers performance comparable to much larger dense models while significantly reducing compute cost thanks to its sparse expert routing.
Key features:
- Efficient Inference: Only 2 of 8 expert blocks are activated per token, giving high throughput relative to the model's quality (see the routing sketch after this list)
- Strong Instruction Following: Tuned for chat, reasoning, and guided tasks
- Open and Commercial-Ready: Apache 2.0 license makes it suitable for real-world applications
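The efficiency claim above rests on sparse top-2 routing: a small gating network scores each token's hidden state, and only the two highest-scoring expert feed-forward blocks are evaluated for that token. The sketch below is a minimal PyTorch illustration of this pattern, not Mixtral's actual implementation; dimensions are shrunk for demonstration, and the real model uses 4096-dim hidden states, 14336-dim SwiGLU experts, and a fused kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseTop2MoE(nn.Module):
    """Illustrative sparse MoE feed-forward layer with top-2 routing."""

    def __init__(self, hidden_size=128, ffn_size=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size, bias=False),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size, bias=False),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (num_tokens, hidden_size) -- batch and sequence dims flattened.
        scores = self.gate(x)                              # (tokens, experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)   # keep 2 experts per token
        top_w = F.softmax(top_w, dim=-1)                   # renormalize their weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue                                   # expert idle for this batch
            out[token_ids] += top_w[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

moe = SparseTop2MoE()
tokens = torch.randn(5, 128)
print(moe(tokens).shape)  # torch.Size([5, 128]); each token touched only 2 of 8 experts
```

Because the other 6 experts are never evaluated for a given token, the per-token FLOP cost tracks the ~12.9B active parameters rather than the 46.7B total.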
⚙️ Technical Details
- Architecture: Transformer with 8 expert feed-forward blocks per layer; a router selects 2 of them for each token
- Dense Equivalence: Comparable to ~30B–40B dense models in performance
- Training: Pretrained base (Mixtral 8x7B) followed by supervised instruction tuning
- Context Length: 32K tokens supported
- Tokenizer: Same BPE-based tokenizer as Mistral 7B (see the prompt-formatting sketch below)
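The instruct checkpoint expects Mistral's `[INST] ... [/INST]` chat format, which the tokenizer's built-in chat template produces. A minimal sketch with 🤗 Transformers; the prompt text is illustrative, and the tokenizer files are downloaded from the repo listed under Deployment.

```python
from transformers import AutoTokenizer

# Download the Mixtral tokenizer (same BPE vocabulary as Mistral 7B).
tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

messages = [{"role": "user", "content": "Summarize what a Mixture-of-Experts layer does."}]

# The built-in chat template wraps the turn in Mixtral's [INST] ... [/INST] format.
prompt = tok.apply_chat_template(messages, tokenize=False)
print(prompt)

ids = tok.apply_chat_template(messages)  # same prompt, tokenized to ids
print(len(ids), "tokens used of the 32K-token context window")
```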
Deployment
- Hugging Face Repo: mistralai/Mixtral-8x7B-Instruct-v0.1
- Compatibility: Supports 🤗 Transformers, text-generation-inference, and LoRA fine-tuning (example below)
- Serving: Optimized for GPU inference with Flash Attention and tensor parallelism
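A minimal generation sketch with 🤗 Transformers, assuming enough GPU memory for the full checkpoint (roughly 90+ GB in half precision); `device_map="auto"` shards the weights across available GPUs, and quantized loading is a common alternative. The prompt and generation settings here are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision; weights still occupy ~90 GB
    device_map="auto",           # shard layers across available GPUs
)

messages = [{"role": "user", "content": "Explain sparse expert routing in two sentences."}]
input_ids = tok.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For LoRA fine-tuning, the same checkpoint can be wrapped with 🤗 PEFT (`LoraConfig` + `get_peft_model`), training only low-rank adapters while the base and expert weights stay frozen; the choice of target modules depends on the fine-tuning recipe.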