Provider: Mistral AI
License: Apache 2.0
Access: Open weights and commercial-friendly usage
Architecture: Sparse Mixture-of-Experts (MoE) Transformer
Active Parameters per Forward Pass: 12.9 billion (2 of 8 experts)
Total Parameters: ~46.7 billion


๐Ÿ” Overview

Mixtral 8x7B Instruct v0.1 is Mistral AI's open-weight MoE model optimized for instruction following and chat. It delivers performance comparable to much larger dense models while significantly reducing compute cost thanks to its sparse expert routing.

Key features:

  • Efficient Inference: Only 2 of 8 expert blocks are activated per token, yielding high throughput (see the routing sketch after this list)
  • Strong Instruction Following: Tuned for chat, reasoning, and guided tasks
  • Open and Commercial-Ready: Apache 2.0 license makes it suitable for real-world applications
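
To make the sparse routing concrete, here is a minimal, self-contained sketch of top-2 expert selection in PyTorch. It is illustrative only, not Mixtral's actual implementation: the class name `Top2MoELayer` and the small layer sizes are hypothetical, and the experts are plain feed-forward blocks rather than the gated SwiGLU experts used in the real model.

```python
# Illustrative sketch of top-2 expert routing; not Mixtral's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoELayer(nn.Module):
    """Hypothetical, simplified top-2 mixture-of-experts layer."""

    # Sizes kept tiny so the sketch runs anywhere; Mixtral's real experts
    # use hidden_size=4096 and a much larger feed-forward dimension.
    def __init__(self, hidden_size=64, ffn_size=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router scores every expert for every token.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        # Simplified experts: plain two-layer MLPs.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_size)
        logits = self.router(x)                                    # (num_tokens, num_experts)
        weights, chosen = torch.topk(logits, self.top_k, dim=-1)   # keep only the 2 best experts
        weights = F.softmax(weights, dim=-1)                       # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    # Only the selected experts ever run, which is why only
                    # ~12.9B of the total parameters are active per token.
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(4, 64)          # 4 token embeddings
print(Top2MoELayer()(tokens).shape)  # torch.Size([4, 64])
```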

โš™๏ธ Technical Details

  • Architecture: Transformer with 8 experts (only 2 used at a time)
  • Dense Equivalence: Mistral reports quality on par with or better than much larger dense models (e.g., Llama 2 70B) on most benchmarks
  • Training: Pretrained base + supervised instruction tuning
  • Context Length: 32K tokens supported
  • Tokenizer: Same BPE tokenizer as Mistral 7B; instruct prompts follow the [INST] ... [/INST] chat format (see the prompt sketch after this list)
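
Because the model is instruction-tuned, prompts are expected in Mistral's [INST] ... [/INST] chat format. The snippet below is a small sketch that renders a prompt through the tokenizer's built-in chat template; it assumes a recent 🤗 Transformers release and access to the Hugging Face Hub, and the example message is arbitrary.

```python
# Minimal sketch: rendering a chat prompt with the model's own chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "Summarize mixture-of-experts in one sentence."},
]

# The chat template wraps the user turn in [INST] ... [/INST] markers.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)

# Tokenized form: a short prompt uses only a tiny fraction of the 32K-token context window.
token_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(len(token_ids), "tokens")
```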

🚀 Deployment

  • Hugging Face Repo: mistralai/Mixtral-8x7B-Instruct-v0.1
  • Compatibility: Supports 🤗 Transformers, text-generation-inference, and LoRA fine-tuning
  • Serving: Optimized for GPU inference with Flash Attention and tensor parallelism (see the loading sketch after this list)
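
As a rough starting point for GPU serving, the sketch below loads the checkpoint in half precision and generates one reply with 🤗 Transformers. It is a sketch under stated assumptions, not a tuned deployment: it assumes recent transformers and accelerate versions plus enough GPU memory to shard roughly 47B parameters at about 2 bytes each, and the prompt and sampling settings are arbitrary examples.

```python
# Minimal generation sketch with Hugging Face Transformers.
# Assumes enough GPU memory for the fp16 weights, spread across devices
# by device_map="auto", and a recent transformers/accelerate install.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision: roughly 2 bytes per parameter
    device_map="auto",          # shard layers across the available GPUs
    # Optionally add attn_implementation="flash_attention_2" if flash-attn is installed.
)

messages = [{"role": "user", "content": "Write a haiku about sparse experts."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.7)
# Strip the prompt tokens and decode only the newly generated reply.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```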

🔗 Resources