Provider: Mistral AI
License: Apache 2.0
Access: Open weights and commercial-friendly usage
Architecture: Sparse Mixture-of-Experts (MoE) Transformer
Active Parameters per Forward Pass: 12.9 billion (2 of 8 experts)
Total Parameters: 46.7 billion
Overview
Mixtral 8x7B Instruct v0.1 is Mistral AI's open-weight MoE model optimized for instruction following and chat. It delivers performance comparable to much larger dense models while significantly reducing compute cost thanks to its sparse expert routing.
Key features:
- Efficient Inference: Only 2 of 8 expert blocks are activated per token, giving high throughput relative to the model's quality (see the routing sketch after this list)
- Strong Instruction Following: Tuned for chat, reasoning, and guided tasks
- Open and Commercial-Ready: Apache 2.0 license makes it suitable for real-world applications
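The efficiency claim above rests on sparse top-2 routing: a small gating network scores each token's hidden state, and only the two highest-scoring expert feed-forward blocks are evaluated for that token. The sketch below is a minimal PyTorch illustration of this pattern, not Mixtral's actual implementation; dimensions are shrunk for demonstration, and the real model uses 4096-dim hidden states, 14336-dim SwiGLU experts, and a fused kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseTop2MoE(nn.Module):
    """Illustrative sparse MoE feed-forward layer with top-2 routing."""

    def __init__(self, hidden_size=128, ffn_size=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size, bias=False),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size, bias=False),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (num_tokens, hidden_size) -- batch and sequence dims flattened.
        scores = self.gate(x)                              # (tokens, experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)   # keep 2 experts per token
        top_w = F.softmax(top_w, dim=-1)                   # renormalize their weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue                                   # expert idle for this batch
            out[token_ids] += top_w[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

moe = SparseTop2MoE()
tokens = torch.randn(5, 128)
print(moe(tokens).shape)  # torch.Size([5, 128]); each token touched only 2 of 8 experts
```

Because the other 6 experts are never evaluated for a given token, the per-token FLOP cost tracks the ~12.9B active parameters rather than the 46.7B total.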
⚙️ Technical Details
- Architecture: Transformer with 8 expert feed-forward blocks per layer; a router selects 2 of them for each token
- Dense Equivalence: Comparable to ~30B–40B dense models in performance
- Training: Pretrained base (Mixtral 8x7B) followed by supervised instruction tuning
- Context Length: 32K tokens supported
- Tokenizer: Same BPE-based tokenizer as Mistral 7B (see the prompt-formatting sketch below)
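The instruct checkpoint expects Mistral's `[INST] ... [/INST]` chat format, which the tokenizer's built-in chat template produces. A minimal sketch with 🤗 Transformers; the prompt text is illustrative, and the tokenizer files are downloaded from the repo listed under Deployment.

```python
from transformers import AutoTokenizer

# Download the Mixtral tokenizer (same BPE vocabulary as Mistral 7B).
tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

messages = [{"role": "user", "content": "Summarize what a Mixture-of-Experts layer does."}]

# The built-in chat template wraps the turn in Mixtral's [INST] ... [/INST] format.
prompt = tok.apply_chat_template(messages, tokenize=False)
print(prompt)

ids = tok.apply_chat_template(messages)  # same prompt, tokenized to ids
print(len(ids), "tokens used of the 32K-token context window")
```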
Deployment
- Hugging Face Repo: mistralai/Mixtral-8x7B-Instruct-v0.1
- Compatibility: Supports 🤗 Transformers, text-generation-inference, and LoRA fine-tuning (example below)
- Serving: Optimized for GPU inference with Flash Attention and tensor parallelism
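A minimal generation sketch with 🤗 Transformers, assuming enough GPU memory for the full checkpoint (roughly 90+ GB in half precision); `device_map="auto"` shards the weights across available GPUs, and quantized loading is a common alternative. The prompt and generation settings here are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision; weights still occupy ~90 GB
    device_map="auto",           # shard layers across available GPUs
)

messages = [{"role": "user", "content": "Explain sparse expert routing in two sentences."}]
input_ids = tok.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For LoRA fine-tuning, the same checkpoint can be wrapped with 🤗 PEFT (`LoraConfig` + `get_peft_model`), training only low-rank adapters while the base and expert weights stay frozen; the choice of target modules depends on the fine-tuning recipe.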