Provider: Meta AI
License: Llama 4 Community License (a custom license that permits commercial use and redistribution, subject to Meta's acceptable-use policy and scale-based restrictions)
Access: Open weights via Hugging Face
Architecture: Sparse Mixture-of-Experts (MoE) Transformer
Experts: 16 routed experts per MoE layer, with one routed expert plus a shared expert active per token
Parameters: ~109B total, 17B active per token
🔍 Overview
LLaMA 4 Scout 17B 16E is a sparse MoE model released by Meta as part of the LLaMA 4 family. Its openly available weights make it a useful reference point for studying model scaling, routing behavior, and performance trade-offs in sparse expert-based large language models.
Highlights:
- 🧪 Sparse Routing: Each token activates only one of the 16 routed experts (plus a shared expert), keeping per-token compute well below that of a dense model of the same total size; see the sketch after this list
- 🧠 Openly Shared: Released as open weights to encourage study of next-gen sparse architectures
- 🔍 Open Metrics: Released with evaluation data for reproducibility and benchmark comparison
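As a concrete illustration of the routing highlight above, here is a minimal, self-contained sketch of a shared-expert + top-1 routed MoE layer. This is not Meta's implementation; the layer sizes, module names, and gating details are illustrative assumptions chosen only to show the control flow.

```python
# Minimal sketch of a shared-expert + top-1 routed MoE layer; NOT Meta's implementation.
# All sizes (hidden=512, ffn=2048, 16 experts) are illustrative, not Scout's real dimensions.
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, hidden=512, ffn=2048, n_experts=16):
        super().__init__()
        self.router = nn.Linear(hidden, n_experts, bias=False)  # token -> expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))
            for _ in range(n_experts)
        )
        # Shared expert applied to every token regardless of routing.
        self.shared = nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))

    def forward(self, x):                          # x: (tokens, hidden)
        scores = self.router(x)                    # (tokens, n_experts)
        weight, idx = scores.softmax(-1).max(-1)   # top-1 routing: one expert per token
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):  # each expert only sees its own tokens
            mask = idx == e
            if mask.any():
                routed[mask] = weight[mask, None] * expert(x[mask])
        return self.shared(x) + routed             # shared expert is always applied

tokens = torch.randn(8, 512)
print(SparseMoE()(tokens).shape)                   # torch.Size([8, 512])
```

Even though 16 expert FFNs exist in memory, each token only pays for two of them (one routed, one shared), which is where the compute savings come from.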
⚙️ Technical Details
- Model Type: Decoder-only MoE transformer
- Parameters: ~109B total, 17B active per token
- Routing: Top-1 gating over the 16 routed experts, plus an always-active shared expert (see the config check after this list)
- Training Corpus: Internally curated data (details not fully disclosed)
- Tokenizer: BPE tokenizer in the same family as LLaMA 3, with an expanded vocabulary
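To check these figures against the published checkpoint rather than a third-party summary, the configuration can be read directly with Hugging Face Transformers. This is a sketch under stated assumptions: the repo ID, the presence of a nested text_config, and the attribute names below are assumptions about how the config is exposed, and access to the gated repository must already have been granted.

```python
# Sketch: read the published config to verify expert counts and sizes.
# Assumptions: a Transformers release with LLaMA 4 support, granted access to the
# gated repo, and that the config exposes the attribute names listed below.
from transformers import AutoConfig

repo_id = "meta-llama/Llama-4-Scout-17B-16E"  # assumed Hugging Face repo ID

config = AutoConfig.from_pretrained(repo_id)
# LLaMA 4 configs may be composite (text + vision); fall back to the top level if not.
text_cfg = getattr(config, "text_config", config)

for name in ("num_local_experts", "num_experts_per_tok",
             "num_hidden_layers", "hidden_size", "vocab_size"):
    print(f"{name}: {getattr(text_cfg, name, 'not present')}")
```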
🚀 Deployment
- Model Card: Meta LLaMA 4 Scout on Hugging Face
- Use Case: General text generation and assistant-style applications, as well as research on sparse-model scaling and routing
- Inference: Supported in recent releases of Hugging Face Transformers and vLLM; the MoE weights are large, so multi-GPU or quantized setups are typical (see the sketch below)
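For text-only generation, a minimal Transformers sketch is shown below. It assumes a Transformers release with LLaMA 4 support (the Llama4ForConditionalGeneration class), granted access to the gated repo, and enough GPU memory for the full checkpoint; the repo ID, dtype, and generation settings are assumptions to check against the model card, and a quantized variant or vLLM may be preferable when memory is tight.

```python
# Sketch: chat-style generation with Hugging Face Transformers; not an official recipe.
# Assumptions: Transformers with LLaMA 4 support, granted access to the gated repo,
# accelerate installed for device_map, and enough GPU memory for the ~109B checkpoint.
import torch
from transformers import AutoProcessor, Llama4ForConditionalGeneration

repo_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed repo ID; check the model card

processor = AutoProcessor.from_pretrained(repo_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # bf16 halves memory vs fp32; the experts still dominate the footprint
    device_map="auto",           # shard expert weights across the available GPUs
)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Summarize mixture-of-experts routing."}]}
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)[0])
```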