Provider: Meta AI
License: Llama 4 Community License (open-weight license; use is governed by its terms and Meta's Acceptable Use Policy)
Access: Open weights via Hugging Face
Architecture: Sparse Mixture-of-Experts (MoE) Transformer
Experts: 16 routed experts; 2 experts active per token (one shared expert plus one routed expert)
Parameters: 17B active per token, ~109B total


🔍 Overview

LLaMA 4 Scout 17B 16E is a sparse MoE model released by Meta as part of the LLaMA 4 family. Its openly released weights make it a convenient testbed for studying model scaling, routing behavior, and performance trade-offs in sparse expert-based large language models.

Highlights:

  • 🧪 Sparse Routing: Activates only two experts per token (a shared expert plus one of 16 routed experts), keeping per-token compute far below that of a dense model of the same total size; see the parameter sketch after this list
  • 🧠 Open Weights: Released openly to encourage study of next-gen sparse architectures
  • 🔍 Open Metrics: Published with benchmark results for reproducibility and comparison
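As a rough illustration of that efficiency argument, the sketch below estimates active versus total parameters for a shared-expert-plus-top-1 MoE. All figures in it are hypothetical placeholders chosen only to show the shape of the trade-off, not official LLaMA 4 Scout numbers.

```python
# Back-of-the-envelope sketch of why sparse routing saves compute.
# All figures below are hypothetical placeholders, not Meta's numbers.

def moe_param_estimate(always_on_params: float,
                       params_per_expert: float,
                       num_routed_experts: int,
                       routed_experts_per_token: int) -> tuple[float, float]:
    """Return (total_params, active_params_per_token) for a sparse MoE."""
    total = always_on_params + num_routed_experts * params_per_expert
    active = always_on_params + routed_experts_per_token * params_per_expert
    return total, active

# Placeholder split: attention + embeddings + shared expert vs. one routed expert.
total, active = moe_param_estimate(
    always_on_params=11e9,        # hypothetical
    params_per_expert=6e9,        # hypothetical
    num_routed_experts=16,
    routed_experts_per_token=1,   # one routed expert per token, plus the always-on path
)
print(f"total ≈ {total / 1e9:.0f}B parameters, "
      f"~{active / 1e9:.0f}B touched per token ({active / total:.0%})")
```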

⚙️ Technical Details

  • Model Type: Decoder-only MoE transformer
  • Parameters: 17B active per token, ~109B total
  • Routing: Each token is processed by a shared expert plus one of 16 routed experts (illustrated in the sketch after this list)
  • Training Corpus: Internally curated data (details not fully disclosed)
  • Tokenizer: LLaMA 4 tokenizer (BPE-based, same family as the LLaMA 3 tokenizer)
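To make the routing scheme concrete, here is a minimal PyTorch sketch of a single shared-expert-plus-top-1 MoE feed-forward block. It is an illustrative reimplementation of the general technique, not Meta's code: the SwiGLU and SharedPlusTop1MoE classes, the plain-softmax gating, and all dimensions are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """LLaMA-style gated feed-forward block (toy sizes)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class SharedPlusTop1MoE(nn.Module):
    """One always-on shared expert plus one routed expert per token."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 16):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.shared_expert = SwiGLU(d_model, d_ff)
        self.experts = nn.ModuleList([SwiGLU(d_model, d_ff) for _ in range(num_experts)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model), batch and sequence dims already flattened together
        gate_probs = self.router(x).softmax(dim=-1)        # (tokens, num_experts)
        weight, expert_idx = gate_probs.max(dim=-1)        # top-1 routed expert per token
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e                         # tokens assigned to expert e
            if mask.any():
                routed[mask] = weight[mask, None] * expert(x[mask])
        # The shared expert runs for every token; the routed output is added on top.
        return self.shared_expert(x) + routed

tokens = torch.randn(8, 512)                               # 8 tokens, toy hidden size
layer = SharedPlusTop1MoE(d_model=512, d_ff=2048)
print(layer(tokens).shape)                                 # torch.Size([8, 512])
```

A production implementation batches tokens per expert and fuses these loops, but the dispatch logic above is the essence of the compute savings described in the Highlights.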

🚀 Deployment

  • Model Card: Meta LLaMA 4 Scout on Hugging Face
  • Use Case: General text generation, plus academic and scaling-law research
  • Inference: Requires an inference stack with MoE support (e.g., a recent Hugging Face Transformers release, vLLM, or Meta's PyTorch reference code); a minimal loading sketch follows below
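A minimal loading sketch under stated assumptions: the Hugging Face checkpoint id below is presumed to be meta-llama/Llama-4-Scout-17B-16E-Instruct, and a Transformers release with LLaMA 4 support plus accelerate (for device_map="auto") is presumed installed. Verify the exact id and gated-access terms against the official model card; multi-GPU or quantized setups will need additional configuration.

```python
# Minimal text-generation sketch with Hugging Face Transformers.
# Assumptions: a transformers version with Llama 4 support, `accelerate`
# installed for device_map="auto", enough GPU memory for the sharded
# checkpoint, and license access granted on the Hub. The model id below
# is an assumption -- verify it against the official model card.
import torch
from transformers import pipeline

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed checkpoint id

generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,   # bf16 roughly halves memory vs. fp32
    device_map="auto",            # shard the experts across available GPUs
)

out = generator("Mixture-of-experts routing works by", max_new_tokens=64)
print(out[0]["generated_text"])
```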

🔗 Resources