Provider: Meta AI
License: Llama 4 Community License (custom Meta license with use restrictions)
Access: Open weights via Hugging Face
Architecture: Sparse Mixture-of-Experts (MoE), Top-2 Routing
Experts: 128 total experts, 2 active per token
Parameters: ~400B total, 17B active per token


🔍 Overview

LLaMA 4 Maverick 17B 128E is an ultra-sparse, experimental MoE model from Meta’s LLaMA 4 research series. It pushes sparse expert design further by scaling to 128 experts per MoE layer, aiming to preserve quality while activating only a small fraction of the model’s parameters, and therefore only a fraction of the compute, for each token.

Key features:

  • 🧪 High-Sparsity MoE: Only 2 of 128 experts are active per token (see the routing sketch after this list)
  • 🧠 Scalable Design: Explores large-scale routing and activation for efficient scaling
  • 🔍 Research Preview: Released to investigate inference dynamics of ultra-sparse models
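
To make the routing mechanism concrete, here is a minimal PyTorch sketch of a top-2-of-128 MoE layer. It illustrates the general technique only and is not Meta's implementation; the hidden size, expert MLP shape, and softmax-renormalised gating are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoELayer(nn.Module):
    """Minimal sparse MoE layer: each token is routed to 2 of 128 experts.

    Illustrative sketch only; expert shapes and gating details are
    assumptions, not Meta's implementation.
    """

    def __init__(self, hidden_size: int = 256, num_experts: int = 128, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.SiLU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        router_logits = self.router(x)                                # (tokens, num_experts)
        top_logits, top_experts = torch.topk(router_logits, self.top_k, dim=-1)
        gate_weights = F.softmax(top_logits, dim=-1)                  # renormalise over the chosen 2
        output = torch.zeros_like(x)
        # Only the selected experts run for each token -- this is the sparse compute path.
        for slot in range(self.top_k):
            for expert_id in top_experts[:, slot].unique().tolist():
                token_mask = top_experts[:, slot] == expert_id
                expert_out = self.experts[expert_id](x[token_mask])
                output[token_mask] += gate_weights[token_mask, slot].unsqueeze(-1) * expert_out
        return output


if __name__ == "__main__":
    layer = Top2MoELayer()
    tokens = torch.randn(16, 256)   # a batch of 16 token embeddings
    print(layer(tokens).shape)      # torch.Size([16, 256])
```

Only the two selected expert MLPs run for each token, which is where the per-token compute savings described above come from.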

⚙️ Technical Details

  • Model Type: Decoder-only transformer with MoE layers
  • Experts: 128 total, Top-2 routing per token
  • Active Parameters per Token: ~17B (the two routed experts plus the dense layers shared by all tokens; see the sparsity sketch after this list)
  • Tokenizer: LLaMA tokenizer family
  • Training: Focused on routing diversity, sparsity effects, and compute efficiency
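
To put the sparsity figures in perspective, the short sketch below works through the routing arithmetic. The class and field names are invented for illustration and do not correspond to any official configuration schema; the parameter counts are the approximate figures listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoESparsitySpecs:
    """Illustrative record of the figures listed above.

    The class and field names are invented for this sketch; they are not the
    keys of any official configuration file.
    """
    num_experts: int = 128          # routed experts per MoE layer
    experts_per_token: int = 2      # top-k routing width
    total_params_b: float = 400.0   # approximate total parameters, in billions
    active_params_b: float = 17.0   # approximate parameters used per token, in billions

    @property
    def expert_fraction(self) -> float:
        """Fraction of routed experts that fire for any single token."""
        return self.experts_per_token / self.num_experts

    @property
    def active_param_fraction(self) -> float:
        """Fraction of all weights touched per token (experts plus shared dense layers)."""
        return self.active_params_b / self.total_params_b


specs = MoESparsitySpecs()
print(f"{specs.expert_fraction:.1%} of experts active per token")           # 1.6%
print(f"{specs.active_param_fraction:.1%} of parameters active per token")  # roughly 4%
```

Per-token FLOPs scale with the active parameters, although the full set of weights still has to be held in memory (or offloaded) at inference time.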

🚀 Deployment

  • Model Card: LLaMA 4 Maverick on Hugging Face
  • Tools: Requires an inference stack with MoE support (e.g., PyTorch with DeepSpeed, or another MoE-aware serving framework); see the loading sketch after this list
  • Use Cases: Sparse model benchmarking, MoE routing strategy experiments, LLM scaling research
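
A minimal loading sketch is shown below, assuming the checkpoint is hosted on the Hugging Face Hub and that the installed version of Transformers supports the Llama 4 MoE architecture. The repository id is an assumption based on the model's name, and a model of this size will need multiple GPUs or aggressive offloading.

```python
# Hypothetical loading sketch -- the repo id and Transformers support are assumptions,
# not details confirmed by this page. Meta checkpoints on the Hub are gated, so an
# authenticated `huggingface-cli login` is typically required first.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-4-Maverick-17B-128E"  # assumed Hub repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 halves memory relative to fp32
    device_map="auto",           # shard across available GPUs / offload to CPU
)

prompt = "Sparse mixture-of-experts models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For routing-strategy experiments rather than serving, a custom PyTorch loop (as in the routing sketch above) or DeepSpeed's MoE utilities may be more practical than full-model generation.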

🔗 Resources