Provider: Unsloth (based on Meta LLaMA 4 Scout)
License: Meta's Llama 4 license terms apply (Unsloth quantization and tuning are layered on top)
Access: Hosted on Hugging Face
Architecture: Sparse Mixture-of-Experts (MoE) Transformer; each token activates a shared expert plus one routed expert
Quantization: 4-bit via bitsandbytes (bnb)
Base Model: Meta LLaMA 4 Scout 17B 16E
🔍 Overview
This model is a 4-bit quantized, instruction-tuned build of Meta’s LLaMA 4 Scout 17B 16E MoE model, published by the Unsloth team. It aims to make a state-of-the-art sparse-expert architecture accessible for low-resource fine-tuning and experimentation.
Highlights:
- 🔧 Instruction-Tuned: Trained for chat and task completion with improved alignment
- 💾 4-Bit Quantization: Uses BitsAndBytes for a compact memory footprint and lower hardware requirements (see the loading sketch after this list)
- 🧠 MoE Core: Retains sparse routing, with two experts (one shared, one routed) active per token
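A minimal loading sketch with Hugging Face Transformers follows. The repository id and the use of the plain causal-LM interface are assumptions for illustration; check the model card for the exact identifier and the recommended model class for your Transformers version.

```python
# Minimal loading sketch. Assumptions: the repo id below is illustrative and the
# text-only causal-LM interface is sufficient; consult the model card for the
# exact identifier and recommended class.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Llama-4-Scout-17B-16E-Instruct-unsloth-bnb-4bit"  # illustrative repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard across available GPUs / offload to CPU as needed
    torch_dtype="auto",  # keep the dtype stored in the checkpoint
)

# The 4-bit bitsandbytes quantization config ships inside the checkpoint,
# so no extra BitsAndBytesConfig is needed at load time.
messages = [{"role": "user", "content": "Explain Mixture-of-Experts routing in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```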
⚙️ Technical Details
- Model Type: MoE Transformer with 16 routed experts plus a shared expert (2 active per token)
- Parameters: ~109B total; 17B active
- Quantization: 4-bit via bnb (bitsandbytes); see the configuration sketch after this list
- Tokenizer: LLaMA-family tokenizer
- Fine-Tuning Base: Meta’s original LLaMA 4 Scout release
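For reference, the sketch below shows a typical 4-bit bitsandbytes setup of the kind used to produce quantized checkpoints like this one. The specific settings (quant type, compute dtype, double quantization) and the base repo id are assumptions for illustration; the hosted 4-bit repo already embeds its own quantization config.

```python
# Sketch of a typical 4-bit NF4 bitsandbytes setup for the base model.
# The settings and repo id are illustrative assumptions; the hosted 4-bit
# checkpoint already carries its own quantization config.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 weight quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",  # base repo id, for illustration
    quantization_config=bnb_config,
    device_map="auto",
)
```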
🚀 Deployment
- Model Card: Unsloth LLaMA 4 Scout Instruct (4-bit)
- Use Case: Instruction-style prompting, low-resource deployment, QLoRA-compatible fine-tuning (see the PEFT sketch after this list)
- Tools: Supports Hugging Face Transformers, PEFT, LoRA, and GGUF conversions
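Because the checkpoint is QLoRA-compatible, a PEFT adapter can be attached directly to the 4-bit model. The target modules and hyperparameters below are illustrative assumptions, not Unsloth's recommended settings.

```python
# Minimal QLoRA-style adapter sketch with PEFT, attached to the 4-bit model
# loaded earlier. Target modules and hyperparameters are illustrative
# assumptions, not recommended settings.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)  # enable gradient flow through the quantized base

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```

From here, any Transformers-compatible trainer (for example, TRL's SFTTrainer) can be used to train the adapters while the quantized base weights stay frozen.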