Provider: Unsloth (based on Meta LLaMA 4 Scout)
License: Meta Llama 4 Community License (with Unsloth quantization + instruction tuning applied on top)
Access: Hosted on Hugging Face
Architecture: Sparse Mixture-of-Experts (MoE); each token activates a shared expert plus one routed expert
Quantization: 4-bit (bnb) via BitsAndBytes
Base Model: Meta LLaMA 4 Scout 17B 16E


🔍 Overview

This model is a 4-bit quantized, instruction-tuned build of Meta’s LLaMA 4 Scout MoE model (17B active parameters, 16 experts), provided by the Unsloth team. It aims to make state-of-the-art sparse-expert architectures accessible for low-resource fine-tuning and experimentation.

Highlights:

  • 🔧 Instruction-Tuned: Trained for chat and task completion with improved alignment
  • 💾 4-Bit Quantization: Uses BitsAndBytes for a compact memory footprint and faster inference (see the loading sketch after this list)
  • 🧠 MoE Core: Retains sparse routing, with a shared expert plus one routed expert active per token
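
A minimal loading sketch with Hugging Face Transformers is shown below. The repository id and the text-only AutoModelForCausalLM path are assumptions rather than confirmed names; check the model card for the exact repo id and, for multimodal use, the LLaMA 4-specific classes.

```python
# Minimal 4-bit inference sketch. The repo id and the AutoModelForCausalLM
# loading path are assumptions; consult the model card for the exact names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "unsloth/Llama-4-Scout-17B-16E-Instruct-unsloth-bnb-4bit"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# The BitsAndBytes 4-bit settings ship inside the checkpoint, so no extra
# quantization arguments are needed when loading the pre-quantized weights.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # compute dtype for the non-quantized modules
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```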

⚙️ Technical Details

  • Model Type: MoE Transformer with 16 routed experts plus a shared expert (one routed expert active per token)
  • Parameters: ~109B total; 17B active
  • Quantization: 4-bit via bnb (BitsAndBytes); an example configuration follows this list
  • Tokenizer: LLaMA-family tokenizer
  • Fine-Tuning Base: Meta’s original LLaMA 4 Scout release
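
For reference, the sketch below shows a typical BitsAndBytes 4-bit (NF4) configuration of the kind used to quantize a full-precision base. The base repository id and the NF4 / double-quantization settings are illustrative assumptions; Unsloth’s actual quantization recipe may differ.

```python
# Illustrative 4-bit NF4 quantization setup. The base repo id and the exact
# settings are assumptions, not Unsloth's published recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

BASE_ID = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed base repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat weights
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
```

NF4 with double quantization and a bf16 compute dtype is the common QLoRA-style default: weights are stored at roughly half a byte per parameter while matrix multiplications run in higher precision.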

🚀 Deployment

  • Model Card: Unsloth LLaMA 4 Scout Instruct (4-bit)
  • Use Case: Instruction-style prompting, low-resource deployment, QLoRA-compatible fine-tuning (see the adapter sketch after this list)
  • Tools: Supports Hugging Face Transformers, PEFT, LoRA, and GGUF conversions
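
As a sketch of the QLoRA path mentioned above, a LoRA adapter can be attached to the 4-bit model (loaded as in the earlier example) with PEFT roughly as follows. The rank, alpha, and target module names are assumptions for a LLaMA-style attention block; inspect the loaded model to confirm module names before training.

```python
# QLoRA-style adapter sketch on top of the 4-bit model loaded earlier.
# Rank, alpha, and target_modules are assumed values, not a published recipe.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Standard k-bit training prep: freezes base weights and readies norms/embeddings.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,            # adapter rank
    lora_alpha=32,   # scaling factor
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights train
```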

🔗 Resources