Provider: Unsloth (based on Meta LLaMA 4 Scout)
License: Meta Llama 4 Community License (with Unsloth quantization + instruction tuning applied on top)
Access: Hosted on Hugging Face
Architecture: Sparse Mixture-of-Experts (MoE); each token activates a shared expert plus one routed expert
Quantization: 4-bit (bnb) via BitsAndBytes
Base Model: Meta LLaMA 4 Scout 17B 16E


🔍 Overview

This model is a 4-bit quantized, instruction-tuned build of Meta’s LLaMA 4 Scout MoE model (17B active parameters, 16 experts), provided by the Unsloth team. It aims to make state-of-the-art sparse-expert architectures accessible for low-resource fine-tuning and experimentation.

Highlights:

  • 🔧 Instruction-Tuned: Trained for chat and task completion with improved alignment
  • 💾 4-Bit Quantization: Uses BitsAndBytes for a compact memory footprint and faster inference (see the loading sketch after this list)
  • 🧠 MoE Core: Retains sparse routing, with a shared expert plus one routed expert active per token
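
A minimal loading sketch with Hugging Face Transformers is shown below. The repository id and the text-only AutoModelForCausalLM path are assumptions rather than confirmed names; check the model card for the exact repo id and, for multimodal use, the LLaMA 4-specific classes.

```python
# Minimal 4-bit inference sketch. The repo id and the AutoModelForCausalLM
# loading path are assumptions; consult the model card for the exact names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "unsloth/Llama-4-Scout-17B-16E-Instruct-unsloth-bnb-4bit"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# The BitsAndBytes 4-bit settings ship inside the checkpoint, so no extra
# quantization arguments are needed when loading the pre-quantized weights.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # compute dtype for the non-quantized modules
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```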

⚙️ Technical Details

  • Model Type: MoE Transformer with 16 routed experts plus a shared expert (one routed expert active per token)
  • Parameters: ~109B total; 17B active
  • Quantization: 4-bit via bnb (BitsAndBytes); an example configuration follows this list
  • Tokenizer: LLaMA-family tokenizer
  • Fine-Tuning Base: Meta’s original LLaMA 4 Scout release
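
For reference, the sketch below shows a typical BitsAndBytes 4-bit (NF4) configuration of the kind used to quantize a full-precision base. The base repository id and the NF4 / double-quantization settings are illustrative assumptions; Unsloth’s actual quantization recipe may differ.

```python
# Illustrative 4-bit NF4 quantization setup. The base repo id and the exact
# settings are assumptions, not Unsloth's published recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

BASE_ID = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed base repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat weights
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
```

NF4 with double quantization and a bf16 compute dtype is the common QLoRA-style default: weights are stored at roughly half a byte per parameter while matrix multiplications run in higher precision.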

🚀 Deployment

  • Model Card: Unsloth LLaMA 4 Scout Instruct (4-bit)
  • Use Case: Instruction-style prompting, low-resource deployment, QLoRA-compatible fine-tuning (see the adapter sketch after this list)
  • Tools: Supports Hugging Face Transformers, PEFT, LoRA, and GGUF conversions
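
As a sketch of the QLoRA path mentioned above, a LoRA adapter can be attached to the 4-bit model (loaded as in the earlier example) with PEFT roughly as follows. The rank, alpha, and target module names are assumptions for a LLaMA-style attention block; inspect the loaded model to confirm module names before training.

```python
# QLoRA-style adapter sketch on top of the 4-bit model loaded earlier.
# Rank, alpha, and target_modules are assumed values, not a published recipe.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Standard k-bit training prep: freezes base weights and readies norms/embeddings.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,            # adapter rank
    lora_alpha=32,   # scaling factor
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights train
```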

🔗 Resources