Provider: Stability AI
License: Stability AI Non-Commercial Research License
Access: Open weights with restricted use for research and non-commercial purposes
Architecture: Multimodal Diffusion Transformer (MMDiT) trained with rectified flow, a form of flow matching
Size: Medium, about 2 billion parameters


๐Ÿ” Overview

Stable Diffusion 3 Medium (SD3-M) is part of Stability AI's next-generation series of text-to-image models. It is designed to follow prompts more closely, produce better-structured images, and handle a wider range of creative styles.

This version introduces three key changes:

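  • Diffusion Transformer (DiT): Replaces the UNet backbone of earlier Stable Diffusion versions with a transformer. SD3 uses the Multimodal DiT (MMDiT) variant, in which text and image tokens keep separate weights but interact through joint attention
  • Flow Matching Training: Trains the model to predict the velocity along a straight path between noise and data (rectified flow) instead of using the standard noise-prediction denoising objective; straighter paths can allow good samples in fewer sampling steps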
  • Advanced Prompt Adherence: Handles long, complex, and multi-subject prompts more reliably than earlier versions
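The transformer bullet can be made concrete with a small NumPy sketch of joint attention, the mechanism MMDiT blocks use to let text and image tokens interact. This is a simplified illustration, not Stability AI's implementation: it assumes a single attention head and omits the adaLN modulation, normalization, output projections, and MLPs of a real block. The function name `joint_attention`, the weight dictionaries, and all sizes are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(img_tokens, txt_tokens, img_w, txt_w):
    """Single-head joint attention over image and text tokens (simplified sketch).

    img_w and txt_w are dicts with "q", "k", "v" matrices of shape (d, d):
    each modality has its OWN projection weights, but attention runs over
    the concatenated sequence, so every token can attend to every other token.
    """
    n_img = img_tokens.shape[0]
    # Project each modality with its own weights, then concatenate along the sequence axis.
    q = np.concatenate([img_tokens @ img_w["q"], txt_tokens @ txt_w["q"]], axis=0)
    k = np.concatenate([img_tokens @ img_w["k"], txt_tokens @ txt_w["k"]], axis=0)
    v = np.concatenate([img_tokens @ img_w["v"], txt_tokens @ txt_w["v"]], axis=0)
    # Scaled dot-product attention over the joint (image + text) sequence.
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    out = weights @ v
    # Split the result back into the image and text streams.
    return out[:n_img], out[n_img:]


# Tiny demo with random (hypothetical) sizes: 16 image tokens, 4 text tokens, width 8.
rng = np.random.default_rng(0)
d = 8


def random_weights() -> dict:
    return {name: rng.standard_normal((d, d)) / np.sqrt(d) for name in ("q", "k", "v")}


img_w, txt_w = random_weights(), random_weights()
img = rng.standard_normal((16, d))
txt = rng.standard_normal((4, d))
img_out, txt_out = joint_attention(img, txt, img_w, txt_w)
```

Because both modalities share one attention matrix, each image token can attend directly to every prompt token (and vice versa), while the separate projection weights let text and image representations stay specialized.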

โš™๏ธ Technical Highlights

  • Architecture: MMDiT (multimodal, transformer-based diffusion)
  • Training Objective: Flow matching
  • Native Resolution: 1024x1024 (about 1 megapixel)
  • Text Encoders: CLIP ViT-L/14, OpenCLIP ViT-bigG/14, and T5-XXL
  • Strengths: Better object interaction, language understanding, composition, and style variety
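The flow-matching objective can be sketched in a few lines of NumPy. This follows the rectified-flow convention described in the SD3 paper (t = 0 is data, t = 1 is pure noise, x_t = (1 - t) * x0 + t * noise, velocity target noise - x0). The helper names are hypothetical; in the real model the MMDiT predicts the velocity on VAE latents, and SD3's non-uniform timestep sampling during training is left out here. The Euler loop illustrates why straight paths help: with an exact velocity field, even a coarse integration lands on the data.

```python
import numpy as np


def rectified_flow_pair(x0: np.ndarray, noise: np.ndarray, t: float):
    """Return the noisy point x_t and its velocity target for one training sample.

    Straight-line path: x_t = (1 - t) * x0 + t * noise, so dx_t/dt = noise - x0.
    """
    x_t = (1.0 - t) * x0 + t * noise
    v_target = noise - x0
    return x_t, v_target


def flow_matching_loss(v_pred: np.ndarray, v_target: np.ndarray) -> float:
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))


def euler_sample(velocity_fn, x1: np.ndarray, num_steps: int) -> np.ndarray:
    """Integrate dx/dt = v(x, t) from t=1 (noise) down to t=0 (data) with Euler steps."""
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    x = x1
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        # t_next < t_cur, so this step moves x from noise toward data.
        x = x + (t_next - t_cur) * velocity_fn(x, t_cur)
    return x


# Toy check (hypothetical data): if the "dataset" is the single point c,
# the exact velocity at (x, t) is (x - c) / t, and Euler sampling from any
# starting noise recovers c up to floating-point rounding.
c = np.array([2.0, -1.0])
sample = euler_sample(lambda x, t: (x - c) / t, np.array([0.5, 0.5]), num_steps=10)
```

During training, a network would replace the exact velocity and be fit by minimizing `flow_matching_loss` over random data points, noise draws, and timesteps.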

🚀 Deployment

  • Hugging Face Repo: stabilityai/stable-diffusion-3-medium (Diffusers-format weights: stabilityai/stable-diffusion-3-medium-diffusers)
  • Inference Frameworks: 🧨 Diffusers (Hugging Face), ComfyUI, and other SD-compatible pipelines
  • Hardware Requirement: a GPU with high VRAM (24 GB or more recommended) for local generation
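For the Diffusers route, a minimal sketch might look like the following. It assumes a recent `diffusers` release that includes `StableDiffusion3Pipeline`, PyTorch, a CUDA GPU, and that you have accepted the model license on Hugging Face (the repo is gated). The helper names (`generation_kwargs`, `load_pipeline`, `main`) are hypothetical, and the sampling values (28 steps, guidance 7.0) mirror the Diffusers example for this model rather than tuned recommendations. The optional `drop_t5` path loads the pipeline without the T5-XXL encoder (`text_encoder_3=None, tokenizer_3=None`), which the Diffusers docs describe as a large memory saving at a small quality cost.

```python
"""Minimal SD3 Medium text-to-image sketch using Hugging Face Diffusers (assumptions above)."""

MODEL_ID = "stabilityai/stable-diffusion-3-medium-diffusers"


def generation_kwargs(prompt: str, steps: int = 28, guidance: float = 7.0,
                      size: int = 1024) -> dict:
    """Collect sampling arguments passed to the pipeline call (illustrative defaults)."""
    return {
        "prompt": prompt,
        "negative_prompt": "",
        "num_inference_steps": steps,
        "guidance_scale": guidance,
        "height": size,  # SD3 Medium's native resolution is 1024x1024
        "width": size,
    }


def load_pipeline(drop_t5: bool = False):
    """Load the pipeline in float16 on CUDA; optionally skip the T5-XXL text encoder."""
    # Imported lazily so generation_kwargs() works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusion3Pipeline

    extra = {}
    if drop_t5:
        # Without the third (T5) encoder, prompts are encoded by the two CLIP models only.
        extra = {"text_encoder_3": None, "tokenizer_3": None}
    pipe = StableDiffusion3Pipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, **extra
    )
    # On smaller GPUs, pipe.enable_model_cpu_offload() is an alternative to .to("cuda").
    return pipe.to("cuda")


def main() -> None:
    """Generate one image and save it next to the script."""
    pipe = load_pipeline(drop_t5=False)
    result = pipe(**generation_kwargs("A cat holding a sign that says hello world"))
    result.images[0].save("sd3_medium_sample.png")
```

Call `main()` on a machine that meets the assumptions above. Keeping the argument builder separate from model loading makes it easy to sweep prompts or step counts without touching the loading code.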

🔗 Resources