Provider: Stability AI
License: Stability AI Non-Commercial Research License
Access: Open weights with restricted use for research and non-commercial purposes
Architecture: Multimodal Diffusion Transformer (MMDiT), a DiT variant trained with a flow matching objective
Size: Medium (approximately 2 billion parameters)
Overview
Stable Diffusion 3 Medium (SD3-M) is part of Stability AI's next-generation series of text-to-image models. It is designed for enhanced prompt following, improved image structure, and creative style handling.
This version introduces key architectural upgrades:
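- Diffusion Transformer (DiT): Replaces the UNet backbone of earlier Stable Diffusion versions with a transformer that applies attention jointly over image-patch tokens and text tokens
- Flow Matching Training: An alternative to the standard noise-prediction (denoising) objective; the model learns a velocity field that transports noise to data along near-straight paths, which tends to allow sampling in fewer steps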
- Advanced Prompt Adherence: Handles long, complex, and multi-subject prompts more reliably than earlier versions
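The flow matching objective listed above can be illustrated with a minimal NumPy sketch. It assumes the simplest straight-line (rectified flow) path between a data sample and Gaussian noise; the function names, tensor shapes, and the plain mean-squared-error loss are illustrative choices, not Stability AI's training code.

```python
import numpy as np

def flow_matching_pair(x0, noise, t):
    """Return the noisy sample x_t and the target velocity for a straight path.

    Assumed convention: t=0 is clean data, t=1 is pure noise.
    """
    # Broadcast the per-sample timestep over all non-batch dimensions.
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = (1.0 - t) * x0 + t * noise   # linear interpolation data -> noise
    velocity = noise - x0              # d(x_t)/dt along the straight path
    return x_t, velocity

def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 16, 8, 8))      # stand-in for a batch of latents
noise = rng.normal(size=x0.shape)
t = rng.uniform(size=4)                  # one timestep per sample

x_t, target = flow_matching_pair(x0, noise, t)
# A real model would predict the velocity from (x_t, t, text embedding);
# a zero prediction is used here only as a placeholder.
loss = flow_matching_loss(np.zeros_like(target), target)
```

At sampling time, a trained model's velocity prediction would be integrated from noise back to data with an ODE solver such as Euler steps; that part is omitted here.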
Technical Highlights
- Architecture: DiT (Transformer-based diffusion)
- Training Objective: Flow matching
- Native Output Resolution: 1024x1024
- Text Encoders: Two CLIP-based encoders (CLIP ViT-L and OpenCLIP ViT-bigG) plus a T5-XXL encoder
- Strengths: Better object interaction, language understanding, composition, and style variety
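To give a sense of what the 1024x1024 resolution means for a transformer-based diffusion model, the short calculation below estimates how many image tokens the model attends over. The 8x VAE downsampling factor and the 2x2 patch size are assumptions taken from the publicly released SD3 configuration, not from this summary, so treat the numbers as indicative.

```python
def latent_token_count(image_size=1024, vae_downsample=8, patch_size=2):
    """Estimate latent side length and image-token count for a square image.

    vae_downsample and patch_size are assumed values (see the text above).
    """
    latent_side = image_size // vae_downsample      # side of the latent grid
    tokens_per_side = latent_side // patch_size     # patches along one side
    return latent_side, tokens_per_side ** 2

latent_side, tokens = latent_token_count()
print(latent_side, tokens)  # 128 4096
```

Because self-attention cost grows roughly quadratically with token count, doubling the image side under these assumptions would quadruple the number of tokens and increase attention cost by about sixteen times.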
Deployment
- Hugging Face Repo: stabilityai/stable-diffusion-3-medium
- Inference Frameworks: 🧨 Diffusers (HF), ComfyUI, and other SD-compatible pipelines
- Hardware Requirement: High VRAM GPU (recommended 24GB+) for local generation
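As a rough starting point for local generation with Diffusers, the sketch below loads the model through `StableDiffusion3Pipeline`. Several details are assumptions rather than facts from this summary: the Diffusers-format repository id (`stabilityai/stable-diffusion-3-medium-diffusers`), the step count and guidance scale (illustrative values), and the helper names `build_generation_kwargs` and `generate`, which are hypothetical. The weights are gated, so the license must be accepted on Hugging Face and a login token must be available before the download works.

```python
# Assumed Diffusers-format repo id; the summary above lists the base repo.
MODEL_ID = "stabilityai/stable-diffusion-3-medium-diffusers"

def build_generation_kwargs(prompt, steps=28, guidance_scale=7.0, size=1024):
    """Collect pipeline call arguments (values here are illustrative)."""
    return {
        "prompt": prompt,
        "negative_prompt": "",
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
        "height": size,
        "width": size,
    }

def generate(prompt, output_path="sd3_medium.png", low_vram=False):
    """Generate one image and save it; needs torch, diffusers, and a CUDA GPU."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    )
    if low_vram:
        # Keeps submodules on the CPU until needed, trading speed for memory.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")

    image = pipe(**build_generation_kwargs(prompt)).images[0]
    image.save(output_path)
    return output_path

# Example call (downloads several GB of weights on first run):
# generate("a red fox reading a newspaper in a cafe, watercolor")
```

Setting `low_vram=True` is one way to try the model on GPUs below the recommended memory size, at the cost of slower generation.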