Provider: CompVis
License: CreativeML OpenRAIL-M (open access, including commercial use, subject to use-based restrictions)
Access: Open weights and inference code available
Architecture: Latent Diffusion Model (LDM) trained on 512x512 images
Base Model: A UNet denoiser operating in the latent space of a VAE, with text conditioning injected via cross-attention from a frozen CLIP text encoder
🔍 Overview
Stable Diffusion v1.4 is a latent text-to-image diffusion model developed by CompVis, trained on 512x512 images from a subset of the LAION-2B dataset. It provides a powerful and accessible tool for high-resolution image synthesis based on natural language prompts.
Key features include:
- Text-to-Image Generation: Generate photorealistic and artistic images from prompts (a minimal example follows this list)
- Open and Extensible: Easily fine-tuned and integrated into creative pipelines
- Efficient Inference: Runs on consumer GPUs with as little as 6GB of VRAM, with sampling fast enough for interactive use on recent cards
- Community Support: One of the most widely adopted and adapted open models in the diffusion space
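A minimal text-to-image sketch using the 🧨 Diffusers pipeline (the prompt and output filename are illustrative):

```python
from diffusers import StableDiffusionPipeline

# Download the v1.4 weights from the Hugging Face Hub and build the pipeline.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")

# Any natural-language description works as a prompt.
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]
image.save("astronaut.png")
```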
⚙️ Training Details
- Trained on: 256x256 and 512x512 images from LAION-2B-en
- Components: a VAE that compresses images into a lower-dimensional latent space, a UNet that performs denoising diffusion in that latent space, and a noise-prediction diffusion objective
- Conditioning: a frozen CLIP ViT-L/14 text encoder supplies per-token prompt embeddings that the UNet consumes through cross-attention (see the sketch below)
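For intuition, the sketch below reproduces the text-encoding step in isolation, assuming the stock openai/clip-vit-large-patch14 checkpoint used by SD v1.x; the resulting (1, 77, 768) sequence is what the UNet's cross-attention layers attend to during sampling.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# SD v1.x uses the text encoder of CLIP ViT-L/14, kept frozen during training.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Prompts are padded/truncated to CLIP's fixed 77-token context window.
tokens = tokenizer(
    "a sunset over the ocean",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    # Shape (1, 77, 768): one 768-dim embedding per token position.
    prompt_embeds = text_encoder(tokens.input_ids).last_hidden_state
```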
🚀 Deployment & Use
- Hugging Face Repo: CompVis/stable-diffusion-v1-4
- Inference Tools: Supports deployment via 🧨 Diffusers, 🖼️ AUTOMATIC1111 WebUI, ComfyUI, and RunwayML
- Fine-tuning: Numerous LoRA, DreamBooth, and ControlNet extensions available
- Hardware: A GPU with at least 6GB of VRAM is recommended for local inference (low-memory tips below)
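If VRAM is tight, half-precision weights and attention slicing in 🧨 Diffusers usually bring SD v1.4 within a 6GB budget; a sketch (prompt and step count are illustrative, and exact memory use depends on resolution and scheduler):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load weights in half precision to roughly halve VRAM usage.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe.enable_attention_slicing()  # trades a little speed for lower peak memory
pipe = pipe.to("cuda")

image = pipe("a watercolor landscape at dawn", num_inference_steps=30).images[0]
image.save("landscape.png")
```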