Provider: CompVis
License: CreativeML OpenRAIL-M (open access, including commercial use, subject to use-based restrictions)
Access: Open weights and inference code available
Architecture: Latent Diffusion Model (LDM) trained on 512x512 images
Base Model: A UNet denoiser operating in the latent space of a VAE, with text conditioning injected via cross-attention from a frozen CLIP text encoder
🔍 Overview
Stable Diffusion v1.4 is a latent text-to-image diffusion model developed by CompVis, trained on 512x512 images from a subset of the LAION-2B dataset. It provides a powerful and accessible tool for high-resolution image synthesis based on natural language prompts.
Key features include:
- Text-to-Image Generation: Generate photorealistic and artistic images from prompts (a minimal example follows this list)
- Open and Extensible: Easily fine-tuned and integrated into creative pipelines
- Efficient Inference: Runs on consumer GPUs with as little as 6GB of VRAM, with sampling fast enough for interactive use on recent cards
- Community Support: One of the most widely adopted and adapted open models in the diffusion space
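A minimal text-to-image sketch using the 🧨 Diffusers pipeline (the prompt and output filename are illustrative):

```python
from diffusers import StableDiffusionPipeline

# Download the v1.4 weights from the Hugging Face Hub and build the pipeline.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")

# Any natural-language description works as a prompt.
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]
image.save("astronaut.png")
```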
⚙️ Training Details
- Trained on: 256x256 and 512x512 images from LAION-2B-en
- Components: a VAE that compresses images into a lower-dimensional latent space, a UNet that performs denoising diffusion in that latent space, and a noise-prediction diffusion objective
- Conditioning: a frozen CLIP ViT-L/14 text encoder supplies per-token prompt embeddings that the UNet consumes through cross-attention (see the sketch below)
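For intuition, the sketch below reproduces the text-encoding step in isolation, assuming the stock openai/clip-vit-large-patch14 checkpoint used by SD v1.x; the resulting (1, 77, 768) sequence is what the UNet's cross-attention layers attend to during sampling.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# SD v1.x uses the text encoder of CLIP ViT-L/14, kept frozen during training.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Prompts are padded/truncated to CLIP's fixed 77-token context window.
tokens = tokenizer(
    "a sunset over the ocean",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    # Shape (1, 77, 768): one 768-dim embedding per token position.
    prompt_embeds = text_encoder(tokens.input_ids).last_hidden_state
```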
🚀 Deployment & Use
- Hugging Face Repo: CompVis/stable-diffusion-v1-4
- Inference Tools: Supports deployment via 🧨 Diffusers, 🖼️ AUTOMATIC1111 WebUI, ComfyUI, and RunwayML
- Fine-tuning: Numerous LoRA, DreamBooth, and ControlNet extensions available
- Hardware: A GPU with at least 6GB of VRAM is recommended for local inference (low-memory tips below)
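If VRAM is tight, half-precision weights and attention slicing in 🧨 Diffusers usually bring SD v1.4 within a 6GB budget; a sketch (prompt and step count are illustrative, and exact memory use depends on resolution and scheduler):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load weights in half precision to roughly halve VRAM usage.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe.enable_attention_slicing()  # trades a little speed for lower peak memory
pipe = pipe.to("cuda")

image = pipe("a watercolor landscape at dawn", num_inference_steps=30).images[0]
image.save("landscape.png")
```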