Provider: CompVis

License: CreativeML Open RAIL-M (permits commercial use, subject to use-based restrictions)

Access: Open weights and inference code available

Architecture: Latent Diffusion Model (LDM) trained on 512x512 images

Base Model: Diffusion UNet operating in the latent space of an autoencoder, with text conditioning applied through cross-attention over CLIP text embeddings


🔍 Overview

Stable Diffusion v1.4 is a latent text-to-image diffusion model developed by CompVis, trained at 512x512 resolution on subsets of the LAION-2B-en dataset. It provides a powerful and accessible tool for high-resolution image synthesis from natural-language prompts.

Key features include:

  • Text-to-Image Generation: Generate photorealistic and artistic images from prompts (see the example after this list)
  • Open and Extensible: Easily fine-tuned and integrated into creative pipelines
  • Efficient Inference: Runs on consumer GPUs with as little as ~6GB of VRAM and fast sampling
  • Community Support: One of the most widely adopted and adapted open models in the diffusion space
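
For a concrete starting point, the snippet below is a minimal text-to-image sketch using the 🧨 Diffusers `StableDiffusionPipeline`. It assumes `diffusers`, `transformers`, and `torch` are installed and a CUDA GPU is available; the prompt and output filename are placeholders.

```python
# Minimal sketch: text-to-image with Stable Diffusion v1.4 via Diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # half precision keeps peak VRAM near the ~6GB mark
)
pipe = pipe.to("cuda")

prompt = "a photograph of an astronaut riding a horse"  # placeholder prompt
image = pipe(prompt).images[0]  # 512x512 PIL image by default
image.save("astronaut.png")
```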

⚙️ Training Details

  • Trained on: 256x256 and 512x512 images from LAION-2B-en
  • Components: a VAE that compresses images into a compact latent space, a UNet that iteratively denoises latents, and a diffusion training objective
  • Conditioning: a frozen CLIP ViT-L/14 text encoder supplies prompt embeddings for cross-attention (see the sketch after this list)
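
To make the conditioning step concrete, here is a sketch of how the frozen CLIP text encoder turns a prompt into the embedding sequence the UNet cross-attends to. It uses the standalone `openai/clip-vit-large-patch14` checkpoint from 🤗 Transformers; the prompt is a placeholder.

```python
# Sketch of the conditioning path: a frozen CLIP ViT-L/14 text encoder maps the
# prompt to a token-embedding sequence the UNet attends to via cross-attention.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(
    "a photograph of an astronaut riding a horse",  # placeholder prompt
    padding="max_length",
    max_length=tokenizer.model_max_length,  # CLIP uses a 77-token context
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    embeddings = text_encoder(tokens.input_ids).last_hidden_state

print(embeddings.shape)  # torch.Size([1, 77, 768])
```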

🚀 Deployment & Use

  • Hugging Face Repo: CompVis/stable-diffusion-v1-4
  • Inference Tools: Supports deployment via 🧨 Diffusers, 🖼️ AUTOMATIC1111 WebUI, ComfyUI, and RunwayML
  • Fine-tuning: Numerous LoRA, DreamBooth, and ControlNet extensions available
  • Hardware: A GPU with at least 6GB of VRAM is recommended for local inference (see the memory-saving sketch after this list)
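
As a sketch of how inference can fit within the ~6GB recommendation, Diffusers exposes attention slicing, which trades a small amount of speed for a lower peak-memory footprint; the prompt below is a placeholder.

```python
# Sketch: memory-conscious inference for GPUs around the 6GB VRAM mark.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_attention_slicing()  # compute attention in chunks to lower peak memory

image = pipe("an oil painting of a lighthouse at dusk").images[0]  # placeholder prompt
image.save("lighthouse.png")
```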

🔗 Resources