DINOv2 ViT-L/14
A powerful self-supervised vision foundation model from Meta AI, producing high-quality image embeddings for vision tasks without task-specific labels.
A 12-billion-parameter rectified flow transformer capable of generating images from text descriptions.
A 30-billion-parameter open-source language model from MosaicML — a strong, general-purpose LLM balancing scale, performance, and inference efficiency.
A high-quality text-to-image latent diffusion model trained on LAION-2B, enabling fast and flexible image generation.
A flagship text-to-image model with improved realism, composition, and support for high-resolution 1024x1024 image generation.
A multilingual speech recognition and translation model by OpenAI, supporting 100+ languages with improved robustness and low-latency transcription.