DINOv2 ViT-L/14
A self-supervised vision foundation model from Meta AI: a large Vision Transformer with 14×14-pixel patches that produces general-purpose image embeddings for vision tasks without task-specific labels.
A widely used self-supervised speech representation model from Meta AI for automatic speech recognition and audio understanding tasks.