Provider: Cohere For AI (C4AI)
License: Aya RAIL License (a responsible-AI license supporting open research and use)
Access: Open weights via Kaggle Models
Architecture: Vision Transformer (ViT-style) trained with contrastive objectives


🔍 Overview

Aya Vision is a robust, open-access vision encoder released under Cohere For AI's Project Aya. It converts visual input into dense embeddings for use in tasks such as image search, classification, and vision-language alignment. As part of the Aya initiative, it is intended as a modular, globally inclusive vision component for multimodal systems.

Highlights:

  • 🌍 Global Alignment: Part of Aya's push for globally inclusive, open research models
  • 🧠 Embeddings-First: Designed for downstream use with text encoders, retrieval, or classification (see the sketch after this list)
  • 🔓 Open & Transparent: Training data, weights, and evaluation approach are disclosed publicly
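
As a rough illustration of this embeddings-first workflow, the sketch below loads a ViT-style encoder and pools its output into a single dense image vector. The checkpoint id, the use of Hugging Face `transformers`, and the mean-pooling step are assumptions for illustration rather than the model's documented API; the Kaggle model page has the authoritative loading code.

```python
# Minimal embeddings-first sketch. The checkpoint id and the pooling
# choice below are assumptions for illustration; the released encoder
# may use a CLS token or a learned projection instead.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

MODEL_ID = "CohereForAI/aya-vision-encoder"  # hypothetical id

processor = AutoImageProcessor.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the patch tokens into one dense image vector and
# L2-normalize it so dot products act as cosine similarity.
embedding = outputs.last_hidden_state.mean(dim=1)  # shape: (1, hidden_dim)
embedding = torch.nn.functional.normalize(embedding, dim=-1)
```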

⚙️ Technical Details

  • Model Type: Vision encoder
  • Architecture: Vision Transformer (ViT)
  • Embedding Output: Dense vector representation of images
  • Training Strategy: Contrastive learning with multilingual pairing (details in the Aya paper; a generic loss sketch follows this list)
  • Intended Use: Pretrained image encoder for retrieval or multimodal pairings
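
For intuition about the contrastive objective, here is a minimal sketch of the symmetric InfoNCE loss used in CLIP-style training: matched image-text pairs are pulled together and mismatched pairs pushed apart. This shows the generic technique only; the exact objective and multilingual pairing recipe are described in the Aya paper.

```python
# Generic CLIP-style contrastive loss over a batch of paired
# image and text embeddings (not Aya's exact training objective).
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Normalize so dot products are cosine similarities.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)

    # logits[i, j] compares image i with text j.
    logits = img_emb @ txt_emb.t() / temperature

    # Matched pairs sit on the diagonal; use indices as class labels.
    targets = torch.arange(img_emb.size(0), device=img_emb.device)

    # Symmetric cross-entropy over image->text and text->image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```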

🚀 Deployment

  • Model Access: Aya Vision on Kaggle
  • Use Case Examples: Image → vector retrieval, vision-language pipelines, zero-shot classification (a retrieval sketch follows this list)
  • Compatibility: Works with CLIP-like pipelines and vision-language fusion architectures
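
To make the retrieval use case concrete, below is a small, self-contained sketch of cosine-similarity search over precomputed embeddings. The embedding dimension and the random stand-in vectors are placeholders; in practice the gallery would hold vectors produced by the encoder, and in a CLIP-like setup the query could be a text embedding, which turns the same search into zero-shot classification.

```python
# Nearest-neighbour retrieval over precomputed embeddings.
# The gallery and query below are random stand-ins for vectors
# produced by an image (or compatible text) encoder.
import numpy as np

def top_k(query_vec, index_vecs, k=5):
    # Cosine similarity via normalized dot products.
    q = query_vec / np.linalg.norm(query_vec)
    idx = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    scores = idx @ q
    best = np.argsort(scores)[::-1][:k]
    return best, scores[best]

gallery = np.random.randn(1000, 768).astype(np.float32)  # stand-in index
query = np.random.randn(768).astype(np.float32)          # stand-in query
ids, scores = top_k(query, gallery, k=5)
print(ids, scores)
```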

🔗 Resources