Provider: Cohere For AI (C4AI)
License: Aya RAIL License (supports responsible open research and use)
Access: Open weights via Kaggle Models
Architecture: Vision Transformer (ViT-style) trained with contrastive objectives
Overview
Aya Vision is a robust, open-access vision encoder released under Cohere For AI's Project Aya. It is trained to map visual input to dense embeddings for use in tasks such as image search, classification, and vision-language alignment. As part of the Aya initiative, it aims to serve as a modular, globally inclusive vision component for multimodal systems.
Highlights:
- Global Alignment: Part of Aya's push for globally inclusive, open research models
- Embeddings-First: Designed for downstream use with text encoders, retrieval, or classification (see the retrieval sketch after this list)
- Open & Transparent: Training data, weights, and evaluation approach disclosed publicly
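Because the encoder emits one dense vector per image, retrieval reduces to nearest-neighbour search in embedding space. The sketch below shows that step in NumPy; the vectors are random stand-ins for real encoder outputs, and the 768-dimensional width is an assumption for illustration, not a figure from the model card.

```python
# Illustrative retrieval over dense image embeddings. The gallery and
# query vectors are random stand-ins; in practice they would come from
# the Aya Vision encoder.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for 1,000 gallery images encoded into 768-dim vectors.
gallery = rng.standard_normal((1000, 768)).astype(np.float32)
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)  # L2-normalise

# Stand-in for the query image's embedding.
query = rng.standard_normal(768).astype(np.float32)
query /= np.linalg.norm(query)

# After normalisation, cosine similarity is a plain dot product.
scores = gallery @ query
top_k = np.argsort(scores)[::-1][:5]
print("top-5 gallery indices:", top_k, "scores:", scores[top_k])
```

Normalising before indexing is the standard move in CLIP-like pipelines: it lets an approximate nearest-neighbour index treat cosine similarity as a plain inner product.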
Technical Details
- Model Type: Vision encoder
- Architecture: Vision Transformer (ViT)
- Embedding Output: Dense vector representation of images
- Training Strategy: Contrastive learning with multilingual pairing (details in the Aya paper; a generic version of the objective is sketched after this list)
- Intended Use: Pretrained image encoder for retrieval or multimodal pairings
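For intuition about the training strategy, the snippet below sketches the symmetric contrastive (InfoNCE, CLIP-style) objective that this family of encoders is typically trained with. It is a generic illustration under assumed batch shape and temperature, not Aya Vision's actual training code; the Aya paper has the real recipe.

```python
# Minimal sketch of a symmetric InfoNCE / CLIP-style contrastive loss.
# Batch size, embedding width, and temperature are assumed values.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Pull matched image/text pairs together, push mismatches apart."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))           # matched pairs on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Random stand-ins for a batch of 8 paired embeddings of width 768.
print(contrastive_loss(torch.randn(8, 768), torch.randn(8, 768)).item())
```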
Deployment
- Model Access: Aya Vision on Kaggle
- Use Case Examples: Image → vector retrieval, vision-language pipelines, zero-shot classification
- Compatibility: Works with CLIP-like pipelines and vision-language fusion architectures (a zero-shot sketch follows this list)
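Putting the pieces together, zero-shot classification scores one image embedding against a set of label-text embeddings and takes a softmax. The sketch below uses random stand-ins for both encoders' outputs; the commented kagglehub.model_download call is the real Kaggle client API, but the model handle is a placeholder to be replaced with the one on the Kaggle listing.

```python
# Zero-shot classification over cosine similarities, CLIP-style.
import numpy as np

# One-time weight download via the Kaggle client (real API; the handle
# below is a placeholder -- take the actual one from the model's page):
#   import kagglehub
#   weights_path = kagglehub.model_download("<owner>/<model>/<framework>/<variation>")

rng = np.random.default_rng(0)
labels = ["cat", "dog", "bicycle"]

# Stand-ins for text-encoder outputs, one per label prompt.
label_emb = rng.standard_normal((len(labels), 768)).astype(np.float32)
label_emb /= np.linalg.norm(label_emb, axis=1, keepdims=True)

# Stand-in for the Aya Vision embedding of the query image.
image_emb = rng.standard_normal(768).astype(np.float32)
image_emb /= np.linalg.norm(image_emb)

# Softmax over cosine similarities -> pseudo-probabilities per label.
scores = label_emb @ image_emb
probs = np.exp(scores) / np.exp(scores).sum()
print({label: round(float(p), 3) for label, p in zip(labels, probs)})
```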