Provider: OpenAI
License: MIT (code); Apache 2.0 (Hugging Face model release)
Access: Open weights available via Hugging Face and GitHub
Architecture: Encoder-decoder Transformer
Model Size: ~1.55 billion parameters
Overview
Whisper Large v3 is the latest and most capable release in OpenAI’s Whisper series, designed for automatic speech recognition (ASR) and speech-to-English translation across roughly 100 languages.
Compared to previous versions, v3 introduces improved:
- Robustness to accents, background noise, and disfluencies
- Transcription accuracy (OpenAI reports 10–20% fewer errors than large-v2)
- Multilingual and cross-lingual performance
Capabilities
- Multilingual Transcription: Supports 100+ spoken languages
- Translation: Translates non-English speech to English
- Zero-shot: No need for task-specific fine-tuning
- Near Real-Time: Practical low-latency transcription with optimized runtimes such as faster-whisper or whisper.cpp
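The transcription and translation capabilities above can be sketched with the openai/whisper Python package (`pip install openai-whisper`). The file name `meeting.mp3` and the `to_lines` helper are illustrative, not part of the library:

```python
# Sketch of transcription and translation with the openai/whisper package.
# "meeting.mp3" is a placeholder audio file, not from the model card.

def to_lines(result):
    """Format a Whisper result dict ({'segments': [{'start', 'end', 'text'}, ...]})
    into '[start-end] text' lines. Illustrative helper, not a library function."""
    return [
        f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['text'].strip()}"
        for seg in result.get("segments", [])
    ]

if __name__ == "__main__":
    import whisper  # requires openai-whisper and downloads model weights

    model = whisper.load_model("large-v3")
    # Transcribe in the spoken language (auto-detected) -- zero-shot, no fine-tuning.
    transcript = model.transcribe("meeting.mp3")
    # Translate non-English speech into English instead.
    translation = model.transcribe("meeting.mp3", task="translate")
    print("\n".join(to_lines(transcript)))
```

Passing `task="translate"` is how the same model switches from transcription to English translation; no separate checkpoint is needed.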
Technical Specs
- Architecture: Transformer (encoder-decoder)
- Model Size: ~1.55B parameters
- Audio Input: 16 kHz mono; the reference implementation resamples other formats and rates (WAV, MP3, etc.) via ffmpeg
- Tokenization: Byte-level BPE with multilingual vocabulary
- Output Format: Text tokens, transcriptions, and translations
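The 16 kHz input above is converted into fixed-size log-Mel spectrogram windows before reaching the encoder. The arithmetic below uses the constants from `whisper/audio.py` in the openai/whisper repo; note that large-v3 raised the mel-bin count to 128 from the 80 used by earlier checkpoints:

```python
# How Whisper's audio frontend sizes the encoder input.
# Constants match whisper/audio.py in the openai/whisper repository.

SAMPLE_RATE = 16_000   # Hz, expected input rate
CHUNK_LENGTH = 30      # seconds per window; shorter audio is zero-padded
HOP_LENGTH = 160       # samples between STFT frames (10 ms)
N_MELS_V3 = 128        # large-v3 uses 128 mel bins (earlier models use 80)

samples_per_chunk = SAMPLE_RATE * CHUNK_LENGTH      # 480,000 samples
frames_per_chunk = samples_per_chunk // HOP_LENGTH  # 3,000 frames

def encoder_input_shape():
    """Shape of one 30 s window fed to the encoder: (mel bins, frames)."""
    return (N_MELS_V3, frames_per_chunk)

print(encoder_input_shape())  # (128, 3000)
```

Every 30-second window therefore becomes a (128, 3000) spectrogram regardless of how much speech it contains, which is why Whisper pads or chunks arbitrary-length audio into 30 s segments.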
Deployment
- Hugging Face Repo: openai/whisper-large-v3
- CLI Tools: whisper.cpp, faster-whisper, and the openai/whisper PyTorch implementation
- Hardware: Supports GPU and CPU inference; can run on a GPU with ~8 GB of VRAM
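A hedged sketch of the faster-whisper route listed above (`pip install faster-whisper`), pairing it with a simple device/precision chooser. `choose_compute` is an illustrative helper (not a library function) and `podcast.wav` is a placeholder file; the ~8 GB threshold mirrors the hardware note above:

```python
# Sketch: pick a device/precision, then transcribe with faster-whisper.
# choose_compute() is an illustrative helper; "podcast.wav" is a placeholder.

def choose_compute(has_gpu: bool, vram_gb: float):
    """Return a (device, compute_type) pair: FP16 on a GPU with enough
    memory (~8 GB per the note above), quantized INT8 on CPU otherwise."""
    if has_gpu and vram_gb >= 8:
        return ("cuda", "float16")
    return ("cpu", "int8")

if __name__ == "__main__":
    from faster_whisper import WhisperModel  # requires faster-whisper

    device, compute_type = choose_compute(has_gpu=True, vram_gb=12)
    model = WhisperModel("large-v3", device=device, compute_type=compute_type)
    segments, info = model.transcribe("podcast.wav")
    for seg in segments:  # segments is a generator; decoding happens lazily
        print(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.text}")
```

faster-whisper's CTranslate2 backend is what makes the CPU/INT8 path viable, trading a small accuracy cost for a much smaller memory footprint than FP16 GPU inference.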