Provider: IBM Research
License: IBM Research License (with permissible open-weight usage)
Access: Open weights on Hugging Face
Architecture: Encoder-decoder transformer for speech recognition
Model Size: 8 billion parameters
Overview
Granite Speech 3.2 (8B) is part of IBM's Granite model family and focuses on accurate, scalable speech-to-text transcription across diverse languages and accents. It's designed for real-world applications like enterprise meeting transcription, call center analytics, and multilingual transcription pipelines.
Key strengths:
- Multilingual Coverage: Trained across multiple languages and accents
- Robust Accuracy: Competitive with Whisper and other leading open ASR models
- Flexible Deployment: Supports real-time and batch transcription use cases
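Accuracy comparisons between ASR models are usually reported as word error rate (WER): the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the reference length. The following is a minimal sketch of that metric, not IBM's evaluation code; the function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Real benchmarks typically apply heavier text normalization (punctuation, numerals, casing rules) before scoring, so numbers from this sketch are only comparable to each other.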
Technical Specs
- Architecture: Transformer-based encoder-decoder
- Parameters: 8B
- Audio Input: 16kHz mono waveform
- Output: Plain text transcription (per language)
- Streaming Mode: Low-latency inference enabled
- Training Data: Large-scale curated multilingual datasets
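Because the model expects a 16 kHz mono waveform, audio recorded in stereo or at another sample rate has to be converted first. The sketch below shows the idea in dependency-free Python using channel averaging and linear interpolation; the helper names `to_mono` and `resample_linear` are hypothetical, and linear interpolation applies no anti-aliasing filter, so a production pipeline would more likely rely on a dedicated resampler such as `torchaudio.functional.resample`.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # sample rate stated in the specs above


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length channels sample by sample."""
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same length")
    return [sum(frame) / len(channels) for frame in zip(*channels)]


def resample_linear(
    samples: Sequence[float],
    source_rate: int,
    target_rate: int = TARGET_RATE,
) -> List[float]:
    """Resample by linear interpolation (illustrative, no low-pass filter)."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if source_rate == target_rate or len(samples) < 2:
        return list(samples)

    out_len = max(1, round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate  # source samples per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = min(i * step, last)  # clamp to the final source sample
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    stereo_48k = [[0.0] * 48_000, [0.0] * 48_000]  # one second of silence
    mono_16k = resample_linear(to_mono(stereo_48k), source_rate=48_000)
    print(len(mono_16k))
```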
Deployment
- Model Card: Granite Speech 3.2 on Hugging Face
- Compatible Tools: Hugging Face Transformers + TorchAudio, ONNX export, enterprise ASR stacks
- Use Cases: Live captioning, customer support analytics, multilingual accessibility
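For live captioning or long recordings, a common pattern is to split the 16 kHz waveform into short, slightly overlapping windows and transcribe each window in turn. The sketch below illustrates only that chunking step; the helper `iter_chunks`, the 5-second window, and the 0.5-second overlap are assumptions chosen for illustration, not parameters documented for Granite Speech.

```python
from typing import Iterator, List, Sequence


def iter_chunks(
    samples: Sequence[float],
    sample_rate: int = 16_000,
    chunk_seconds: float = 5.0,    # hypothetical window length
    overlap_seconds: float = 0.5,  # hypothetical overlap between windows
) -> Iterator[List[float]]:
    """Yield fixed-length, overlapping windows over a mono waveform."""
    chunk = int(chunk_seconds * sample_rate)
    overlap = int(overlap_seconds * sample_rate)
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")

    hop = chunk - overlap  # samples advanced between window starts
    start = 0
    while start < len(samples):
        yield list(samples[start:start + chunk])
        if start + chunk >= len(samples):
            break  # this window already reached the end of the audio
        start += hop


if __name__ == "__main__":
    audio = [0.0] * (16_000 * 12)  # 12 seconds of silence
    for index, window in enumerate(iter_chunks(audio)):
        print(index, len(window) / 16_000, "seconds")
```

The overlap gives each window some context from the previous one, at the cost of transcribing a small stretch of audio twice; merging the duplicated words is left to the caller.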