Granite Speech 3.2 (8B)

Provider: IBM Research
License: IBM Research License (with permissible open-weight usage)
Access: Open weights on Hugging Face
Architecture: Encoder-decoder transformer for speech recognition
Model Size: 8 billion parameters

🔍 Overview

Granite Speech 3.2 (8B) is part of IBM’s Granite model family and focuses on accurate, scalable speech-to-text transcription across diverse languages and accents. It’s designed for real-world applications like enterprise meeting transcription, call center analytics, and multilingual transcription pipelines.

Key strengths:

🌍 Multilingual Coverage: Trained across multiple languages and accents
📈 Robust Accuracy: Competitive with Whisper and other leading open ASR models
🧩 Flexible Deployment: Supports real-time and batch transcription use cases

⚙️ Technical Specs

Architecture: Transformer-based encoder-decoder
Parameters: 8B
Audio Input: 16 kHz mono waveform
Output: Plain text transcription (per language)
Streaming Mode: Low-latency inference enabled
Training Data: Large-scale curated multilingual datasets
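Because the specs call for a 16 kHz mono waveform, audio recorded at other rates or in stereo needs converting first. The following is a minimal, dependency-free sketch of that preparation step, not the preprocessing shipped with the model. The function names `to_mono` and `resample_linear` are hypothetical, and the resampler uses naive linear interpolation with no anti-aliasing filter, so a dedicated audio library is likely a better choice in production.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # sample rate listed in the specs above


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length channels into a single mono channel."""
    if not channels:
        raise ValueError("expected at least one channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    return [sum(frame) / len(channels) for frame in zip(*channels)]


def resample_linear(
    samples: Sequence[float], src_rate: int, dst_rate: int = TARGET_RATE
) -> List[float]:
    """Naive linear-interpolation resampler (assumption: no low-pass filtering)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) == 0:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Blend the two nearest source samples.
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

A typical call would downmix first and then resample, for example `resample_linear(to_mono([left, right]), 48_000)` for a 48 kHz stereo recording.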

🚀 Deployment

Model Card: Granite Speech 3.2 on Hugging Face
Compatible Tools: 🤗 Transformers + TorchAudio, ONNX export, enterprise ASR stacks
Use Cases: Live captioning, customer support analytics, multilingual accessibility
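For use cases such as live captioning, one common pattern is to feed a recognizer short, slightly overlapping windows of audio rather than a whole recording. The sketch below illustrates that generic windowing idea only; it makes no claim about how Granite Speech implements its streaming mode. The helper name `iter_chunks` and the default window and overlap lengths are assumptions chosen for illustration.

```python
from typing import Iterator, List, Sequence


def iter_chunks(
    samples: Sequence[float],
    sample_rate: int = 16_000,
    chunk_seconds: float = 5.0,    # hypothetical window length
    overlap_seconds: float = 0.5,  # hypothetical overlap between windows
) -> Iterator[List[float]]:
    """Yield fixed-length windows of audio, each overlapping the previous one."""
    chunk = int(chunk_seconds * sample_rate)
    overlap = int(overlap_seconds * sample_rate)
    if chunk <= 0:
        raise ValueError("chunk length must be positive")
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    hop = chunk - overlap  # how far the window advances each step
    for start in range(0, len(samples), hop):
        yield list(samples[start:start + chunk])
        if start + chunk >= len(samples):
            break  # the final window already reached the end of the audio
```

Each yielded window could then be passed to whatever transcription call the deployment uses. The overlap gives adjacent windows shared context at the boundaries, at the cost of needing to merge duplicated words in the resulting text.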
