Provider: IBM Research
License: IBM Research License (with permissible open-weight usage)
Access: Open weights on Hugging Face
Architecture: Encoder-decoder transformer for speech recognition
Model Size: 8 billion parameters


🔍 Overview

Granite Speech 3.2 (8B) is part of IBM’s Granite model family and focuses on accurate, scalable speech-to-text transcription across diverse languages and accents. It’s designed for real-world applications like enterprise meeting transcription, call center analytics, and multilingual transcription pipelines.

Key strengths:

  • 🌍 Multilingual Coverage: Trained across multiple languages and accents
  • 📈 Robust Accuracy: Competitive with Whisper and other leading open ASR models
  • 🧩 Flexible Deployment: Supports real-time and batch transcription use cases

⚙️ Technical Specs

  • Architecture: Transformer-based encoder-decoder
  • Parameters: 8B
  • Audio Input: 16 kHz mono waveform
  • Output: Plain text transcription (per language)
  • Streaming Mode: Low-latency inference enabled
  • Training Data: Large-scale curated multilingual datasets
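
Since the specs call for a 16 kHz mono waveform, audio recorded at other rates or in stereo has to be converted first. The following is a minimal sketch of that preprocessing in plain Python, not the model's own pipeline. The helper names `to_mono` and `resample_linear` are hypothetical, and linear interpolation is used only for illustration: it applies no anti-aliasing filter, so a production setup would more likely rely on a dedicated resampler.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # Hz, the input rate listed in the specs above


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length channels into one mono channel (hypothetical helper)."""
    if not channels:
        return []
    n = len(channels[0])
    if any(len(c) != n for c in channels):
        raise ValueError("all channels must have the same length")
    return [sum(c[i] for c in channels) / len(channels) for i in range(n)]


def resample_linear(
    samples: Sequence[float], src_rate: int, dst_rate: int = TARGET_RATE
) -> List[float]:
    """Resample by linear interpolation (illustrative only, no anti-aliasing)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    stereo = [[0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]]  # toy 32 kHz stereo clip
    mono = to_mono(stereo)
    print(resample_linear(mono, src_rate=32_000))
```

The resulting list of floats is the kind of waveform that would then be handed to whatever feature extractor or processor accompanies the model.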

🚀 Deployment

  • Model Card: Granite Speech 3.2 on Hugging Face
  • Compatible Tools: 🤗 Transformers + TorchAudio, ONNX export, enterprise ASR stacks
  • Use Cases: Live captioning, customer support analytics, multilingual accessibility
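
For use cases such as live captioning, a common pattern is to cut the incoming audio into short, slightly overlapping windows and transcribe each one as it arrives. The sketch below illustrates that windowing only. It is an assumption about how a caller might feed the model, not a description of Granite Speech's streaming mode; `iter_chunks` and its default durations are hypothetical.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE = 16_000  # Hz, matching the 16 kHz mono input listed in the specs


def iter_chunks(
    samples: Sequence[float],
    chunk_seconds: float = 5.0,
    overlap_seconds: float = 0.5,
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[List[float]]:
    """Yield fixed-length windows that overlap, so words cut at a boundary
    also appear whole in a neighbouring window (hypothetical helper)."""
    chunk = int(chunk_seconds * sample_rate)
    overlap = int(overlap_seconds * sample_rate)
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")
    hop = chunk - overlap  # samples advanced between window starts
    start = 0
    while start < len(samples):
        yield list(samples[start:start + chunk])
        if start + chunk >= len(samples):
            break  # the window just yielded reached the end of the audio
        start += hop


if __name__ == "__main__":
    audio = [0.0] * (SAMPLE_RATE * 12)  # 12 seconds of silence as a stand-in
    for i, window in enumerate(iter_chunks(audio)):
        print(i, len(window) / SAMPLE_RATE, "seconds")
```

Longer windows give the model more context per call, while shorter ones reduce caption delay. The overlapping regions produce duplicated text that the caller has to merge.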

🔗 Resources