Cohere released a 2B-parameter speech-to-text model that covers 14 languages and tops the Open ASR Leaderboard, and upstreamed encoder-decoder optimizations to vLLM in the same launch. It is a self-hostable ASR option, so test accuracy and throughput on your own speech workload.

Transcribe 03-2026 is a self-hostable ASR release aimed squarely at production transcription stacks. In the primary announcement, Cohere's model is described as conformer-based, 2B parameters, and covering 14 languages, while a Hugging Face maintainer highlighted that it is both "quite runnable" and available under Apache 2.0 with Transformers support on day one.
On quality, the public claim is straightforward: the launch thread says the model topped the Open ASR leaderboard. The model page is the canonical artifact for trying the weights, and TechCrunch's linked report adds two concrete numbers absent from the tweets: an average word error rate (WER) of 5.42 on the leaderboard and a processing speed of 525 minutes of audio per minute. That same report says Cohere's internal human evals showed a 61% win rate on accuracy, coherence, and usability, while also flagging relatively weaker performance in Portuguese, German, and Spanish.
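For teams that want to compute the same two kinds of numbers on their own audio, the arithmetic is simple. The following is a minimal sketch, not the leaderboard's scoring code: the function names are hypothetical, and the text normalization (lowercasing and splitting on whitespace) is an assumption that is far simpler than what a public benchmark is likely to apply.

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    # Assumption: lowercase + whitespace split is enough normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs) -> float:
    """WER in percent over (reference, hypothesis) pairs, pooled by word count."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


def seconds_to_transcribe(audio_seconds: float, rtfx: float) -> float:
    """Wall-clock seconds needed at a given speed-up over real time."""
    return audio_seconds / rtfx
```

Reading the reported speed as a real-time factor of 525, `seconds_to_transcribe(3600, 525)` puts one hour of audio at roughly seven seconds. Measured figures will depend on hardware, batch size, and audio length.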
The more consequential engineering detail may be the serving work that landed alongside the model. According to the vLLM post, Cohere contributed encoder-decoder optimizations for variable-length encoder batching and packed attention in the decoder, and vLLM claims "up to 2x throughput improvement" for speech workloads. vLLM also says those changes benefit all encoder-decoder models on the runtime, not just Cohere Transcribe.
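To see why variable-length batching and packing matter for speech, compare how many positions a batch occupies when every clip is padded to the longest one versus when the sequences are concatenated back to back. The sketch below illustrates the general idea only. It is not vLLM's implementation, the function names are hypothetical, and the frame counts are made-up example values.

```python
from itertools import accumulate


def padded_positions(lengths: list[int]) -> int:
    """Positions computed when every sequence is padded to the longest one."""
    return len(lengths) * max(lengths)


def packed_layout(lengths: list[int]) -> tuple[list[int], int]:
    """Concatenate sequences and record where each one starts and ends.

    Returns (cumulative offsets, total positions). Sequence k occupies
    positions offsets[k] up to (but not including) offsets[k + 1].
    """
    offsets = [0, *accumulate(lengths)]
    return offsets, offsets[-1]


def padding_waste(lengths: list[int]) -> float:
    """Fraction of padded positions that carry no real data."""
    return 1.0 - sum(lengths) / padded_positions(lengths)


# Hypothetical encoder frame counts for four clips of different durations.
frame_counts = [120, 45, 300, 80]
offsets, total = packed_layout(frame_counts)
```

With these example lengths, padding occupies 1,200 positions while the packed layout occupies 545, so more than half of the padded batch would be filler. The more clip durations vary within a batch, the larger that gap becomes, which is consistent with the throughput gain being described as an upper bound.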
The rollout path is unusually short for a new speech model. The same announcement says support is available day-0 in vLLM, and the attached install snippet [img:1|vLLM install snippet] shows audio extras plus a vllm serve command targeting CohereLabs/cohere-transcribe-03-2026 with remote code enabled. That means teams already serving through vLLM can test both model quality and the new batching path without waiting for a separate backend integration.
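Once a server is running, a client call could look like the sketch below. It assumes the served model is reachable through vLLM's OpenAI-compatible audio transcription route, which the announcement does not confirm for this model, and it assumes a server on `localhost:8000`. The helper names and the `sample.wav` path are hypothetical.

```python
from pathlib import Path

# Model ID taken from the serve command described in the announcement.
MODEL_ID = "CohereLabs/cohere-transcribe-03-2026"


def make_local_client(base_url: str = "http://localhost:8000/v1"):
    """Build an OpenAI-style client pointed at a local server (assumed URL)."""
    from openai import OpenAI  # third-party package, imported lazily

    # A local server typically does not check the key, but the client needs one.
    return OpenAI(base_url=base_url, api_key="EMPTY")


def transcribe_file(client, audio_path: str, model: str = MODEL_ID) -> str:
    """Send one audio file to the transcription route and return the text."""
    with Path(audio_path).open("rb") as audio:
        result = client.audio.transcriptions.create(model=model, file=audio)
    return result.text


# Example usage against a running server (assumption: route is enabled):
#   client = make_local_client()
#   print(transcribe_file(client, "sample.wav"))
```

Keeping the client as a parameter makes it easy to swap in a stub for offline tests, or to point the same evaluation script at another backend for a side-by-side comparison.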
Cohere just topped Open ASR Leaderboard with a 2B model 👑 > conformer based model > covers 14 languages > comes with @huggingface transformers support day-0!
Introducing: Cohere Transcribe – a new state-of-the-art in open source speech recognition.
This is a very solid release! Apache 2.0 as well, 2B parameters (i.e. quite runnable), 14 languages, and supported using Transformers already. Great work @cohere 👏
🎉 Congrats to @Cohere on releasing Cohere Transcribe, a 2B speech recognition model (Apache 2.0, 14 languages). Day-0 support in vLLM. Cohere contributed encoder-decoder serving optimizations to vLLM: variable-length encoder batching and packed attention for the decoder. Up to …