Cohere launches Transcribe 03-2026 with 14 languages and Apache 2.0 weights
Cohere released a 2B speech-to-text model with 14 languages and top Open ASR leaderboard scores, and upstreamed encoder-decoder optimizations to vLLM as part of the same launch. It is a self-hostable ASR option, so test accuracy and throughput on your own speech workload before committing.

TL;DR
- Cohere released Transcribe 03-2026, a 2B automatic speech recognition model with Apache 2.0 weights, 14-language coverage, and day-0 Hugging Face Transformers support, according to the launch thread and HF practitioner notes.
- Early coverage says the model now sits at the top of the Open ASR leaderboard; the linked TechCrunch summary cites a 5.42 average word error rate and says the model beat several open and commercial peers in Cohere's human evals.
- The launch also shipped immediate vLLM serving support: the vLLM announcement says Cohere upstreamed encoder-decoder optimizations including variable-length encoder batching and packed decoder attention.
- Those vLLM changes are broader than one model. The same post claims up to 2x throughput on speech workloads, with the gains carrying over to other encoder-decoder models served on vLLM.
What shipped
Transcribe 03-2026 is a self-hostable ASR release aimed squarely at production transcription stacks. In the primary announcement, Cohere's model is described as conformer-based, 2B parameters, and covering 14 languages, while a Hugging Face maintainer highlighted that it is both "quite runnable" and available under Apache 2.0 with Transformers support on day one.
On quality, the public claim is straightforward: the launch thread says the model topped the Open ASR leaderboard. The model page is the canonical artifact for trying the weights, and TechCrunch's linked report adds two concrete numbers absent from the tweets: an average WER of 5.42 on the leaderboard and processing speed of 525 minutes of audio per minute. That same report says Cohere's internal human evals showed a 61% win rate on accuracy, coherence, and usability, while also flagging relatively weaker performance in Portuguese, German, and Spanish.
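To put the cited speed figure in operational terms, here is a back-of-envelope calculation based on TechCrunch's number of 525 minutes of audio processed per wall-clock minute. The report does not state the hardware behind that figure, so treat it as an upper bound for capacity planning, not a guarantee:

```python
# Back-of-envelope using the TechCrunch-cited rate of 525 audio-minutes
# per wall-clock minute. Hardware assumptions behind that figure are not
# published, so real-world throughput will vary.
AUDIO_MIN_PER_WALL_MIN = 525

def wall_clock_minutes(audio_hours: float) -> float:
    """Wall-clock minutes needed to transcribe `audio_hours` of audio."""
    return audio_hours * 60 / AUDIO_MIN_PER_WALL_MIN

# A 1,000-hour call archive at the cited rate:
print(round(wall_clock_minutes(1000), 1))  # → 114.3
```

In other words, at the reported rate a thousand-hour archive clears in under two hours of wall-clock time on a single instance, which is the kind of ratio that makes batch re-transcription of historical audio practical.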
What changed for deployment
The more consequential engineering detail may be the serving work that landed alongside the model. According to the vLLM post, Cohere contributed encoder-decoder optimizations for variable-length encoder batching and packed attention in the decoder, and vLLM is claiming "up to 2x throughput improvement" for speech workloads. vLLM also says those changes benefit all encoder-decoder models on the runtime, not just Cohere Transcribe.
The rollout path is unusually short for a new speech model. The same announcement says support is available day-0 in vLLM, and the attached install snippet [img:1|vLLM install snippet] shows audio extras plus a vllm serve command targeting CohereLabs/cohere-transcribe-03-2026 with remote code enabled. That means teams already serving through vLLM can test both model quality and the new batching path without waiting for a separate backend integration.
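For teams that want to try that path, a sketch of the serve flow follows. The model id and the remote-code flag come from the launch announcement; the `[audio]` extra and the transcription endpoint are standard vLLM conventions, so verify the exact flags against the official snippet and the vLLM docs before deploying:

```shell
# Deployment sketch, not the verbatim launch snippet — flags assumed
# from standard vLLM conventions; check the official install snippet.

# Install vLLM with its audio extras.
pip install "vllm[audio]"

# Serve the model; the launch snippet enables remote code.
vllm serve CohereLabs/cohere-transcribe-03-2026 \
  --trust-remote-code

# If the model is exposed via vLLM's OpenAI-compatible transcription
# endpoint, a request would look roughly like this:
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@meeting.wav \
  -F model=CohereLabs/cohere-transcribe-03-2026
```

Because this reuses vLLM's existing OpenAI-compatible surface, swapping the model into an existing Whisper-style pipeline is mostly a matter of changing the model id and re-benchmarking.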