Cohere Transcribe has arrived as an open-weights 2B-parameter speech model trained on 14 languages, scoring 4.7% on Artificial Analysis' AA-WER benchmark. It pairs near-frontier accuracy with about one second of compute per minute of audio.

Cohere Transcribe is a 2B conformer encoder-decoder ASR model trained from scratch across 14 languages, including English, French, Mandarin, Japanese, and Arabic, according to Artificial Analysis' launch thread. The same post says Cohere is releasing it as open weights under Apache 2.0 and making it available through Cohere's API at no cost for now, subject to rate limits.
The headline number is 4.7% AA-WER on the Artificial Analysis speech-to-text benchmark, where lower is better. In Artificial Analysis' accuracy post, that places the model just behind NVIDIA Canary Qwen 2.5B at 4.4% and OpenAI Whisper Large v3 at 4.2%, a gap small enough to matter mainly if you are already optimizing for a particular deployment stack or licensing model. The benchmark itself is a weighted score across three datasets, including Artificial Analysis' proprietary AA-AgentTalk set, and the leaderboard methodology emphasizes mixed real-world conditions rather than a single clean test set.
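For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. A minimal sketch of that calculation follows; it is not Artificial Analysis' implementation, which additionally normalizes text and weights across datasets:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over word sequences via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # one substitution in six words
```

A 4.7% score therefore means roughly one word-level error per twenty or so reference words, averaged over the benchmark's weighting.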
Artificial Analysis' speed post says Cohere Transcribe processes audio at about 60x realtime, or roughly one second of compute for one minute of speech. That combination of near-frontier accuracy and high throughput is the practical story here: it makes the model relevant not just for offline batch transcription, but for latency-sensitive pipelines where teams want open weights without dropping to a much weaker quality tier.
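The 60x figure is easy to translate into capacity planning. Taking the speedup from Artificial Analysis' post, the rest is arithmetic:

```python
RTF_SPEEDUP = 60  # 60x realtime, per Artificial Analysis' speed post

def compute_seconds(audio_seconds: float, speedup: float = RTF_SPEEDUP) -> float:
    """Compute time needed to transcribe a given duration of audio."""
    return audio_seconds / speedup

print(compute_seconds(60))            # one minute of audio -> ~1 second of compute
print(compute_seconds(90 * 60) / 60)  # a 90-minute recording -> ~1.5 minutes
```

At that rate a single serving instance clears about 60 hours of audio per hour of compute, which is what makes the latency-sensitive use cases plausible.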
The packaging story moved quickly too. A Hugging Face community conversion highlighted in the quantization post exported the model to ONNX and quantized it down to about 2 GB to get "more closer to edge." The linked model card says the INT8 build targets CPU, Apple Silicon, and GPU runtimes, avoids requiring PyTorch at inference time, and can run about 25% faster than FP32 on CPU. That is third-party work rather than a Cohere release, but it is an early signal that Transcribe may fit smaller-footprint serving setups faster than many new ASR launches do.
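The ~2 GB figure also lines up with simple parameter-count arithmetic: at one byte per parameter, an INT8 build of a 2B-parameter model weighs in around 2 GB, versus roughly 8 GB for FP32 weights. A rough estimate, ignoring activations and runtime overhead:

```python
PARAMS = 2e9  # ~2B parameters (approximate)

def weight_size_gb(params: float, bytes_per_param: int) -> float:
    """Rough on-disk/in-memory weight size in decimal gigabytes."""
    return params * bytes_per_param / 1e9

print(weight_size_gb(PARAMS, 4))  # FP32: ~8 GB
print(weight_size_gb(PARAMS, 1))  # INT8: ~2 GB
```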