Cohere Transcribe has arrived as an open-weights 2B-parameter speech model trained on 14 languages, scoring 4.7% on Artificial Analysis' AA-WER benchmark. It pairs near-frontier accuracy with about one second of compute per minute of audio.

Cohere Transcribe is a 2B conformer encoder-decoder ASR model trained from scratch across 14 languages, including English, French, Mandarin, Japanese, and Arabic, according to Artificial Analysis' launch thread. The same post says Cohere is releasing it as open weights under Apache 2.0 and making it available through Cohere's API at no cost for now, subject to rate limits.
The headline number is 4.7% AA-WER on the Artificial Analysis speech-to-text benchmark, where lower is better. In Artificial Analysis' accuracy post, that places the model just behind NVIDIA Canary Qwen 2.5B at 4.4% and OpenAI Whisper Large v3 at 4.2%, a gap small enough to matter mainly if you are already optimizing for a particular deployment stack or licensing model. The benchmark itself is a weighted score across three datasets, including Artificial Analysis' proprietary AA-AgentTalk set, and the leaderboard methodology emphasizes mixed real-world conditions rather than a single clean test set.
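For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. A minimal sketch of that calculation follows; it is not Artificial Analysis' implementation, which additionally normalizes text and weights across datasets:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over word sequences via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # one substitution in six words
```

A 4.7% score therefore means roughly one word-level error per twenty or so reference words, averaged over the benchmark's weighting.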
Artificial Analysis' speed post says Cohere Transcribe processes audio at about 60x realtime, or roughly one second of compute for one minute of speech. That combination of near-frontier accuracy and high throughput is the practical story here: it makes the model relevant not just for offline batch transcription, but for latency-sensitive pipelines where teams want open weights without dropping to a much weaker quality tier.
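The 60x figure is easy to translate into capacity planning. Taking the speedup from Artificial Analysis' post, the rest is arithmetic:

```python
RTF_SPEEDUP = 60  # 60x realtime, per Artificial Analysis' speed post

def compute_seconds(audio_seconds: float, speedup: float = RTF_SPEEDUP) -> float:
    """Compute time needed to transcribe a given duration of audio."""
    return audio_seconds / speedup

print(compute_seconds(60))            # one minute of audio -> ~1 second of compute
print(compute_seconds(90 * 60) / 60)  # a 90-minute recording -> ~1.5 minutes
```

At that rate a single serving instance clears about 60 hours of audio per hour of compute, which is what makes the latency-sensitive use cases plausible.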
The packaging story moved quickly too. A Hugging Face community conversion highlighted in the quantization post exported the model to ONNX and quantized it down to about 2 GB to get "more closer to edge." The linked model card says the INT8 build targets CPU, Apple Silicon, and GPU runtimes, avoids requiring PyTorch at inference time, and can run about 25% faster than FP32 on CPU. That is third-party work rather than a Cohere release, but it is an early signal that Transcribe may fit smaller-footprint serving setups faster than many new ASR launches do.
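The ~2 GB figure also lines up with simple parameter-count arithmetic: at one byte per parameter, an INT8 build of a 2B-parameter model weighs in around 2 GB, versus roughly 8 GB for FP32 weights. A rough estimate, ignoring activations and runtime overhead:

```python
PARAMS = 2e9  # ~2B parameters (approximate)

def weight_size_gb(params: float, bytes_per_param: int) -> float:
    """Rough on-disk/in-memory weight size in decimal gigabytes."""
    return params * bytes_per_param / 1e9

print(weight_size_gb(PARAMS, 4))  # FP32: ~8 GB
print(weight_size_gb(PARAMS, 1))  # INT8: ~2 GB
```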