Release, March 28, 2026

Cohere opens Transcribe 2B weights with a browser demo

Cohere's Transcribe 2B surfaced through browser demo posts and a Hugging Face release, during a wider week of open-audio releases that also featured Mistral's Voxtral 4B TTS. The model gives creators a multilingual ASR option that can run closer to local or browser workflows.

TL;DR

Cohere surfaced Transcribe 2B as an open-source transcription model, and Nick Frosst's browser demo framed the key creative hook clearly: it runs in the browser.
The accompanying Hugging Face release describes a 2B-parameter multilingual ASR model with weights available under Apache 2.0 and support for 14 languages.
In Jeff Boudier's open audio demo, Transcribe 2B landed as part of a broader week of open audio releases, alongside Mistral's Voxtral 4B TTS.

What shipped for creators

The immediate change is not just that Cohere released a transcription model, but that the first demo pointed creators toward a browser-adjacent workflow. Frosst's demo post describes it as a state-of-the-art open model "running in the browser," which puts transcription closer to local editing, rough-cut logging, and quick interview cleanup than to a purely hosted API pipeline.

The model card on Hugging Face fills in the concrete spec: Transcribe is a 2B-parameter ASR model trained for 14 languages, including English, Spanish, Mandarin, Japanese, Korean, Arabic, and several other European languages. According to the release page, the weights carry a permissive Apache 2.0 license. For creative teams handling multilingual dailies, voice notes, or documentary footage, that makes this less about a flashy demo and more about having portable weights that can be adapted to existing media workflows.

Boudier's open audio demo also places the release in a bigger pattern: open audio tooling is broadening at both ends, with Transcribe covering speech-to-text while Voxtral 4B targets TTS. That combination matters for creators building captioning, localization, and narration pipelines from open components rather than a single closed stack.
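For a sense of what the captioning end of such a pipeline involves, here is a minimal Python sketch that turns timed transcript segments into SRT caption text. It is an illustration under stated assumptions, not Transcribe's API: the `Segment` shape and the `srt_timestamp` and `to_srt` helpers are hypothetical, and the release material summarized here does not say whether the model emits timestamps in this form.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; not the model's actual output format."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Welcome back to the edit bay."),
        Segment(2.5, 5.0, "Let's log the first interview."),
    ]
    print(to_srt(demo))
```

Whatever ASR model sits upstream, a formatting step like this stays the same as long as the segments carry start and end times, which is part of the appeal of assembling a pipeline from open components.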
