Browser demo posts and a Hugging Face release surfaced Cohere Transcribe 2B as part of a wider open-audio week that also featured Voxtral 4B TTS. The model gives creators a multilingual ASR option that can live closer to local or browser workflows.

The immediate change is not just that Cohere released a transcription model, but that the first demo pointed creators toward a browser-adjacent workflow. Frosst's browser demo describes it as a state-of-the-art open model "running in the browser," which puts transcription closer to local editing, rough-cut logging, and quick interview cleanup instead of a purely hosted API pipeline.
The model card on Hugging Face fills in the concrete spec: Transcribe is a 2B-parameter ASR model trained for 14 languages, including English, Spanish, Mandarin, Japanese, Korean, Arabic, and several European languages, with permissive Apache 2.0 licensing according to the release page. For creative teams handling multilingual dailies, voice notes, or documentary footage, that makes this less about a flashy demo and more about having portable weights that can be adapted to existing media workflows.
Boudier's open audio demo also places the release in a bigger pattern: open audio tooling is broadening at both ends, with Transcribe covering speech-to-text while Voxtral 4B targets TTS. That combination matters for creators building captioning, localization, and narration pipelines from open components rather than a single closed stack.
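To make the captioning case concrete, here is a minimal sketch of the step that usually follows transcription: turning timed transcript segments into an SRT subtitle file. The `Segment` shape and the `srt_timestamp` and `to_srt` helpers are hypothetical names introduced for illustration. They are not Transcribe's output schema, and the release does not say whether the model emits timestamps in this form.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed chunk of transcript text.

    Hypothetical shape for illustration; not the model's actual output schema.
    """
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT: a 1-based index, a time range, then the text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text.strip()}")
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Hello"), Segment(2.5, 4.0, "World")]
    print(to_srt(demo))
```

Keeping the formatter separate from the transcription call means the same function can sit behind any ASR backend, hosted or local, as long as that backend's output is mapped into segments first.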
@cohere transcribe Sota open source transcription model running in the browser :) Weights on @huggingface link below
What a week for open audio models! I demo: Voxtral 4B TTS from @MistralAI, Transcribe 2B from @cohere, and how to run a batch transcribe job in 1 line of CLI using @vanstriendaniel uv script links below