The most complete AI hub: fresh stories, workflows, prompts, and deals. Updated daily.
Mistral released open-weight Voxtral TTS with low-latency streaming, voice cloning, and cross-lingual adaptation, and vLLM Omni shipped day-0 support. Voice-agent teams should compare quality, latency, and serving cost against closed APIs.

xAI opened a Grok TTS API with five voices, inline controls for laughter and whispering, and multilingual streaming; integrations quickly landed in LiveKit and fal. Try it for voice products that need real-time playback, telephony formats, and hosted integration paths out of the box.


Voxtral TTS uses separate semantic and acoustic token models, a 2.14 kbps codec, and 3 to 25 seconds of reference audio for voice cloning across nine languages. Try it if you want a hybrid speech pipeline with more control and faster acoustic synthesis than fully autoregressive generation.
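
To get a feel for how small a 2.14 kbps token stream is, here is a rough back-of-the-envelope sketch. It assumes "kbps" means 1,000 bits per second and simply multiplies bitrate by duration; the function name `codec_bytes` is made up for illustration and is not part of any Voxtral or Mistral API.

```python
# Rough size of an audio token stream at the codec bitrate quoted above.
# Assumption: 1 kbps = 1000 bits per second (not 1024).
CODEC_KBPS = 2.14


def codec_bytes(seconds: float, kbps: float = CODEC_KBPS) -> float:
    """Approximate bytes needed to represent `seconds` of audio at `kbps`."""
    bits = kbps * 1000 * seconds
    return bits / 8


if __name__ == "__main__":
    # 3 s and 25 s are the reference-clip bounds mentioned in the blurb.
    for seconds in (3, 25):
        print(f"{seconds:>2} s of audio is about {codec_bytes(seconds):.1f} bytes")
```

At that rate, even a 25-second clip comes to only a few kilobytes of codec data, which is part of why low-bitrate token codecs suit streaming.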

Cohere released a 2B speech-to-text model with 14 languages and top Open ASR scores, and upstreamed encoder-decoder optimizations to vLLM in the same launch. It is a self-hosted ASR option, so test accuracy and throughput on your own speech workload.

Discussions
Notable voices
“LocalLLaMA thread blowing up: someone ran DeepSeek-V3 on a Mac Studio with 192GB RAM. 40 tok/s at Q4.”
“r/MachineLearning consensus: The new Llama 4 Scout model is surprisingly good at code. Rivals GPT-4o on HumanEval.”