xAI launched a Grok text-to-speech (TTS) API with five voices, inline controls for laughter and whispering, and multilingual streaming, and integrations quickly landed in LiveKit and fal. It is worth evaluating for voice products that need real-time playback, telephony formats, and hosted integration paths out of the box.

The initial release is straightforward: five named voices, multilingual synthesis, and inline expressive controls rather than a separate prosody system. In the xAI demo, the API is presented as "Grok Text-to-Speech API" with Eve, Ara, Leo, Rex, and Sal. A more detailed supporting thread says a single POST call can trigger "laughs, whispers, and sighs on command," with tags for pauses and emphasis as well.
That thread also covers the implementation details engineers usually look for first: automatic language detection across 20+ languages and output sample rates from telephony-grade 8 kHz up to 48 kHz. The same post claims the voice stack was built in-house, including the VAD, tokenizer, and audio models, which matters mostly as a signal that xAI is shipping a full speech stack rather than a thin wrapper.
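To put the 8 kHz to 48 kHz range in concrete terms, the sketch below computes raw audio bandwidth at a few sample rates. It assumes uncompressed mono linear PCM at 16 bits per sample; the source does not say which encodings the Grok TTS API actually returns, so treat these numbers as an illustration of scale, not a spec. The 8-bit case at 8 kHz matches the 64 kbit/s rate of G.711 telephony audio.

```python
# Raw-audio bandwidth for the sample-rate range quoted above (8 kHz to 48 kHz).
# Assumption: uncompressed mono linear PCM; the Grok TTS output encodings are
# not specified in the source.


def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 1) -> int:
    """Bytes per second of uncompressed PCM audio."""
    return sample_rate_hz * channels * bits_per_sample // 8


def pcm_bytes_for_duration(seconds: float, sample_rate_hz: int,
                           bits_per_sample: int = 16, channels: int = 1) -> int:
    """Total bytes of uncompressed PCM audio for a clip of the given length."""
    return int(seconds * pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels))


if __name__ == "__main__":
    # Compare telephony-grade 8 kHz against common higher rates up to 48 kHz.
    for rate in (8_000, 16_000, 24_000, 48_000):
        bps = pcm_bytes_per_second(rate)
        print(f"{rate:>6} Hz: {bps:>6} B/s, {bps * 60 / 1_000_000:.2f} MB per minute")
```

At 16-bit mono, a minute of 48 kHz audio is about 5.76 MB versus 0.96 MB at 8 kHz, which is one reason telephony pipelines stay at the low end of the range.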
The fastest ecosystem pickup came from LiveKit. In its integration announcement, LiveKit says Grok TTS is available inside LiveKit Inference with "natural, expressive voices," low-latency streaming, and 20+ languages. The linked plugin guide shows two paths: through LiveKit Inference, or directly against xAI via the livekit-agents[xai] plugin and an API key.
fal shipped a hosted endpoint at nearly the same time. According to fal's post, the service includes real-time WebSocket streaming, the same five-voice setup with inline emotion tags, and published pricing at $0.0042 per 1K characters. That gives teams at least two off-the-shelf integration routes on day one: agent-oriented voice sessions through LiveKit and direct hosted inference through fal.
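fal's published rate of $0.0042 per 1K characters makes budgeting a simple linear calculation. The sketch below assumes billing is strictly proportional to character count with no rounding up to whole 1K blocks, and it does not know whether inline emotion tags count toward billed characters; both points are assumptions to check against fal's own pricing page.

```python
# Cost estimate from fal's quoted Grok TTS price: $0.0042 per 1,000 characters.
# Assumptions: linear per-character billing, no minimum charge, no rounding to
# whole 1K blocks; whether inline tags are billed is not stated in the source.
from decimal import Decimal

PRICE_PER_1K_CHARS_USD = Decimal("0.0042")


def estimate_cost_usd(num_chars: int, price_per_1k: Decimal = PRICE_PER_1K_CHARS_USD) -> Decimal:
    """Estimated USD cost to synthesize num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    # Decimal avoids float rounding noise on sub-cent amounts.
    return price_per_1k * Decimal(num_chars) / Decimal(1000)


if __name__ == "__main__":
    for chars in (1_500, 100_000, 1_000_000):
        print(f"{chars:>9} chars -> ${estimate_cost_usd(chars)}")
```

Under these assumptions a 1,500-character reply costs about $0.0063, and one million characters costs $4.20.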
xAI released Grok's Text-to-Speech API with Eve, Ara, Leo, Rex, and Sal voices available. Loads of demos to play with on api/voice 👀
Grok's Text to Speech API is now available. Start building with natural voices and expressive controls to bring your apps to life. x.ai/api/voice#text…
Grok's Text to Speech API is now available in LiveKit Inference. Natural, expressive voices with low-latency streaming. Multilingual in 20+ languages. Telephony and production-ready out of the box. One API key. No extra setup. → docs.livekit.io/agents/models/…
🚨 xAI Grok Text-to-Speech is now live on fal! 🗣️ 5 expressive voices with inline emotion tags 🌍 20+ languages with automatic language detection 💰 Insanely affordable at $0.0042 per 1K characters ⚡ Real-time WebSocket streaming for low-latency playback