Grok launches Text-to-Speech API with expressive controls and LiveKit support
xAI released Grok's Text-to-Speech API with natural voices, expressive controls, and LiveKit support; creators are also using Grok Imagine in reference-image and cartoon animation workflows. Try it if you want Grok in a broader voice-and-motion stack instead of chat alone.

TL;DR
- xAI has launched Grok’s TTS API, adding natural voices and expressive controls for text-to-speech output.
- The companion LiveKit post says Grok TTS is already available inside LiveKit Inference with low-latency streaming.
- Beyond voice, creators are using reference-image animation and cartoon animation demos to turn still images and stylized characters into short motion clips.
- One emerging stack pairs a Niji-to-3D workflow with Grok video generation, using Midjourney or Nano Banana images as the visual starting point.

What shipped
Grok’s new TTS release is aimed at builders who want voice as part of a broader creative product, not just chat output. xAI’s voice API page describes five voices, expressive speech controls, and multiple audio formats, alongside speech-to-text and real-time voice-agent tooling.
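To make the shape of such a call concrete, here is a minimal sketch of assembling and sending a TTS request. This is a hypothetical illustration: the endpoint URL, model id, voice name, and parameter names below are assumptions patterned on common speech APIs, not xAI's documented interface — check the voice API page for the real values.

```python
# Hypothetical sketch: endpoint path, model id, voice names, and field
# names are assumptions, not xAI's documented TTS API.
import json
import urllib.request

XAI_TTS_URL = "https://api.x.ai/v1/audio/speech"  # assumed endpoint


def build_tts_request(text: str, voice: str, audio_format: str = "mp3") -> dict:
    """Assemble a JSON payload for a hypothetical Grok TTS call."""
    return {
        "model": "grok-tts",          # assumed model id
        "input": text,                # the text to speak
        "voice": voice,               # one of the five voices (names assumed)
        "response_format": audio_format,  # one of the multiple audio formats
    }


def synthesize(api_key: str, payload: dict) -> bytes:
    """POST the payload and return raw audio bytes (network call, not exercised here)."""
    req = urllib.request.Request(
        XAI_TTS_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


payload = build_tts_request("Hello from Grok.", voice="example-voice")
```

The point of the sketch is the separation: build the payload once, then reuse it whether you call the HTTP API directly or route through a streaming layer such as LiveKit.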
The immediate practical detail is distribution: the LiveKit support post says Grok TTS is already wired into LiveKit Inference with low-latency streaming. That lowers the integration burden for teams already prototyping voice characters, narrated experiences, or interactive agents inside LiveKit-based pipelines.

What creators are making with Grok Imagine
On the image-to-motion side, creators are treating Grok Imagine less as a one-shot generator and more as the animation layer in a mixed-tool workflow. In one example, a reference-image animation shows a cute creature clip built from reference images, preserving character feel across a short animated beat.
Another creator packaged a three-step recipe: generate a 2D image with a Niji 6 style reference, convert it into a 3D look with a transformation prompt, then hand the result to Grok for video in a Niji-to-3D workflow. Others are doing the same kind of handoff from outside image models: one cartoon animation demo animates Midjourney-style cartoon art in Grok, while a thinner but clear Nano Banana remix uses Nano Banana stills as source imagery before the Grok motion pass. The pattern is consistent: Grok is showing up as the motion pass in a creator stack assembled from several image tools.
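The three-step recipe can be written down as a simple pipeline description. The tool names and task text below are illustrative paraphrases of the workflow described above, not the creator's exact prompts:

```python
# Illustrative only: step wording is invented to mirror the three-step
# recipe (2D Niji image -> 3D restyle -> Grok motion), not exact prompts.

def niji_to_3d_to_video(subject: str) -> list:
    """Lay out the three handoff steps of the Niji-to-3D workflow."""
    return [
        {"tool": "Midjourney (Niji 6)",
         "task": f"generate a 2D image of {subject} with a Niji 6 style reference"},
        {"tool": "image model",
         "task": "re-render the 2D image with a 3D transformation prompt"},
        {"tool": "Grok Imagine",
         "task": "animate the 3D-styled still into a short video clip"},
    ]


steps = niji_to_3d_to_video("a small forest creature")
```

The design point is that each stage consumes the previous stage's output as a reference image, so the only Grok-specific work happens at the final motion pass.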