Release: June 29, 2026

Vercel adds useRealtime, generateSpeech, and transcribe to AI Gateway

Vercel shipped realtime speech and transcription support in AI Gateway and AI SDK 7, then added Grok voice models through the same interface. The update puts voice agents on the same AI Gateway and AI SDK stack Vercel already uses for text models, carried over its newly shipped platform WebSocket support.



TL;DR

Vercel shipped realtime voice primitives into the same stack as its text gateway: vercel's launch post says AI Gateway now supports useRealtime, generateSpeech, and transcribe in AI SDK 7.
A first-party starter path already exists: vercel's build-your-first-voice-agent post links straight to a voice agent guide.
According to cramforce's note on platform timing, last week's WebSocket support on Vercel directly enabled this week's realtime model rollout in AI Gateway.
Vercel went beyond generic realtime support: a few hours later, vercel_dev's Grok model post added xAI voice, TTS, and STT model slugs through the same interface.
The gateway had already been widening its model roster before the voice ship, with Sakana AI Labs' Fugu-Ultra post showing Fugu-Ultra on AI Gateway earlier in the month.

You can build your first voice agent through Vercel's linked guide, watch the demo in vercel's launch post, and see the Grok voice model IDs land on the same gateway surface. rauchg's repost compressed the pitch to four words, while cramforce's follow-up exposed the more interesting implementation detail: this shipped immediately after platform WebSocket support.

Voice agents

The notable part is not just that Vercel added speech features. According to vercel's launch post, it exposed three separate primitives (realtime session handling, text-to-speech, and speech-to-text) under AI Gateway and AI SDK 7 in one move.

That gives voice agents a cleaner shape than a single monolithic API:

useRealtime for live sessions, per vercel's launch post
generateSpeech for TTS, per vercel's launch post
transcribe for STT, per vercel's launch post
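One way to see why the three-way split matters is to write down what each primitive consumes and produces. The sketch below is a hypothetical TypeScript model, not the AI SDK 7 API: the VoicePrimitive and PrimitiveShape types, the PRIMITIVE_SHAPES table, and the needsPersistentConnection helper are illustrative names, and the input, output, and transport descriptions are assumptions based on the usual meaning of TTS, STT, and a live session.

```typescript
// Hypothetical model of the three voice primitives named in Vercel's launch post.
// These types are illustrative only; they are NOT the AI SDK 7 signatures.
type VoicePrimitive = "useRealtime" | "generateSpeech" | "transcribe";

interface PrimitiveShape {
  input: string;  // what the caller sends
  output: string; // what the caller gets back
  transport: "request/response" | "persistent session";
}

// ASSUMED shapes: TTS and STT as one-shot calls, realtime as a long-lived session.
const PRIMITIVE_SHAPES: Record<VoicePrimitive, PrimitiveShape> = {
  useRealtime: {
    input: "streamed microphone audio",
    output: "streamed model audio and events",
    transport: "persistent session",
  },
  generateSpeech: {
    input: "text",
    output: "audio",
    transport: "request/response",
  },
  transcribe: {
    input: "audio",
    output: "text",
    transport: "request/response",
  },
};

// True when a primitive needs a long-lived connection (such as a WebSocket).
function needsPersistentConnection(p: VoicePrimitive): boolean {
  return PRIMITIVE_SHAPES[p].transport === "persistent session";
}

// Print a one-line summary per primitive.
for (const p of Object.keys(PRIMITIVE_SHAPES) as VoicePrimitive[]) {
  const s = PRIMITIVE_SHAPES[p];
  console.log(`${p}: ${s.input} -> ${s.output} (${s.transport})`);
}
```

Under this reading, only the realtime primitive depends on a persistent connection, which is why the WebSocket timing matters.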

WebSockets

The most concrete architecture clue came from cramforce's platform note, which said Vercel shipped platform WebSocket support last week and the AI Gateway team used it this week to ship realtime AI models.
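Why would realtime models wait on WebSockets? A live voice session streams small audio frames in both directions for as long as the conversation lasts, which suits a single persistent, full-duplex connection. The TypeScript sketch below shows only the framing arithmetic for such a stream. The audio format (16 kHz, 16-bit mono PCM) and the 20 ms frame size are assumptions chosen for illustration; neither Vercel post specifies a wire format, and frameAudio is a hypothetical helper.

```typescript
// Hypothetical framing helper for a realtime audio stream.
// ASSUMPTIONS (not from Vercel's posts): 16 kHz sample rate, 16-bit mono PCM,
// 20 ms frames. Each frame would be sent as one WebSocket message.
const SAMPLE_RATE_HZ = 16_000;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const FRAME_MS = 20;

// 16_000 samples/s * 0.020 s = 320 samples; 320 * 2 bytes = 640 bytes per frame.
const FRAME_BYTES = ((SAMPLE_RATE_HZ * FRAME_MS) / 1000) * BYTES_PER_SAMPLE;

// Split a PCM buffer into fixed-size frames; the last frame may be shorter.
function frameAudio(pcm: Uint8Array, frameBytes: number = FRAME_BYTES): Uint8Array[] {
  const frames: Uint8Array[] = [];
  for (let offset = 0; offset < pcm.length; offset += frameBytes) {
    // subarray clamps the end index, so the final slice never overruns the buffer
    frames.push(pcm.subarray(offset, offset + frameBytes));
  }
  return frames;
}

// One second of silence at the assumed format is 32_000 bytes.
const oneSecond = new Uint8Array(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE);
const frames = frameAudio(oneSecond);
console.log(FRAME_BYTES, frames.length); // 640 50
```

At these assumed settings that is 50 small messages per second from the client alone, plus the model's audio coming back; a persistent socket carries both directions on one connection instead of many separate requests.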

Two short follow-up replies in the same thread, cramforce's "Indeed" and "See", suggest the dependency chain was intentional rather than coincidental.

Grok voice models

The follow-on ship mattered because it showed this was a gateway surface, not a one-model demo. vercel_dev's Grok voice model post listed three xAI slugs that map onto the same three functions:

xai/grok-voice-think-fast-1.0 for useRealtime
xai/grok-tts for generateSpeech
xai/grok-stt for transcribe

That is a more useful signal than a generic "voice support" announcement, because the provider-prefixed slugs (xai/...) show that provider routing is already part of the product surface.
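The routing signal lives in the slug format itself: a provider prefix, a slash, then the model name. The sketch below parses the three xAI slugs from vercel_dev's post and pairs each with the primitive it serves. parseModelSlug and the GROK_VOICE_MODELS table are hypothetical names for illustration; AI Gateway resolves slugs on its own side, and this code does not call it.

```typescript
// Hypothetical helper: split an AI Gateway-style "provider/model" slug.
// This mirrors the naming pattern only; it does not call AI Gateway.
interface ModelSlug {
  provider: string;
  model: string;
}

function parseModelSlug(slug: string): ModelSlug {
  const slash = slug.indexOf("/");
  // Reject slugs with no provider prefix or no model name after the slash.
  if (slash <= 0 || slash === slug.length - 1) {
    throw new Error(`Expected "provider/model", got "${slug}"`);
  }
  return { provider: slug.slice(0, slash), model: slug.slice(slash + 1) };
}

// Slugs as listed in vercel_dev's Grok voice model post, keyed by primitive.
const GROK_VOICE_MODELS = {
  useRealtime: "xai/grok-voice-think-fast-1.0",
  generateSpeech: "xai/grok-tts",
  transcribe: "xai/grok-stt",
} as const;

for (const [primitive, slug] of Object.entries(GROK_VOICE_MODELS)) {
  const { provider, model } = parseModelSlug(slug);
  console.log(`${primitive}: provider=${provider}, model=${model}`);
}
```

Under this scheme, switching providers for one primitive would change only the slug string an app passes, not the function it calls, which is the property the post's naming suggests.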

Fugu-Ultra

Voice arrived on top of an already expanding gateway catalog. Two earlier Sakana AI posts (the original Fugu-Ultra announcement and a later link back to it) show Fugu-Ultra joining AI Gateway before the realtime rollout.

That earlier addition adds no voice detail, but it does round out the story: AI Gateway's June updates were not a single feature drop but a steady expansion of both transport capabilities and model inventory.

Discussion across the web