Gemini 3.1 Flash Live launches with 90.8% audio tool-use score and 128K context
Google launched Gemini 3.1 Flash Live in AI Studio, the API, and Gemini Live with stronger audio tool use, lower latency, and 128K context. Voice-agent teams should benchmark quality, latency, and thinking settings before switching.

TL;DR
- Google launched Gemini 3.1 Flash Live across the API, AI Studio, and Gemini Live, positioning it as a realtime voice-and-vision model for agents with audio streaming, video streaming, transcription, and 128K context.
- The headline quality gain is tool use: Google's benchmark chart puts the model at 90.8% on ComplexFuncBench Audio, up from 71.5% for the newer (12-2025) Gemini 2.5 Flash Native Audio in the same comparison.
- Google and external benchmarks both frame this as a quality/latency tradeoff release: Artificial Analysis measured 95.9% on Big Bench Audio at high thinking with 2.98s time-to-first-audio (TTFA), versus 70.5% at minimal thinking with 0.96s TTFA.
- For implementation, Google says the model ships with Live API support and an "Agent Skill" for live voice agents, while LiveKit says support is already live in its agents stack with "audio in, audio out" and no text conversion in between.
What shipped in Gemini 3.1 Flash Live
Google's launch post describes Gemini 3.1 Flash Live as its fastest native realtime model for building agents, with 70 languages, video streaming, audio transcriptions, 128K context, and generated audio watermarked with SynthID. The same post points developers to the Live API docs and shows the SDK surface: client.aio.live.connect with the model gemini-3.1-flash-live-preview and audio response modalities.
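That SDK surface can be sketched as follows. This is illustrative rather than canonical: it assumes the google-genai Python SDK, a GEMINI_API_KEY environment variable, and the preview model name shown in the listing; imports are deferred so the sketch stays self-contained when the SDK is absent.

```python
import asyncio
import os

async def say_hello() -> None:
    # Deferred imports: the sketch only needs google-genai when actually run.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    config = types.LiveConnectConfig(response_modalities=["AUDIO"])

    # connect() opens a bidirectional streaming session with the Live model.
    async with client.aio.live.connect(
        model="gemini-3.1-flash-live-preview", config=config
    ) as session:
        await session.send_client_content(
            turns=types.Content(role="user", parts=[types.Part(text="Hello")]),
            turn_complete=True,
        )
        async for message in session.receive():
            if message.data:  # raw PCM audio chunks from the model
                ...  # hand off to an audio player / jitter buffer

if __name__ == "__main__" and "GEMINI_API_KEY" in os.environ:
    asyncio.run(say_hello())
```

In a production voice agent, the receive loop would feed a playback buffer while a parallel task streams microphone audio into the same session.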
The product pitch is not just lower latency. DeepMind's launch thread says the model is "better at completing tasks," handles noisy environments better, and can "follow long conversations" so users do not need to repeat themselves. The consumer Gemini app announcement adds that conversations can run through "2x longer" exchanges and that response length and tone adjust dynamically in session.
How the quality-latency tradeoff changed
Google claims a "step function improvement in quality, reliability, and latency," and the biggest visible delta is audio tool use. Its chart shows 90.8% on ComplexFuncBench Audio versus 71.5% for Gemini 2.5 Flash Native Audio (12-2025) and 66.0% for the 09-2025 version. On speech reasoning, the same launch material shows 95.9% on Big Bench Audio with high thinking, behind only Step-Audio R1.1 at 97.0% and ahead of Grok Voice Agent at 92.9%.
Artificial Analysis' independent benchmarking fills in the operational cost of that gain. With thinking set to high, it measured 95.9% on Big Bench Audio and 2.98 seconds TTFA; with minimal thinking, the model drops to 70.5% but improves to 0.96 seconds TTFA, which Artificial Analysis calls the sixth-fastest result on its speech-to-speech leaderboard. It also says pricing stayed flat versus Gemini 2.5 Flash Native Audio Dialog at $0.35 per hour of audio input and $1.38 per hour of audio output, excluding reasoning tokens.
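At those flat rates, per-session audio cost is simple arithmetic. A small illustrative helper, with the rates hard-coded from the Artificial Analysis figures (reasoning tokens, which are billed separately, are ignored):

```python
AUDIO_IN_USD_PER_HOUR = 0.35   # Artificial Analysis figure for audio input
AUDIO_OUT_USD_PER_HOUR = 1.38  # Artificial Analysis figure for audio output

def audio_cost_usd(input_minutes: float, output_minutes: float) -> float:
    """Estimate audio-only cost for one Live session, excluding reasoning tokens."""
    hours_in = input_minutes / 60
    hours_out = output_minutes / 60
    return round(
        hours_in * AUDIO_IN_USD_PER_HOUR + hours_out * AUDIO_OUT_USD_PER_HOUR, 4
    )

# A 30-minute call where the model speaks for 10 of those minutes:
# 0.5 h * $0.35 + (1/6) h * $1.38 = $0.175 + $0.23 = $0.405
print(audio_cost_usd(30, 10))  # 0.405
```

Even a long, talkative session stays well under a dollar at these rates, which is why the tradeoff discussion centers on latency and quality rather than price.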
Where it is available now
Availability landed immediately across Google's own surfaces and partner tooling. LiveKit's announcement says this is the first Gemini 3 native audio model on the Live API and highlights better instruction following, improved tool calling, reduced speaker drift, and support for 70-plus languages inside its agents framework.
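A hedged sketch of what wiring the model into that framework could look like, assuming the livekit-agents package with its Google realtime plugin and LiveKit's 1.x entrypoint shape; the instructions string and function names here are illustrative, and imports are deferred so the sketch stays self-contained without the packages installed:

```python
def build_agent_entrypoint():
    """Return a LiveKit agents entrypoint wired to the Gemini Live model.

    Illustrative only: assumes livekit-agents and livekit-plugins-google.
    """
    # Deferred imports: only needed when the entrypoint actually runs.
    from livekit.agents import Agent, AgentSession, JobContext
    from livekit.plugins import google

    async def entrypoint(ctx: JobContext) -> None:
        await ctx.connect()
        session = AgentSession(
            # Realtime model: audio in, audio out, no text conversion between.
            llm=google.beta.realtime.RealtimeModel(
                model="gemini-3.1-flash-live-preview",
            ),
        )
        await session.start(
            room=ctx.room,
            agent=Agent(instructions="You are a concise voice assistant."),
        )

    return entrypoint
```

Because the plugin speaks audio natively, there is no separate STT/TTS pipeline to configure; the realtime model fills all three roles.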
Developers also spotted the model in AI Studio on launch day. The AI Studio listing labels gemini-3.1-flash-live-preview a low-latency audio-to-audio model optimized for realtime dialogue with "acoustic nuance detection, numeric precision, and multimodal awareness," while TestingCatalog separately reported rollout across AI Studio, the API, and Gemini Live. Together, that makes this a same-day launch across Google's consumer app, developer API, and at least one major voice-agent integration path.