releaseMarch 26, 2026

Gemini 3.1 Flash Live launches with 90.8% audio tool-use score and 128K context

Google launched Gemini 3.1 Flash Live in AI Studio, the API, and Gemini Live with stronger audio tool use, lower latency, and 128K context. Voice-agent teams should benchmark quality, latency, and thinking settings before switching.

Gemini Multimodal Voice Agents Realtime AI

3 min read

Gemini 3.1 Flash Live launches with 90.8% audio tool-use score and 128K context

TL;DR

Google launched Gemini 3.1 Flash Live across the API, AI Studio, and Gemini Live, positioning it as a realtime voice-and-vision model for agents with audio streaming, video streaming, transcription, and 128K context launch thread feature rundown.
The headline quality gain is tool use: Google's benchmark chart puts the model at 90.8% on ComplexFuncBench Audio, up from 71.5% for Gemini 2.5 Flash Native Audio in the newer comparison.
Google and external benchmarking both frame this as a quality/latency tradeoff release: Artificial Analysis AA benchmark measured 95.9% Big Bench Audio at high thinking with 2.98s time-to-first-audio, versus 70.5% at minimal thinking with 0.96s TTFA AA speed data.
For implementation, Google says the model ships with Live API support and an "Agent Skill" for live voice agents SDK snippet, while LiveKit says support is already live in its agents stack with "audio in, audio out" and no text conversion in between LiveKit support.

What shipped in Gemini 3.1 Flash Live

Google's launch details describes Gemini 3.1 Flash Live as its fastest native realtime model for building agents, with 70 languages, video streaming, audio transcriptions, 128K context, and generated audio watermarked with SynthID. The same post points developers to the Live API docs and shows the SDK surface for client.aio.live.connect using gemini-3.1-flash-live-preview with audio response modalities.

The product pitch is not just lower latency. DeepMind's DeepMind thread says the model is "better at completing tasks," handles noisy environments better, and can "follow long conversations" so users do not need to repeat themselves. The consumer Gemini app announcement adds that conversations can run through "2x longer" exchanges and that response length and tone adjust dynamically in session Gemini app update.

How the quality-latency tradeoff changed

Google's Google benchmarks claims a "step function improvement in quality, reliability, and latency," and the biggest visible delta is audio tool use. Its chart shows 90.8% on ComplexFuncBench Audio versus 71.5% for Gemini 2.5 Flash Native Audio 12-2025 and 66.0% for the 09-2025 version. On speech reasoning, the same launch material shows 95.9% on Big Bench Audio with high thinking, behind only Step-Audio R1.1 at 97.0% and ahead of Grok Voice Agent at 92.9%.

Artificial Analysis' AA benchmark fills in the operational cost of that gain. With thinking set to high, it measured 95.9% Big Bench Audio and 2.98 seconds TTFA; with minimal thinking, the model drops to 70.5% but improves to 0.96 seconds TTFA, which its AA speed data calls the sixth-fastest result on the speech-to-speech leaderboard. Artificial Analysis also says pricing stayed flat versus Gemini 2.5 Flash Native Audio Dialog at $0.35 per hour of audio input and $1.38 per hour of audio output, excluding reasoning tokens AA speed data.

Where it is available now

Availability landed immediately across Google's own surfaces and partner tooling. LiveKit's LiveKit support says this is the first Gemini 3 native audio model on the Live API and highlights better instruction following, improved tool calling, reduced speaker drift, and support for 70-plus languages inside its agents framework.

Supporters also spotted the model in AI Studio on launch day. The AI Studio listing labels gemini-3.1-flash-live-preview as a low-latency audio-to-audio model optimized for realtime dialogue with "acoustic nuance detection, numeric precision, and multimodal awareness," while TestingCatalog separately reported rollout across AI Studio, APIs, and Gemini Live rollout note. Together, that makes this a same-day launch across Google's consumer app, developer API, and at least one major voice-agent integration path.

🧾 More sources

TL;DR1 tweets

Top-line facts: launch scope, benchmark deltas, thinking-level tradeoffs, and immediate integration availability.

What shipped in Gemini 3.1 Flash Live1 tweets

Core launch capabilities and API surface, including modalities, context window, and conversation behavior claims from Google.

How the quality-latency tradeoff changed1 tweets

Benchmark and serving data showing the model's improved audio reasoning and tool use, plus the latency impact of thinking settings.

Where it is available now1 tweets

Distribution across Google's own surfaces and partner tooling, with evidence from LiveKit, AI Studio, and rollout watchers.

releaseMarch 26, 2026

Gemini 3.1 Flash Live launches with 90.8% audio tool-use score and 128K context

Gemini Multimodal Voice Agents Realtime AI

3 min read

TL;DR

Google launched Gemini 3.1 Flash Live across the API, AI Studio, and Gemini Live, positioning it as a realtime voice-and-vision model for agents with audio streaming, video streaming, transcription, and 128K context launch thread feature rundown.
The headline quality gain is tool use: Google's benchmark chart puts the model at 90.8% on ComplexFuncBench Audio, up from 71.5% for Gemini 2.5 Flash Native Audio in the newer comparison.
Google and external benchmarking both frame this as a quality/latency tradeoff release: Artificial Analysis AA benchmark measured 95.9% Big Bench Audio at high thinking with 2.98s time-to-first-audio, versus 70.5% at minimal thinking with 0.96s TTFA AA speed data.
For implementation, Google says the model ships with Live API support and an "Agent Skill" for live voice agents SDK snippet, while LiveKit says support is already live in its agents stack with "audio in, audio out" and no text conversion in between LiveKit support.

What shipped in Gemini 3.1 Flash Live

Philipp Schmid

@_philschmid

·Follow

We just launched Gemini 3.1 Flash Live! Our fastest, most natural real-time voice AI model for building Agents. - Scores 90.8% on ComplexFuncBench Audio for tool use. - 70 languages, Video streaming, Audio transcriptions, 128k context - Comes with Agent Skill for building live Show more

3:33 PM · Mar 26, 2026

444

Read 28 replies

Google DeepMind

@GoogleDeepMind

·Follow

Say hello to Gemini 3.1 Flash Live. 🗣️ Our latest audio model delivers more natural conversations with improved function calling – making it more useful and informed. Here’s what’s new 🧵 Show more

Watch on X

3:31 PM · Mar 26, 2026

1.8K

Read 94 replies

How the quality-latency tradeoff changed

Logan Kilpatrick

@OfficialLoganK

·Follow

Introducing Gemini 3.1 Flash Live, our new realtime model to build voice and vision agents!! We have spent more than a year improving the model + infra + experience, the results? A step function improvement in quality, reliability, and latency.

3:19 PM · Mar 26, 2026

·Follow

Google has released Gemini 3.1 Flash Live Preview, achieving #2 in our Big Bench Audio Speech to Speech model benchmark, and now features configurable thinking levels With thinking level set to high, it scores 95.9% on Big Bench Audio, making it the second-highest scoring speech Show more