releaseMay 7, 2026

Google releases Gemini 3.1 Flash Lite GA with 1M context and $0.25 input pricing

Google moved Gemini 3.1 Flash Lite from preview to GA, and OpenRouter added the model with 1 million context and low-cost multimodal pricing. The preview endpoint now has a shutdown schedule, and users should verify whether the GA model differs from the March preview.

4 min read

Google releases Gemini 3.1 Flash Lite GA with 1M context and $0.25 input pricing

TL;DR

Google moved Gemini 3.1 Flash Lite from preview to GA, and GoogleAIStudio's launch post framed it as the company's most cost-efficient model for high-volume agentic tasks, translation, and simple data processing.
OpenRouter's availability post said the GA model is multimodal, supports 1 million context, and is priced at $0.25 per million input tokens and $1.50 per million output tokens.
The odd bit is that Simon Willison's note said it is not clear whether the GA model differs from the March preview, while Simon Willison's release post added that he does not believe the non-preview model changed.
Google paired the GA announcement with an Interactions API migration doc that, per GoogleAIStudio's API thread, shifts agent workflows from role-based turns to explicit step objects.
OpenRouter's deprecation notice said the gemini-3.1-flash-lite-preview endpoint starts deprecating on May 11 and shuts down on May 25, with the schedule linked in Google's deprecations page.

Google shipped the cheap model and the API shape change on the same day. You can jump from the official GA post to the interactions migration guide, and OpenRouter's post already exposes the practical details engineers usually want first: context window, modality coverage, pricing, and an endpoint you can hit today.

What shipped

The launch facts are short and unusually concrete. Google described Flash Lite as its cheapest model, and testingcatalog's AI Studio screenshot shows the public model card in AI Studio with text, image, and video input support plus Jan 2025 knowledge cutoff.

The rollout details from OpenRouter's post are the ones most likely to matter in practice:

Multimodal input: text, image, video, audio, and PDF to text
Context window: 1 million tokens
Pricing: $0.25 per million input tokens, $1.50 per million output tokens
Selectable thinking levels

Google's own launch thread links the official blog post, but the community's first question was whether GA meant a new model or just a label change.

Preview versus GA

llm-gemini 0.31

Release: llm-gemini 0.31 gemini-3.1-flash-lite is no longer a preview. Here's my write-up of the Gemini 3.1 Flash-Lite Preview model back in March. I don't believe this new non-preview model has changed since then. Tags: llm-release, gemini, llm, google, generative-ai, ai, llms

The cleanest counterweight to the launch framing came from Simon Willison. In his tweet, he said the preview model had already been available since March 3 and that pricing appeared unchanged; in his release note, he added that he does not believe the non-preview model changed since then.

That leaves one unresolved but reportable fact: Google announced GA, but the evidence in hand does not show a public capability delta from preview. For this story, GA looks more like a status change than a visible version bump.

Interactions API

The more substantial engineering change may be the request format around the model. According to GoogleAIStudio's thread, Gemini's Interactions API now represents each action as its own step instead of squeezing everything into strict user and model roles.

The step types shown in the migration graphic are:

user_input
thought
function_call
function_call_result
google_search_call
google_search_result
model_output

Google linked dedicated docs for the change in the interactions breaking changes guide. For teams building multi-step agentic runs, that doc probably matters more than the GA badge.

Preview shutdown dates

The old preview endpoint already has a clock on it. OpenRouter's notice said gemini-3.1-flash-lite-preview is deprecated starting May 11 and shuts down May 25, pointing to Google's deprecations schedule.

OpenRouter also used the launch to surface one extra integration detail: its follow-up post says Google models work with OpenRouter's service_tier parameter, which exposes standard, flex, and priority cost-latency tradeoffs and returns the served tier in the response.