Release · June 30, 2026

Google releases Nano Banana 2 Lite and Gemini Omni Flash

Google shipped Nano Banana 2 Lite for image generation and Gemini Omni Flash for conversational video generation and editing in the Gemini API and AI Studio. The release sets image generation at about 4 seconds and $0.034 per 1K image, while Omni Flash adds multi-turn video edits at $0.10 per second.

TL;DR

GoogleDeepMind's launch thread presented the release as two separate media models: Nano Banana 2 Lite for images, and Gemini Omni Flash for video generation and conversational editing.
According to Google AI Studio's Nano Banana 2 Lite post and _philschmid's launch summary, Nano Banana 2 Lite runs at about 4 seconds per 1K image and costs $0.034 per image.
Google AI Studio's Omni Flash post and _philschmid's launch summary both put Gemini Omni Flash at $0.10 per second for clips up to 10 seconds, with multi-turn edits preserved in session context.
On third-party leaderboards, Arena's Nano Banana 2 Lite ranking placed the image model at #5 in text-to-image, while Arena's Omni Flash video edit ranking placed the video model at #2 for video editing.
The most interesting product shape came from GoogleDeepMind's Interactions API post, which pitched the pair as a chained workflow: generate a reference image first, then animate and iteratively edit it in the same session.

You can jump straight to Google's blog post, skim the Nano Banana 2 Lite docs, and check the Omni Flash docs. The product connective tissue sits in AI Studio and the Interactions API, where GoogleDeepMind's workflow demo shows Google treating image generation plus conversational video editing as one stack, not two isolated releases.

What shipped

Google's official framing was simple: one low-latency image model, one video model with conversational edits.

Nano Banana 2 Lite: GA image model, exposed as gemini-3.1-flash-lite-image, positioned as the fastest and cheapest Gemini image option, per _philschmid's launch summary.
Gemini Omni Flash: preview video model, exposed as gemini-omni-flash-preview, available in the Gemini API, Google AI Studio, and the Gemini Enterprise Agent Platform, according to GoogleDeepMind's Omni Flash feature list.
Chained workflow: GoogleDeepMind's Interactions API post said the API can generate an image with Nano Banana 2 Lite, then pass it to Omni Flash for animation and follow-up edits.
Stateful edits: session history preserves up to three sequential edits, according to GoogleDeepMind's Interactions API post and _philschmid's launch summary.

Nano Banana 2 Lite

The image model's whole pitch is throughput. Google AI Studio's announcement called it a high-velocity model for developer pipelines, and GoogleDeepMind's speed post set the target at 4-second text-to-image output.

A few concrete numbers landed immediately:

Latency: about 4 seconds per 1K image, per GoogleDeepMind's speed post.
Price: $0.034 per 1K image, per _philschmid's launch summary.
Model slug: gemini-3.1-flash-lite-image, per _philschmid's launch summary.
Relative positioning: rohanpaul_ai's model lineup described it as the speed-first replacement for legacy Gemini 2.5 Flash Image.
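
To make the throughput pitch concrete, here is a minimal back-of-envelope sketch in Python that turns the quoted figures into a batch estimate. The function name is hypothetical, and the timing assumes images are generated strictly one at a time, which the source does not state; parallel requests would shorten the wall-clock figure.

```python
from decimal import Decimal

# Figures quoted in the launch posts; treat them as approximate.
PRICE_PER_IMAGE = Decimal("0.034")  # USD per 1K image
SECONDS_PER_IMAGE = 4               # approximate text-to-image latency


def estimate_image_batch(num_images: int) -> tuple[Decimal, int]:
    """Return (total cost in USD, sequential seconds) for a batch of 1K images.

    Assumption: requests run one after another with no parallelism.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return PRICE_PER_IMAGE * num_images, SECONDS_PER_IMAGE * num_images


cost, seconds = estimate_image_batch(1000)
print(f"1,000 images: ${cost} and roughly {seconds} s if run sequentially")
```

Decimal arithmetic is used so that the per-image price does not pick up binary floating-point rounding when multiplied across a large batch.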

Third-party scores made the tradeoff legible. Arena's ranking post put Nano Banana 2 Lite at #5 overall in text-to-image with a 1251 score, and fal's availability post claimed two-second generation and editing at 1K across 14 aspect ratios on its own surface.

Gemini Omni Flash

Omni Flash is the more interesting release. It turns video generation and editing into a chat loop, which is Christmas come early for people building media agents.

GoogleDeepMind's Omni Flash feature list broke the model into four capabilities:

Conversational video editing
Multimodal referencing and combined inputs
Real-world knowledge
Direct links between text, graphics, and video actions

The API details were just as specific in _philschmid's launch summary:

Price: $0.10 per second
Clip length: up to 10 seconds
Model slug: gemini-omni-flash-preview
Session behavior: each edit builds on prior context, for up to three sequential edits
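
The pricing and session limits above can be combined into a rough cost ceiling for one editing session. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes every edit re-renders the full clip and is billed at the same per-second rate, which the launch posts do not confirm.

```python
from decimal import Decimal

# Limits and price as quoted in the launch summary.
PRICE_PER_SECOND = Decimal("0.10")  # USD per second of video
MAX_CLIP_SECONDS = 10
MAX_SEQUENTIAL_EDITS = 3


def estimate_session_cost(clip_seconds: int, num_edits: int = 0) -> Decimal:
    """Estimate the cost of one generation plus `num_edits` follow-up edits.

    Assumption (not confirmed by the source): each edit re-renders the
    whole clip and is billed like a fresh generation.
    """
    if not 1 <= clip_seconds <= MAX_CLIP_SECONDS:
        raise ValueError(f"clip_seconds must be between 1 and {MAX_CLIP_SECONDS}")
    if not 0 <= num_edits <= MAX_SEQUENTIAL_EDITS:
        raise ValueError(f"num_edits must be between 0 and {MAX_SEQUENTIAL_EDITS}")
    renders = 1 + num_edits  # the initial clip plus one render per edit
    return PRICE_PER_SECOND * clip_seconds * renders


single_clip = estimate_session_cost(10)
full_session = estimate_session_cost(10, num_edits=3)
print(f"10 s clip: ${single_clip}; with three edits: ${full_session}")
```

Under that billing assumption, a maximum-length clip taken through all three edits costs four times the single-clip price.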

Third-party ranking posts gave Google a decent day-one brag sheet. Arena's Omni Flash video edit ranking scored it 1347 in Video Edit Arena, #2 overall and nearly 40 points ahead of the next-best model there. fal's launch post also emphasized synchronized audio and scene-preserving edits from mixed inputs.

Where it shows up

Google did not keep either model inside its own UI for long.

Day-one and near-day-one rollouts included:

NotebookLM: Google's NotebookLM post said 60-second vertical Video Overviews are powered by Nano Banana 2 Lite.
fal: fal's launch post exposed Omni Flash for text-to-video, image-to-video, reference-to-video, and editing, while fal's Nano Banana 2 Lite post added the image model.
Vercel AI Gateway: Vercel's availability post added Nano Banana 2 Lite and called it half the cost of previous models on that surface.
Replicate: Replicate's launch post published Nano Banana 2 Lite access.
ComfyUI: ComfyUI's partner nodes post added Omni Flash nodes for text-to-video, image-to-video, and video edit workflows.
Agent tooling: _philschmid's skill post and fofrAI's skill thread both published installable skills for Omni Flash workflows, including helper tools for video prep and inspection.

Early caveats

The launch posts were clean. The replies and follow-up threads were where the rough edges showed up.

According to rohanpaul_ai's caveat thread, Omni Flash currently lacks API audio reference support, and video references up to 3 seconds were documented but not yet processing correctly. fofrAI's trim and frame-rate reply added that prep helpers trim clips to 10 seconds, resize them, and reduce frame rate to 24 fps rather than increasing it.
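
The trim and frame-rate behavior described in that reply amounts to two clamps. The Python sketch below plans those values without touching any video file; `PrepPlan` and `plan_prep` are hypothetical names, not the helper tools' actual API, and resizing is omitted because the source gives no target dimensions.

```python
from dataclasses import dataclass

MAX_SECONDS = 10.0  # clips are trimmed to this length
TARGET_FPS = 24.0   # frame rate is reduced to this value, never raised


@dataclass(frozen=True)
class PrepPlan:
    duration_seconds: float
    fps: float


def plan_prep(duration_seconds: float, fps: float) -> PrepPlan:
    """Clamp duration and frame rate downward only, as the reply describes."""
    if duration_seconds <= 0 or fps <= 0:
        raise ValueError("duration_seconds and fps must be positive")
    return PrepPlan(
        duration_seconds=min(duration_seconds, MAX_SECONDS),
        fps=min(fps, TARGET_FPS),  # a 12 fps source stays at 12 fps
    )


long_clip = plan_prep(14.5, 60.0)
short_clip = plan_prep(6.0, 12.0)
print(long_clip, short_clip)
```

Using `min` for both values captures the one-directional behavior: long or high-frame-rate sources are cut down, while shorter or slower ones pass through unchanged.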

The model lineup is also narrower than the branding might suggest. OfficialLoganK's reply said only a Flash version of Omni is available across products right now, with no parallel higher-tier Omni variant on launch day.

Hands-on posts surfaced one more limitation: DynamicWebPaige's translation demo found that video translations kept the speaker's voice, but also kept an American accent across German, French, and Hindi outputs.