AI Primer
update

Gemini Omni Flash ranks #1 on Video Arena with 1404 Elo

Gemini Omni Flash ranked #1 on Video Arena at 1404 Elo, 101 points above Seedance 2.0 Mini, and ComfyUI posted a text-prompt video-edit workflow. Google noted the leaderboard is third-party, leaving benchmark provenance as the main caveat.

TL;DR

Google's docs now include an Omni Flash guide covering stateful editing via previous_interaction_id, a pricing page that bills 720p video output at 5,792 tokens per second (about $0.10/second), and a video docs page that positions Omni Flash as the default Gemini API model for multi-turn video editing. _philschmid's code screenshot fits upload, prompt, and MP4 write into 20 lines; fal's post gives practical caps of 10-second clips and 1280x720 output.

Video Arena gap

The chart in WesRoth's Video Arena post shows an unusually wide gap at the top of the table:

  • Gemini Omni Flash: 1404
  • Seedance 2.0 Mini: 1303
  • Seedance 2.0: 1300
  • Seedance 2.0 Fast: 1296
  • Veo 3 Fast: 1229
  • Veo 3.1: 1214
  • Veo 3: 1198

That puts Omni Flash 101 Elo over the next model and 190 Elo over Veo 3.1 in the same screenshot. teortaxesTex's reaction read the gap as a bigger jump than the recent Seedance-over-Veo shift.
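To make those gaps concrete, here is a short Python sketch that measures each model's distance from the leader and converts it into an expected head-to-head win rate. It assumes Video Arena ratings sit on the standard Elo scale (logistic curve, 400-point base), which is how arena-style leaderboards commonly report their Bradley-Terry fits; the ratings are copied from the screenshot.

```python
# Ratings as read off the Video Arena screenshot in WesRoth's post.
ratings = {
    "Gemini Omni Flash": 1404,
    "Seedance 2.0 Mini": 1303,
    "Seedance 2.0": 1300,
    "Seedance 2.0 Fast": 1296,
    "Veo 3 Fast": 1229,
    "Veo 3.1": 1214,
    "Veo 3": 1198,
}


def expected_win_rate(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (assumes a 400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


leader = max(ratings, key=ratings.get)
for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    if model == leader:
        continue
    gap = ratings[leader] - rating
    p = expected_win_rate(ratings[leader], rating)
    print(f"{leader} vs {model}: +{gap} Elo, expected win rate {p:.1%}")
```

Under that assumption, the 101-point lead over Seedance 2.0 Mini means Omni Flash would be preferred in roughly 64% of pairwise votes, and the 190-point lead over Veo 3.1 in roughly 75%.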

The editing leaderboard tells a different story. WesRoth's Video Edit Arena screenshot shows seven ranked editing models, with Dreamina Seedance-2.0 at 1377, 30 points ahead of Gemini Omni Flash at 1347.

Benchmark provenance is the main caveat. OfficialLoganK's leaderboard reply said Google does not run the leaderboard and first saw the results when another company posted them.

Conversational video edits

ComfyUI framed the new workflow as editing existing video with a prompt, then iterating when the first pass needs more specificity. ComfyUI's follow-up demo listed object swaps, environment changes, and targeted edits as the core moves.

The API surface is small enough to fit in a screenshot. _philschmid's example uploads night_scene.mp4, calls client.interactions.create with gemini-omni-flash-preview, sends the prompt “Change the scene from nighttime to daytime,” and writes day_scene.mp4 from base64 output.
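The last step of that example, turning base64 output back into an MP4 file, can be sketched with the standard library alone. This is a minimal sketch, not _philschmid's code: which response field carries the base64 string is an assumption, and `write_video_from_base64` and `looks_like_mp4` are hypothetical helper names. The `ftyp` check relies on MP4 (ISO base media) files normally opening with an `ftyp` box.

```python
import base64
import tempfile
from pathlib import Path


def looks_like_mp4(data: bytes) -> bool:
    """MP4 files normally begin with a box whose 4-byte type (bytes 4-8) is 'ftyp'."""
    return len(data) >= 8 and data[4:8] == b"ftyp"


def write_video_from_base64(b64_data: str, out_path: str) -> int:
    """Decode a base64 video payload, sanity-check it, and write it to out_path.

    Where b64_data lives in the real response object is an assumption; check
    the Omni Flash guide for the actual response shape.
    """
    video_bytes = base64.b64decode(b64_data, validate=True)
    if not looks_like_mp4(video_bytes):
        raise ValueError("decoded payload does not start with an MP4 'ftyp' box")
    Path(out_path).write_bytes(video_bytes)
    return len(video_bytes)


if __name__ == "__main__":
    # Fake payload: a minimal 'ftyp' header plus padding, standing in for model output.
    fake_mp4 = b"\x00\x00\x00\x10ftypisom" + b"\x00" * 8
    encoded = base64.b64encode(fake_mp4).decode("ascii")
    out = Path(tempfile.gettempdir()) / "day_scene.mp4"
    print(write_video_from_base64(encoded, str(out)), "bytes written to", out)
```

Checking the header before writing means a truncated or mis-routed payload fails loudly instead of leaving an unplayable `day_scene.mp4` on disk.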

Google's Omni Flash guide describes the same loop as stateful video editing: each turn produces a new video, and previous_interaction_id carries the prior video state into the next edit.
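The state-passing pattern is easiest to see offline. The sketch below uses a hypothetical `FakeInteractions` stub instead of the real client, so its `create` signature (including the `prompt` parameter) is illustrative only; what it shows is how each turn's id becomes the next request's `previous_interaction_id`. It also enforces the three-edit cap from Google's launch post, assuming the first edit counts as turn one of three.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

# Launch-post cap; counting the first edit as turn 1 of 3 is an assumption.
MAX_SEQUENTIAL_EDITS = 3


@dataclass
class Turn:
    id: str
    prompt: str
    previous_interaction_id: Optional[str]


class FakeInteractions:
    """Hypothetical offline stub: records each call and mints an id per turn."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.calls: list[Turn] = []

    def create(self, *, model: str, prompt: str,
               previous_interaction_id: Optional[str] = None) -> Turn:
        turn = Turn(f"{model}-turn-{next(self._counter)}", prompt, previous_interaction_id)
        self.calls.append(turn)
        return turn


def run_edit_chain(client: FakeInteractions, model: str, prompts: list[str]) -> list[Turn]:
    """Apply prompts in order, threading each turn's id into the next request."""
    if not prompts:
        raise ValueError("need at least one edit prompt")
    if len(prompts) > MAX_SEQUENTIAL_EDITS:
        raise ValueError(f"at most {MAX_SEQUENTIAL_EDITS} sequential edits per chain")
    history: list[Turn] = []
    previous_id: Optional[str] = None
    for prompt in prompts:
        turn = client.create(model=model, prompt=prompt, previous_interaction_id=previous_id)
        history.append(turn)
        previous_id = turn.id  # the next edit builds on this turn's video state
    return history


if __name__ == "__main__":
    chain = run_edit_chain(
        FakeInteractions(),
        "gemini-omni-flash-preview",
        ["Change the scene from nighttime to daytime", "Add light rain", "Make the car red"],
    )
    for t in chain:
        print(t.id, "<-", t.previous_interaction_id)
```

Only the first turn has no parent; every later edit points at exactly one predecessor, which is what lets the API carry the prior video forward without re-uploading it.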

API shape and pricing

The launch bundled two media models, not one. OfficialLoganK's launch post introduced Nano Banana 2 Lite, which generates 1K images in under 4 seconds at $0.034 per image, and Gemini Omni Flash for video editing at $0.10/second.

Google's launch post gives the working split:

  • Nano Banana 2 Lite: gemini-3.1-flash-lite-image, available in Google AI Studio, Gemini API, and Gemini Enterprise Agent Platform.
  • Gemini Omni Flash: gemini-omni-flash-preview, available in Google AI Studio, Gemini API, Gemini Enterprise Agent Platform, the Gemini app, and Google Flow.
  • Chained workflow: generate images with Nano Banana 2 Lite, then pass them as references to Omni Flash for animation.
  • Multi-turn cap in the launch post: up to three sequential edits through the Interactions API.

The Gemini API pricing page lists no free tier for Omni Flash Preview, $1.50 per 1M input tokens, $9 per 1M text output tokens, and $17.50 per 1M video output tokens. The same page says video billing is calculated at 5,792 tokens per second of 720p output, which works out to about $0.10/second (5,792 × $17.50 / 1M ≈ $0.101).
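A small cost estimator follows directly from those rates. This sketch hard-codes the numbers from the pricing page; the token count of an uploaded input video is not stated, so `input_tokens` is left as a parameter you would fill in from whatever usage data the API reports. The function names are hypothetical.

```python
# Rates from the Gemini API pricing page for Omni Flash Preview (USD per 1M tokens).
INPUT_USD_PER_M = 1.50
TEXT_OUTPUT_USD_PER_M = 9.00
VIDEO_OUTPUT_USD_PER_M = 17.50
VIDEO_TOKENS_PER_SECOND = 5_792  # billing rate for 720p output


def video_output_usd(seconds: float) -> float:
    """Cost of the generated video alone: seconds -> tokens -> dollars."""
    return seconds * VIDEO_TOKENS_PER_SECOND * VIDEO_OUTPUT_USD_PER_M / 1_000_000


def edit_cost_usd(output_seconds: float, input_tokens: int = 0,
                  text_output_tokens: int = 0) -> float:
    """Total cost of one edit turn; input_tokens must come from the API's usage data."""
    return (video_output_usd(output_seconds)
            + input_tokens * INPUT_USD_PER_M / 1_000_000
            + text_output_tokens * TEXT_OUTPUT_USD_PER_M / 1_000_000)


if __name__ == "__main__":
    print(f"per second:     ${video_output_usd(1):.5f}")
    print(f"10 s clip:      ${video_output_usd(10):.4f}")
    print(f"3 x 10 s edits: ${3 * edit_cost_usd(10):.4f} (video output only)")
```

Video output alone comes to about $0.101 per second, $1.01 per 10-second clip, and $3.04 for a full three-edit chain of 10-second clips, before input tokens are added.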

Preview limits

Early integrators are already exposing the practical envelope. fal's post says its Omni Flash video-edit surface accepts 10-second videos and returns 1280x720 output.

Google's official launch post lists the current model limits more bluntly:

  • Omni offers 10-second video generations, with longer durations “coming soon.”
  • Uploading audio references and scene extension are not supported in the Gemini API for the model.
  • Video references up to 3 seconds are accepted by the API schema but are not correctly processed by the model yet.
  • Character consistency can degrade when changing scenes or panning.
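An integrator might turn those limits into a preflight check before spending money on a request. The sketch below is illustrative: `EditRequest` and `preflight` are hypothetical names, the 10-second cap is applied to the clip as both fal and the launch post describe it, and rejecting video references longer than 3 seconds is an assumption drawn from the stated schema limit. References of 3 seconds or less produce a warning rather than an error, since the API accepts them but the model does not yet process them correctly.

```python
from dataclasses import dataclass, field

MAX_CLIP_SECONDS = 10.0      # current clip length per the launch post and fal
MAX_REFERENCE_SECONDS = 3.0  # schema limit for video references


@dataclass
class EditRequest:
    clip_seconds: float
    has_audio_reference: bool = False
    wants_scene_extension: bool = False
    video_reference_seconds: list[float] = field(default_factory=list)


def preflight(req: EditRequest) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a request against the published preview limits."""
    errors: list[str] = []
    warnings: list[str] = []
    if req.clip_seconds > MAX_CLIP_SECONDS:
        errors.append(f"clip is {req.clip_seconds:g}s; preview is capped at {MAX_CLIP_SECONDS:g}s")
    if req.has_audio_reference:
        errors.append("audio references are not supported in the Gemini API for this model")
    if req.wants_scene_extension:
        errors.append("scene extension is not supported in the Gemini API for this model")
    for i, secs in enumerate(req.video_reference_seconds):
        if secs > MAX_REFERENCE_SECONDS:
            errors.append(f"video reference {i} is {secs:g}s; schema allows {MAX_REFERENCE_SECONDS:g}s")
        else:
            warnings.append(f"video reference {i} is accepted but not yet processed correctly")
    return errors, warnings


if __name__ == "__main__":
    errs, warns = preflight(EditRequest(clip_seconds=12, video_reference_seconds=[2.0]))
    print("errors:", errs)
    print("warnings:", warns)
```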

The developer guide adds region and subject restrictions: uploaded-video editing is unavailable in the EEA, Switzerland, and the UK, and uploading or editing images that contain minors is also unsupported in those regions.
