AI Primer
Google DeepMind launches Gemini Omni for video from text, image, audio, and video input

Google DeepMind launched Gemini Omni and Omni Flash for creating and editing video from text, images, audio, and video, with API rollout still to come. Demos included avatars, conversational edits, and multi-image prompting, while creator tests found storyboard-heavy scenes less stable than Seedance.

TL;DR

You can jump straight to Google's story-building demo, skim the Flow update page, and then compare that polished launch framing with the rougher creator tests from venturetwins' Zillow-style multi-image run, DavidmComfort's storyboard comparison, and BLVCKLIGHTai's long-prompt bakeoff.

Omni Flash is live in three surfaces

Google's launch message in GoogleDeepMind's availability post makes the distribution plan plain: Omni Flash is live now in Gemini, Flow, and YouTube Shorts, and API access is still waiting in the wings.

A stage photo shared in petergyang's launch photo repeats the same rollout, which matters because most creator launches bury availability inside keynote slides. The first version shipping broadly is not the full "any input to any output" ambition from GoogleDeepMind's launch thread; it is a Flash variant attached to existing Google creation surfaces.

Google also tied the model directly into Flow. In GoogleDeepMind's Flow post, the company called out batch editing and improved character consistency as part of the Flow update, which puts Omni closer to an editing backbone than a one-shot text-to-video toy.

Character consistency is the headline trick

The official examples in GoogleDeepMind's consistency demo keep returning to one promise: define a character once, then move that character through new worlds, lighting setups, and actions without starting over.

That same idea shows up in creator testing. According to venturetwins' avatar demo, Omni can save a recorded face and voice as a reusable character, then drop that avatar into multiple clips with new outfits or styles.

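For creative workflows, that breaks into two separate capabilities: keeping a character defined once consistent across new worlds, lighting setups, and actions, and saving a recorded face and voice as a reusable avatar.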

That is Christmas come early for anyone tired of rebuilding the same protagonist shot by shot.

Conversational editing changes the workflow

The most interesting reveal is not raw generation quality. It is the idea, shown in venturetwins' conversational generation demo, that you can keep talking to the same video the way you keep talking to a chatbot.

According to venturetwins, a follow-up prompt for "more street interviews" continued an existing narrative instead of starting a new clip from scratch. venturetwins' clap-triggered hat edit adds another angle: direct in-place video editing with event-based instructions on uploaded footage.

Other early examples point in the same direction. chrisfirst's flamingo edit changed every person in a shot into matching flamingos, while chrisfirst's circus bear edit transformed a subject with a short imperative prompt. The model is behaving less like a render button and more like a multimodal editing session.

Multi-image and avatar prompting widen the input stack

Google's broad pitch in GoogleDeepMind's launch thread is "create anything from anything," but the practical input stack gets clearer in creator threads than in the keynote clips.

According to venturetwins' multi-image test, Omni can take up to five images and one video as prompt inputs. In the same thread, venturetwins' world-knowledge demo argued that Gemini's knowledge base can fill in context you did not explicitly specify, such as recognizing a location or generating an explainer around a topic from an image.

A few other tests widen that picture:

Taken together, Omni looks less like a single mode and more like a routing layer across text, image, audio, and existing video.

Creator tests already found the weak spot

The launch-day split is pretty sharp. The fast edit demos look great, but the more storyboarded comparisons got skeptical fast.

In DavidmComfort's storyboard comparison, Omni Flash lost on storyboard adherence and style retention versus Seedance 2. HalimAlrasihi's first tests called Omni Flash a big step forward in image and audio quality and multishot understanding, while HalimAlrasihi's follow-up said it still struggled with consistency when several references were combined.

BLVCKLIGHTai's side-by-side long-prompt test is the most useful evidence here because the prompt is fully spelled out. The complaint was not vague vibes. It was that Omni omitted parts of a highly structured scene plan that Seedance handled more completely.

That gives the launch a very specific shape. Omni Flash already looks strong for transforms, edits, and short iterative clips. Tight storyboard execution still seems like the harder frontier.

Google is positioning Omni for story building

A few hours after the launch thread, GoogleDeepMind's story-building post shifted from model bragging to workflow framing with "Build your next story with Gemini Omni," plus a dedicated demo link.

That story angle also shows up in adjacent launch material. GoogleDeepMind's Flow post framed Omni as a way to create more cinematic stories inside Flow, and bilawalsidhu's reaction argued the model makes the most sense when wrapped in a larger authoring tool.

The final useful detail is where Google seems to be steering creators next. GoogleDeepMind's Flow repost teased an "agent" inside Flow alongside Omni, which suggests the bigger product story is not just better video generation. It is a more directed creation stack where planning, editing, and rendering start to blur together.
