Google DeepMind launches Gemini Omni for video from text, image, audio, and video input
Google DeepMind launched Gemini Omni and Omni Flash for creating and editing video from text, images, audio, and video, with API rollout still to come. Demos included avatars, conversational edits, and multi-image prompting, while creator tests found Omni Flash less stable than Seedance on storyboard-heavy scenes.

TL;DR
- GoogleDeepMind's launch thread pitched Gemini Omni as a model that can create video from "anything," while GoogleDeepMind's rollout note said the first shipping version is Omni Flash in the Gemini app, Flow, and YouTube Shorts, with APIs coming later.
- The official demos in GoogleDeepMind's feature post and GoogleDeepMind's physics and storytelling clip centered on persistent characters, style transfer, and video-to-video edits that keep the scene coherent across location, lighting, and action changes.
- Early hands-on tests from venturetwins' overview broke Omni into avatars, world knowledge, video editing, conversational generation, and multi-image prompting, including a claimed limit of five images plus one video as input, per venturetwins' multi-image test.
- Creator clips from chrisfirst's flamingo edit and chrisfirst's circus bear edit suggest Omni Flash is unusually strong at local video edits that preserve motion and framing.
- Comparisons from DavidmComfort's storyboard test, BLVCKLIGHTai's side-by-side prompt, and HalimAlrasihi's first tests all landed on the same caveat: Omni Flash looks fast and flexible, but storyboard control and multi-reference consistency still trail Seedance in tougher prompt-following tests.
You can jump straight to Google's story-building demo, skim the Flow update page, and then compare that polished launch framing with the rougher creator tests from venturetwins' Zillow-style multi-image run, DavidmComfort's storyboard comparison, and BLVCKLIGHTai's long-prompt bakeoff.
Omni Flash is live in three surfaces
Google's launch message in GoogleDeepMind's availability post makes the distribution plan plain: Omni Flash is live now in Gemini, Flow, and YouTube Shorts, and API access is still waiting in the wings.
A stage photo shared in petergyang's launch photo repeats the same rollout, which matters because most creator launches bury availability inside keynote slides. The first version shipping broadly is not the full "any input to any output" ambition from GoogleDeepMind's launch thread. It is a Flash variant attached to existing Google creation surfaces.
Google also tied the model directly into Flow. In GoogleDeepMind's Flow post, the company called out batch editing and improved character consistency as part of the Flow update, which puts Omni closer to an editing backbone than a one-shot text-to-video toy.
Character consistency is the headline trick
The official examples in GoogleDeepMind's consistency demo keep returning to one promise: define a character once, then move that character through new worlds, lighting setups, and actions without starting over.
That same idea shows up in creator testing. According to venturetwins' avatar demo, Omni can save a recorded face and voice as a reusable character, then drop that avatar into multiple clips with new outfits or styles.
For creative workflows, that breaks into three separate capabilities:
- Character lock: GoogleDeepMind's feature post says the same character can persist across scenes, actions, and lighting.
- World remap: GoogleDeepMind's environment edit demo shows the same subject moved from a couch to a spaceship, underwater, and into other settings.
- Style remap: GoogleDeepMind's style and motion note says creators can drive looks with reference inputs or natural-language instructions.
That is Christmas come early for anyone tired of rebuilding the same protagonist shot by shot.
Conversational editing changes the workflow
The most interesting reveal is not raw generation quality. It is the idea, shown in venturetwins' conversational generation demo, that you can keep talking to the same video the way you keep talking to a chatbot.
venturetwins said a follow-up prompt for "more street interviews" continued an existing narrative instead of starting a new clip from scratch. venturetwins' clap-triggered hat edit adds another angle: direct in-place video editing with event-based instructions on uploaded footage.
Other early examples point in the same direction. chrisfirst's flamingo edit changed every person in a shot into matching flamingos, while chrisfirst's circus bear edit transformed a subject with a short imperative prompt. The model is behaving less like a render button and more like a multimodal editing session.
Multi-image and avatar prompting widen the input stack
Google's broad pitch in GoogleDeepMind's launch thread is "create anything from anything," but the practical input stack gets clearer in creator threads than in the keynote clips.
According to venturetwins' multi-image test, Omni can take up to five images and one video as prompt inputs. In the same thread, venturetwins' world-knowledge demo argued that Gemini's knowledge base can fill in context you did not explicitly specify, such as recognizing a location or generating an explainer around a topic from an image.
A few other tests widen that picture:
- egeberkina's biking remix used one POV biking clip as the source for rapid-fire environment changes every second.
- egeberkina's TV-reference montage said Omni "nailed the references" in a fast sequence of iconic TV scenes.
- chrisfirst's explainer video example showed a full explainer-style clip generated from a single prompt.
- ozansihay's feature rundown described conversational editing, template-based generation, and text-heavy scenes as standout parts of the launch.
Taken together, Omni looks less like a single mode and more like a routing layer across text, image, audio, and existing video.
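To make the claimed input limit concrete, here is a minimal sketch of a pre-flight check for a prompt bundle. It assumes the five-image, one-video ceiling reported in venturetwins' multi-image test, which Google has not confirmed. The `PromptBundle` class and `check_bundle` function are hypothetical names invented for illustration. They are not part of any Google API, which has not shipped yet.

```python
from dataclasses import dataclass, field
from typing import List

# Limits as claimed in venturetwins' multi-image test; not confirmed by Google.
MAX_IMAGES = 5
MAX_VIDEOS = 1


@dataclass
class PromptBundle:
    """Hypothetical container for the inputs of one generation request."""
    text: str = ""
    images: List[str] = field(default_factory=list)  # file paths or URLs
    videos: List[str] = field(default_factory=list)
    audio: List[str] = field(default_factory=list)   # no limit was reported for audio


def check_bundle(bundle: PromptBundle) -> List[str]:
    """Return a list of problems; an empty list means the bundle fits the claimed limits."""
    problems = []
    if len(bundle.images) > MAX_IMAGES:
        problems.append(f"too many images: {len(bundle.images)} > {MAX_IMAGES}")
    if len(bundle.videos) > MAX_VIDEOS:
        problems.append(f"too many videos: {len(bundle.videos)} > {MAX_VIDEOS}")
    if not (bundle.text or bundle.images or bundle.videos or bundle.audio):
        problems.append("empty prompt")
    return problems


if __name__ == "__main__":
    bundle = PromptBundle(
        text="Walk through this house room by room",
        images=[f"room_{i}.png" for i in range(6)],
        videos=["exterior.mp4"],
    )
    for problem in check_bundle(bundle):
        print(problem)
```

If the real limits turn out to differ, only the two constants at the top need to change.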
Creator tests already found the weak spot
The launch-day split is pretty sharp. The fast edit demos look great, but the more storyboarded comparisons got skeptical fast.
In DavidmComfort's storyboard comparison, Omni Flash lost on storyboard adherence and style retention versus Seedance 2. HalimAlrasihi's first tests called Omni Flash a big step forward in image and audio quality and multishot understanding, while HalimAlrasihi's follow-up said it still struggled with consistency when several references were combined.
The long-prompt test in BLVCKLIGHTai's side-by-side prompt is the most useful evidence here because the prompt is fully spelled out. The complaint was not vague vibes. It was that Omni omitted parts of a highly structured scene plan that Seedance handled more completely.
That gives the launch a very specific shape. Omni Flash already looks strong for transforms, edits, and short iterative clips. Tight storyboard execution still seems like the harder frontier.
Google is positioning Omni for story building
A few hours after the launch thread, GoogleDeepMind's story-building post shifted from model bragging to workflow framing with "Build your next story with Gemini Omni," plus a dedicated demo link.
That story angle also shows up in adjacent launch material. GoogleDeepMind's Flow post framed Omni as a way to create more cinematic stories inside Flow, and bilawalsidhu's reaction argued the model makes the most sense when wrapped in a larger authoring tool.
The final useful detail is where Google seems to be steering creators next. GoogleDeepMind's Flow repost teased an "agent" inside Flow alongside Omni, which suggests the bigger product story is not just better video generation. It is a more directed creation stack where planning, editing, and rendering start to blur together.