workflow

May 20, 2026

Gemini Omni tests avatar, scene-mood, and object edits in creator workflows

Creators used Gemini Omni in Flow for avatar generation, weather and style transformations, annotation overlays, and object edits, while others posted failures and quality gaps. Treat it as a transformation and editing model rather than a direct Seedance replacement.

TL;DR

Google positioned Gemini Omni as an anything-to-video model, with Omni Flash live in the Gemini app, Flow, and YouTube Shorts, while API access is planned for the coming weeks, according to GoogleDeepMind's availability post and OfficialLoganK's intro post.
Early creator tests split into two buckets: transformation edits looked strong in posts like ai_artworkgen's scene edits and chrisfirst's flamingo edit, while more rigid control tasks drew complaints in bennash's failed clothing edit and DavidmComfort's storyboard comparison.
The most useful workflow shift is that Omni behaves more like a conversational editor than a pure text-to-video generator, as venturetwins' editing demo and venturetwins' conversational generation test both showed.
Avatar capture is already a major creator use case: venturetwins' avatar demo showed face and voice reuse across clips, while ozansihay's Flow screenshot showed the avatar capture flow inside the product.
Several creators argued the Seedance comparison misses the point, because Omni's strongest results came from scene replacement, style transfer, overlays, and object edits rather than benchmark-style prompt obedience, per MayorKingAI's comparison thread and bilawalsidhu's workflow take.

You can watch Google's launch thread, skim the Flow update post, and see the product surfaces on stage slides from I/O. The split in results showed up fast: one creator got clean time-of-day, style, and weather swaps, another changed hats on each clap, and someone else got a visibly wrong result from a mirror VFX test. There is also already a small pile of creator-native patterns, from avatar capture to annotation overlays to building custom tools inside Flow.

What shipped

Google's own framing was broad: a model that can "create anything from any input," starting with video, with character consistency, reference-based styling, and video reimagining as the first visible behaviors.

The concrete day-one surfaces were simple:

Omni Flash is available in the Gemini app, per GoogleDeepMind's availability post.
Omni Flash is available in Flow, per the same availability post and petergyang's stage photo.
Omni Flash is available in YouTube Shorts, according to GoogleDeepMind's availability post.
API rollout is planned for the coming weeks, per GoogleDeepMind's availability post.
Flow's paired product update added batch editing and improved character consistency, according to GoogleDeepMind's Flow post.

Google also kept using the "Nano Banana for video" shorthand through creators and execs, including OfficialLoganK's post, but the hands-on clips quickly made it clear that the editing surface is the part people grabbed first.

Avatars and character consistency

The cleanest creator win so far is identity persistence. venturetwins' avatar demo showed a single captured face and voice reused across multiple scenes, and ozansihay's Flow screenshot showed that Flow now has a dedicated avatar capture flow.

The early character-consistency examples clustered around four repeatable patterns:

Record one selfie clip, then reuse that person as a character, according to venturetwins' avatar demo.
Move the same character across lighting, weather, and style changes, as in ai_artworkgen's test.
Swap outfits or scene context while keeping the subject recognizable, per GoogleDeepMind's character-consistency post.
Generate fast avatar-based clips inside Flow, though they cost credits, according to bennash's avatar post.

The strongest public example was ai_artworkgen's four-way test, which kept the same subject while changing the clip to nighttime, 3D animation, and a snow scene. That kind of continuity is more useful to ad creatives and short-form teams than another pure text-to-video beauty shot.

Conversational edits

A lot of the better Omni demos started with existing footage, not a blank prompt. venturetwins' clap-to-hat demo used a live-action clip and asked for a timed subject edit, while ai_artworkgen's follow-up ran two very different passes on the same source video: a Blair Witch found-footage remix and a labeled geology explainer.

That editing behavior breaks down into a few distinct modes:

Style transfer: turn a clean clip into darker found-footage, per ai_artworkgen's Blair Witch remix.
Event-based edits: change an element when an action happens, per venturetwins' hat-on-clap prompt.
Object insertion: add new elements like fireworks, per minchoi's fireworks example.
Object removal: delete elements from an existing shot, per minchoi's remove-objects example.
Angle and scene changes: alter camera perspective or move the action into a new environment, per minchoi's angle-change post and GoogleDeepMind's environment-change demo.
Reference transfer: carry style or subject cues from supplied images, per minchoi's reference-transfer example and venturetwins' multi-image prompting test.

That is why several creators, including MayorKingAI and bilawalsidhu, kept arguing that Seedance is the wrong comparison frame. The clips being shared most widely are edit passes and remixes, not from-scratch generations.

Where it breaks

The failure reports are not subtle. bennash's clothing-edit complaint said patriotic outfit swaps would not go through at all, techhalla's mirror VFX test produced a visibly wrong result, and bennash's broader complaint called Omni itself a bust even while praising Flow's character tools.

The more technical complaints were about control, not raw wow factor:

HalimAlrasihi's first comparison said Omni Flash was faster and improved on image and audio quality, but still trailed Seedance 2 Fast on consistency.
DavidmComfort's storyboard comparison said Omni followed storyboards less faithfully and lost the intended style.
BLVCKLIGHTai's complex prompt test said Omni omitted parts of a dense shot-by-shot prompt that Seedance handled better.
HalimAlrasihi's follow-up said multiple references in one video made consistency worse.

That leaves a pretty specific early picture: Omni looks strongest when the ask is "transform this clip" or "keep this character, change the world," and weaker when the ask is "obey this dense production spec exactly."

The creator playbook already has a shape

The most interesting part is how quickly creators converged on reusable patterns. minchoi's roundup condensed the first day into a ten-item list, and venturetwins' feature thread mapped five hands-on behaviors that lined up closely with the official product framing.

Across those posts, the repeatable workflow menu already looks like this:

Avatar from one selfie video, per minchoi's avatar example.
Remove objects from live-action footage, per minchoi's remove-objects example.
Add props or effects, like fireworks, per minchoi's fireworks example.
Replace a character or shift scene mood, per minchoi's character-replace example.
Change camera angle, per minchoi's angle-change post.
Zoom from artwork into impossible macro worlds, per minchoi's Mona Lisa example.
Generate explainers or short concept pieces from a single prompt, as in chrisfirst's explainer-video post.
Push rapid visual transitions in POV footage, per egeberkina's biking-video test.

Flow is also starting to look like the real product wrapper around those behaviors. GoogleDeepMind's Flow update tied Omni to batch editing and improved character consistency, tranmautritam's post showed people using Flow to vibe-code custom creative tools, and bennash's avatar note added a practical caveat that fast avatar generations already consume credits.