Gemini Omni adds 3D camera trajectories for GoPro-style POV clips
Creator posts showed Gemini Omni handling 3D camera trajectories, tracked label overlays, and character-sheet swaps from single references. That widens Omni from scene edits into repeatable previsualization and explainer workflows, though the evidence is still mostly community demos.

TL;DR
- Gemini Omni creators are already using single reference images and route sketches to generate moving first-person clips, with bilawalsidhu's GoPro-style POV demo extending the earlier map-route examples in levelsio's repost and minchoi's example thread.
- icreatelife's visual annotation demo showed tracked text labels, smooth zooms, and typography that stay attached to scene details, and icreatelife's prompt post makes the workflow unusually easy to copy.
- ai_artworkgen's character-sheet swap demo pushed Omni past one-shot scene edits by swapping a rider, then asking for three fresh camera angles from the same setup.
- The early creator read is that Omni is less a polished cinematic generator than a fast editing and previsualization tool, which fabianstelzer's take summed up as "Nano Banana for video."
icreatelife's prompt post spells out a full tracked-label prompt, bilawalsidhu's follow-up shows the Lodhi Garden reconstruction behind one POV clip, and ai_artworkgen's thread turns a reference clip plus character sheet into new coverage. icreatelife's video is the cleanest explainer example, while bilawalsidhu's camera-trajectory demo is the clearest sign that people are treating Omni like a lightweight previs tool, not just an effects toy.
3D camera trajectories
The most interesting jump is camera control. In bilawalsidhu's GoPro-style POV demo, Omni reportedly took a 3D camera trajectory and generated first-person footage from it.
That builds on the simpler route-driven prompt pattern from levelsio's repost, where a marked-up map screenshot became a taxi-cab POV clip, and on bennash's quoted prompt, which preserves the exact wording.
- Input: a reference image or map with a route
- Control signal: a described or supplied path through the scene
- Output: a moving first-person shot instead of a static edit
The useful detail in the follow-ups is that the Lodhi Garden clip appears to have been grounded in an actual 3D reconstruction, per bilawalsidhu's reconstruction follow-up, and bilawalsidhu's location reply says the prompt also included the place name. That makes the workflow look less like purely hallucinated motion and more like camera synthesis anchored to scene structure plus text context.
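None of the posts document how a trajectory is actually handed to Omni, so the following is only an illustrative sketch of what "a supplied path through the scene" could look like as data. It assumes the route is a polyline of 3D waypoints and resamples it into evenly spaced camera poses, each with a forward direction along the route. The `CameraPose` type and `resample_route` function are hypothetical names, not part of any Omni or Gemini interface.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraPose:
    """Hypothetical pose: where the camera is and which way it is moving."""
    position: tuple
    forward: tuple  # unit vector along the direction of travel


def resample_route(waypoints, n_poses):
    """Resample a polyline into n_poses camera poses evenly spaced by arc length."""
    if len(waypoints) < 2 or n_poses < 2:
        raise ValueError("need at least two waypoints and two poses")

    # Drop consecutive duplicates so every segment has nonzero length.
    pts = [waypoints[0]]
    for p in waypoints[1:]:
        if p != pts[-1]:
            pts.append(p)
    if len(pts) < 2:
        raise ValueError("route has zero length")

    seg_lengths = [math.dist(a, b) for a, b in zip(pts, pts[1:])]
    total = sum(seg_lengths)

    poses = []
    for i in range(n_poses):
        target = total * i / (n_poses - 1)  # distance along the whole route
        idx, start = 0, 0.0
        # Advance to the segment that contains the target distance.
        while idx < len(seg_lengths) - 1 and start + seg_lengths[idx] < target:
            start += seg_lengths[idx]
            idx += 1
        a, b = pts[idx], pts[idx + 1]
        t = min((target - start) / seg_lengths[idx], 1.0)
        position = tuple(pa + t * (pb - pa) for pa, pb in zip(a, b))
        forward = tuple((pb - pa) / seg_lengths[idx] for pa, pb in zip(a, b))
        poses.append(CameraPose(position, forward))
    return poses


# Example: an L-shaped walking route, sampled into five poses.
route = [(0, 0, 0), (2, 0, 0), (2, 2, 0)]
for pose in resample_route(route, 5):
    print(pose)
```

Even spacing by arc length keeps the implied walking speed constant, which is closer to how a body-mounted camera moves than sampling each segment separately.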
Tracked label overlays
icreatelife's visual annotation demo is the best evidence that Omni can do explainer-style motion graphics, not just camera moves. The clip adds monochrome AR labels in 3D space, keeps them attached during zooms and pans, and preserves the original scene as one continuous shot.
icreatelife's prompt post gives the whole recipe in plain language:
- Start with a wide shot that matches the reference image exactly.
- Reveal simple overlaid text labels tracked in 3D space.
- Zoom and pan to each detail as each label appears.
- Resolve back to the original starting frame.
- Keep natural ambient sound, no dialogue, no music.
That is a real workflow, not a vibe. It turns a single still into product callouts, architectural explainers, anatomy breakdowns, or scene annotation without leaving the video model.
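Because the recipe is a fixed sequence with one repeating step, it is easy to template. The sketch below assembles a numbered prompt from a list of labels; the step wording paraphrases the recipe listed above rather than quoting icreatelife's exact prompt, and `build_annotation_prompt` is a hypothetical helper that only produces text. Sending that text to a model is out of scope here.

```python
def build_annotation_prompt(labels):
    """Build a tracked-label prompt from (label_text, scene_detail) pairs.

    The steps follow the recipe in the article; the wording is a paraphrase.
    """
    if not labels:
        raise ValueError("at least one label is required")

    steps = [
        "Start with a wide shot that matches the reference image exactly.",
        "Reveal simple overlaid text labels tracked in 3D space.",
    ]
    # One zoom-and-pan step per label, in the order given.
    for text, detail in labels:
        steps.append(f'Zoom and pan to {detail} as the label "{text}" appears.')
    steps.append("Resolve back to the original starting frame.")
    steps.append("Keep natural ambient sound, no dialogue, no music.")

    return "\n".join(f"{n}. {step}" for n, step in enumerate(steps, start=1))


# Example: two callouts on a product still (labels are made up for illustration).
print(build_annotation_prompt([
    ("Titanium bezel", "the watch bezel"),
    ("Sapphire glass", "the watch face"),
]))
```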
Character-sheet swaps
ai_artworkgen's character-sheet swap demo shows a different pattern: keep the scene, replace the subject with a character sheet, then ask for additional coverage that did not exist in the original clip.
The prompt structure is specific:
- swap the woman on the horse for a supplied character sheet
- preserve the original clip as the base scene
- generate three extra shots: a low-angle tracking shot, a macro eye close-up, and a rear arrival shot
That matters because it pushes Omni from correction into coverage generation. Instead of fixing a clip, the model is being used to storyboard alternate angles around it.
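The same three-part structure can be written down as a small data object, which makes it easier to reuse one scene with different characters or shot lists. This is a sketch under assumptions: `CoverageRequest`, its field names, and the file names in the example are all hypothetical, and the generated wording is a paraphrase of the demo's structure, not the original prompt.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageRequest:
    """Hypothetical container for a subject swap plus extra camera coverage."""
    base_clip: str
    character_sheet: str
    subject_to_replace: str
    extra_shots: list = field(default_factory=list)

    def to_prompt(self) -> str:
        lines = [
            f"Use {self.base_clip} as the base scene and preserve it.",
            f"Replace {self.subject_to_replace} with the character "
            f"from {self.character_sheet}.",
        ]
        if self.extra_shots:
            lines.append(
                f"Then generate {len(self.extra_shots)} extra shots of the same scene:"
            )
            lines.extend(f"- {shot}" for shot in self.extra_shots)
        return "\n".join(lines)


# Example mirroring the demo's structure; the file names are placeholders.
request = CoverageRequest(
    base_clip="reference_clip.mp4",
    character_sheet="character_sheet.png",
    subject_to_replace="the woman on the horse",
    extra_shots=[
        "a low-angle tracking shot",
        "a macro eye close-up",
        "a rear arrival shot",
    ],
)
print(request.to_prompt())
```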
Omni's current shape
Community examples are converging on a narrow but useful pattern. fabianstelzer's take argued that Omni is not a Seedance 2 competitor, but more like "Nano Banana for video," which fits the demos better than the all-purpose movie-generator framing in minchoi's roundup.
So far, the strongest use cases in the evidence pool are:
- route-to-POV scene synthesis, via levelsio's repost and bilawalsidhu's demo
- tracked explainer overlays, via icreatelife's visual annotation demo
- subject swaps with new camera coverage, via ai_artworkgen's character-sheet swap demo
- educational or astronomical visualization, via kaigani's galaxy-collision example
That is a wider range than simple image-to-video. Taken together, the demos point to fast previs, motion annotation, and iterative edit generation, with most of the evidence still coming from creators stress-testing prompts in public rather than from a formal product breakdown.