Grok Imagine supports 4-shot video prompts in creator tests
Creator tests suggest Grok Imagine can now follow multi-scene video prompts with close-ups, cutaways, and detail shots, though physics glitches remain. Keep sequences short and shot-by-shot if you want usable previs or stylized social clips.

TL;DR
- Creator tests indicate Grok Imagine can now follow four-shot video prompts inside a single clip, including wide shots, close-ups, reaction shots, and detail views, as shown in a robot-painter demo.
- The clearest working examples are written shot by shot: one test specifies a robot painting sequence across four planned camera setups, while another maps a Subaru chase into a face close-up, an exterior cutaway, and rear-seat interior coverage.
- Reliability is still uneven. The creator behind the car sequence says the second test has physics errors and is not yet suitable for professional work.
- Early creative use is skewing stylized rather than realistic: one creator built a longer music video with Grok Imagine's comic-book template, and simpler anime and fantasy clips suggest short social-ready visuals are the current sweet spot.
What changed in prompting
The new behavior in the primary test is not just image-to-video motion. The prompt explicitly asks for a four-part sequence: a general shot of a robot painting a realistic portrait, a close-up of brushstrokes, a close-up of the robot's face while working, and a detailed look at the finished canvas. The attached clip of the robot painting a portrait suggests Grok Imagine can now preserve a single idea while changing camera distance and subject emphasis inside one generated video.
A second creator test shows the same pattern applied to action coverage. In the car example, the prompt breaks the clip into three views: a driver's face, an exterior shot of a Subaru WRX on a muddy cliff road, and an interior rear-seat angle looking back down the road. That post also gives the clearest caveat: physics still breaks, even if the transition feature is a real step forward.
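The shot-by-shot structure described above can be sketched as a small prompt builder. This is only an illustration: the `Shot` class, the `build_shot_prompt` helper, and the "Shot N:" line format are hypothetical, the shot wording paraphrases the article rather than quoting the creator's actual prompt, and nothing here calls a Grok Imagine API. It just assembles text you could paste into a prompt box.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    framing: str      # camera framing, e.g. "general shot" or "close-up"
    description: str  # what the shot shows


def build_shot_prompt(shots: list[Shot]) -> str:
    """Join shots into one numbered prompt, one line per shot.

    The "Shot N:" layout is an assumed convention, not a documented format.
    """
    if not shots:
        raise ValueError("at least one shot is required")
    lines = [
        f"Shot {i}: {shot.framing} of {shot.description}."
        for i, shot in enumerate(shots, start=1)
    ]
    return "\n".join(lines)


# Paraphrase of the four-part robot-painter sequence described in the article.
robot_painter = [
    Shot("general shot", "a robot painting a realistic portrait"),
    Shot("close-up", "the brushstrokes"),
    Shot("close-up", "the robot's face while it works"),
    Shot("detail shot", "the finished canvas"),
]

prompt = build_shot_prompt(robot_painter)
print(prompt)
```

The same helper would cover the three-view car example by swapping in a different list of shots; keeping each shot on its own line mirrors the clearly separated instructions the creator tests relied on.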
What creators are making with it
The most concrete downstream use so far is stylized sequencing, not polished realism. One creator says their music video was made from Grok Imagine images and videos using the comic-book style template, which is a better fit for cuts, panels, and exaggerated motion than strict physical accuracy.
Shorter experiments point the same way. An anime portrait turn and a fantasy wizard-dragon clip both work because they ask the model for mood, motion, and reveal shots rather than precise real-world mechanics. For previs, animatics, and stylized social clips, the evidence so far favors short sequences with clearly separated shot instructions.