Workflow · March 14, 2026

Creators report Grok Imagine supports multi-reference cartoons and reference-to-video clips

Users report Grok Imagine can combine multiple references for cartoons, mashups, and short reference-to-video clips. Stack reference images when character identity matters more than raw prompt invention.

TL;DR

Early creator tests suggest Grok Imagine now accepts multiple reference images for generation, with a cartoon demo and a mashup test both showing identity-driven outputs rather than prompt-only invention.
A separate reference-to-video post points to image-guided video generation, and its interface screenshot shows multiple visual inputs alongside 480p/720p, 6s/10s, and 9:16 options.
Creators are already using the feature for cartoon consistency, pet-character mashups, and short stylized clips, according to one cartoon workflow, one animal mashup, and one creature animation test.
The strongest evidence so far covers short-form experiments and style/character transfer; a longer music post confirms Grok Imagine was used, but it does not show the level of parameter detail visible in the interface capture.

What changed in Grok Imagine

The clearest new capability is in Ozan Sihay's post, which shows Grok Imagine building a video from several reference images plus a text prompt. The screenshot includes three image slots and prompt text that assigns each image a role (a cowboy, a street, and a T-Rex) and then adds camera direction. The UI exposes 480p and 720p resolution, 6-second and 10-second durations, and a 9:16 aspect ratio.
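The role-per-image pattern described above can be sketched as a small planning helper. This is only an illustration under stated assumptions: it does not call Grok Imagine or any real API, the class names, file names, and prompt wording are hypothetical, and the allowed values are limited to the options visible in the screenshot (480p/720p, 6s/10s, 9:16).

```python
from dataclasses import dataclass

# Options visible in the interface screenshot. Treating them as the only
# valid values is an assumption; the product may offer others.
RESOLUTIONS = {"480p", "720p"}
DURATIONS_S = {6, 10}
ASPECT_RATIOS = {"9:16"}


@dataclass
class Reference:
    path: str  # hypothetical local file name
    role: str  # what this image should stand for in the prompt


@dataclass
class ShotPlan:
    references: list[Reference]
    camera: str  # free-text camera direction
    resolution: str = "720p"
    duration_s: int = 6
    aspect_ratio: str = "9:16"

    def validate(self) -> None:
        """Reject plans that use settings not seen in the screenshot."""
        if not self.references:
            raise ValueError("at least one reference image is required")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution}")
        if self.duration_s not in DURATIONS_S:
            raise ValueError(f"unsupported duration: {self.duration_s}s")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")

    def prompt(self) -> str:
        """Build prompt text that gives each image a role, then the camera note."""
        self.validate()
        roles = "; ".join(
            f"Image {i} is {ref.role}"
            for i, ref in enumerate(self.references, start=1)
        )
        return f"{roles}. {self.camera}"


plan = ShotPlan(
    references=[
        Reference("cowboy.png", "the cowboy"),
        Reference("street.png", "the street"),
        Reference("trex.png", "the T-Rex"),
    ],
    camera="Slow push-in at street level.",
    duration_s=10,
)
print(plan.prompt())
```

Keeping the role next to each image, rather than buried in one long prompt, makes it easier to swap a single reference without rewriting the rest of the shot description.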

A separate cartoon demo shows a user adding a base image and three more references under a "Reference Images" label. That lines up with the claim that stacking references is especially useful when cartoon character identity matters more than generating a new design from scratch.

What creators are making with it

In Bennash's example, seven images of the creator's animals are used as references to push the same subjects into a comic mashup, ending with a Will Smith transformation. The interesting part is not the joke itself but the claim that a small image set is enough to make the animals "do anything," which points to reusable character packs rather than one-off prompts.

Anima Labs frames Grok as the animation-and-sound layer in a wider pipeline that starts with Midjourney or Leonardo for 2D assets and Nano Banana Pro for 3D. The posted clip shows rapid creature morphs and suggests Grok is already being treated as a finishing tool for motion tests, not just a standalone generator.

Where the evidence is strongest — and still thin

A music-video-style post and its follow-up attribution confirm Grok Imagine is also being used for more emotional, edit-driven pieces. But the documentation is thinner there than in the multi-reference demos: the post identifies the tool, while the interface capture and the cartoon workflow are the only items that clearly expose how references are being arranged inside the product.

That makes the current picture fairly specific: Grok Imagine appears strongest, at least in public tests, for short clips, visual mashups, and character-preserving cartoon or creature work built from multiple source images.
