Release: May 31, 2026

Grok Imagine 1.5 drops on fal: creator posts compare its video motion against Seedance 2.0

fal added Grok Imagine Video 1.5, and creator posts immediately tested it against Seedance 2.0 and Gemini Omni on fight scenes, lip-sync, and reference-driven clips. The early comparisons place it among the models creators take seriously, but not clearly ahead of Seedance 2.0 in real-world use.

TL;DR

fal added Grok Imagine Video 1.5 on its model hub, exposing xAI's newer image-to-video model with audio, 1 to 15 second clips, and default 720p output, while gokayfem's repost marked the drop as live on May 31.
The fastest creator-side comparison came from ozansihay's fight-scene test, which ran the same prompt across Grok Imagine 1.5, Gemini Omni, Seedance 2.0 Fast, and Seedance 2.0 Pro to compare reciprocal motion and timing.
Early sentiment was mixed, not triumphalist: although the Image-to-Video Arena leaderboard shows Grok climbing, ozansihay's ranking reaction said Seedance 2.0 still has a noticeable edge in real-world use.
Two posts from carolletta, a coffee clip and a later follow-up, pushed Grok Imagine into softer, character-led tests, while xAI's docs show the preview model priced at $0.08 per output second with a 60 RPM limit.

You can check the fal model page, skim xAI's preview docs, and compare that positioning against Arena's public leaderboard. xAI's broader video workflow docs also show where the model sits inside a bigger stack that now covers text-to-video, image-to-video, editing, reference-driven generation, and extension.

fal's Grok endpoint

fal's listing frames Grok Imagine 1.5 as an image-to-video endpoint that generates audio too, with the API exposed at xai/grok-imagine-video/v1.5/image-to-video on fal. The page tags it for stylized, transform, and lipsync workflows, which matches the kinds of clips creators immediately started posting.

The official xAI preview docs describe the model as grok-imagine-video-1.5-preview, with text-to-video and image-to-video modalities, the alias grok-imagine-video-1.5-2026-05-30, and availability in us-east-1 and eu-west-1.
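
As a rough sketch of how a request to that endpoint might be assembled, the Python below builds a payload and checks it against the limits quoted in this article (1 to 15 second clips, 480p or 720p, with 6 seconds and 720p as defaults). The endpoint ID comes from fal's listing. The field names (prompt, image_url, duration, resolution) and the build_request helper are assumptions for illustration, not the documented schema, so check the fal model page before relying on them.

```python
from dataclasses import dataclass

# Endpoint ID as quoted from fal's listing.
FAL_ENDPOINT = "xai/grok-imagine-video/v1.5/image-to-video"

# Limits quoted in this article.
MIN_SECONDS, MAX_SECONDS = 1, 15
RESOLUTIONS = ("480p", "720p")


@dataclass(frozen=True)
class VideoRequest:
    endpoint: str
    arguments: dict


def build_request(prompt: str, image_url: str, duration: int = 6,
                  resolution: str = "720p") -> VideoRequest:
    """Validate inputs and describe a request. No network call is made."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    return VideoRequest(
        endpoint=FAL_ENDPOINT,
        arguments={
            "prompt": prompt,
            "image_url": image_url,    # hypothetical field name
            "duration": duration,      # hypothetical field name
            "resolution": resolution,  # hypothetical field name
        },
    )
```

The returned endpoint and arguments could then be handed to whichever client you use to call fal; validating locally first avoids spending a request on a clip length the model would reject.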

Fight-scene motion

Ozan Sihay's side-by-side is the most useful same-prompt comparison among the early posts because it keeps the scene fixed and lets motion quality do the talking. He tested back-and-forth action across four models: Grok Imagine 1.5, Gemini Omni, Seedance 2.0 Fast, and Seedance 2.0 Pro.

Grok Imagine 1.5 fight-scene output
Seedance 2.0 Fast fight-scene output

That test landed next to ozansihay's ranking reaction, where he said Grok is progressing quickly but still does not beat Seedance 2.0 in real-world use. That is a sharper read than the leaderboard alone, especially because Arena's ranking page reflects aggregate preference votes, not one creator's workflow priorities.

Lip-sync and everyday character clips

The early user examples are less about cinematic benchmarking and more about whether the model can sell a quick social post. Carolletta's two clips lean into character presence and light performance, which is where bad video models usually fall apart first.

Carolletta's Grok Imagine coffee clip

The fal page's lipsync tag is doing real work here. In his broader May roundup of model categories, Ozan put Grok Imagine and Gemini Omni on his Turkish lip-sync shortlist, while still giving general video generation to Seedance 2.0 and Kling 3.0.

Reference workflows and pricing

xAI's current video generation docs show five separate workflows in the same stack:

Video Generation, text prompt only.
Image-to-Video, prompt plus a starting image.
Video Editing, modify an existing video.
Reference-to-Video, guide generation with one or more reference images.
Video Extension, continue from a video's last frame.

The reference-to-video docs make the most creator-relevant distinction: reference images guide who or what appears in the clip without locking the first frame, which is useful for character consistency, product placement, and virtual try-on.
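
One way to read the five workflows is as a decision over which inputs you have. The sketch below encodes that reading in Python. The workflow names come from the list above; the pick_workflow helper, its parameter names, and the choice to reject a start image combined with reference images are assumptions for illustration, not behavior documented by xAI.

```python
from enum import Enum
from typing import Optional, Sequence


class Workflow(Enum):
    VIDEO_GENERATION = "Video Generation"
    IMAGE_TO_VIDEO = "Image-to-Video"
    VIDEO_EDITING = "Video Editing"
    REFERENCE_TO_VIDEO = "Reference-to-Video"
    VIDEO_EXTENSION = "Video Extension"


def pick_workflow(
    start_image: Optional[str] = None,
    reference_images: Sequence[str] = (),
    source_video: Optional[str] = None,
    extend: bool = False,
) -> Workflow:
    """Map the available inputs to one of the five workflows (hypothetical helper)."""
    if source_video is not None:
        # An existing video is either continued from its last frame or modified.
        return Workflow.VIDEO_EXTENSION if extend else Workflow.VIDEO_EDITING
    if start_image is not None and reference_images:
        # Assumption: this sketch treats the combination as ambiguous.
        raise ValueError("choose either a start image or reference images")
    if start_image is not None:
        # The image becomes the starting frame of the clip.
        return Workflow.IMAGE_TO_VIDEO
    if reference_images:
        # References guide who or what appears without locking the first frame.
        return Workflow.REFERENCE_TO_VIDEO
    return Workflow.VIDEO_GENERATION
```

For a character-consistency job, passing two reference portraits and no start image selects Reference-to-Video, which leaves the opening frame free for the model to compose.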

On cost, xAI's preview page lists output at $0.08 per second, while fal's hosted page breaks that out to $0.08 per second at 480p and $0.14 per second at 720p, adds $0.01 per input image, and says audio is included. The same pages set a limit of 60 requests per minute and a default clip length of 6 seconds, with a 1 to 15 second range.
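
To make the fal pricing concrete, here is a minimal cost estimator in Python built only from the figures quoted above. The estimate_fal_cost name is hypothetical, and the sketch assumes billing is linear in output seconds with no rounding, minimums, or extra fees, which the pricing pages may not guarantee.

```python
from decimal import Decimal

# Per-second rates and input-image fee as quoted from fal's hosted page.
FAL_RATE_PER_SECOND = {"480p": Decimal("0.08"), "720p": Decimal("0.14")}
FAL_INPUT_IMAGE_FEE = Decimal("0.01")


def estimate_fal_cost(seconds: int, resolution: str = "720p",
                      input_images: int = 1) -> Decimal:
    """Estimate the dollar cost of one clip, assuming linear per-second billing."""
    if not 1 <= seconds <= 15:
        raise ValueError("clip length must be 1-15 seconds")
    if resolution not in FAL_RATE_PER_SECOND:
        raise ValueError("resolution must be '480p' or '720p'")
    if input_images < 0:
        raise ValueError("input_images must not be negative")
    video_cost = FAL_RATE_PER_SECOND[resolution] * seconds
    image_cost = FAL_INPUT_IMAGE_FEE * input_images
    return video_cost + image_cost


# Default 6-second clip at 720p with one input image: 0.84 + 0.01
print(estimate_fal_cost(6))  # prints 0.85
```

Under the same assumptions, a maximum-length 15-second clip at 480p with one input image comes to $1.21, so resolution matters more to the bill than the per-image fee.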
