AI Primer

Stages AI tests one-click storyboarding for CUE

Stages AI teased one-click storyboarding and said phase one of CUE multimodal vision is complete, with chat-based video analysis and frame retrieval next. The update shifts the tool from shot generation toward planning and analysis in the same workspace.


TL;DR

  • dustinhollywood's storyboard teaser says Stages AI is adding one-click storyboarding, and the attached board mockup looks closer to a full episode planning surface than a shot picker.
  • According to dustinhollywood's CUE vision update, phase one of CUE multimodal vision is done, with chat-based video analysis already working in the media analyzer and frame-level retrieval next on the roadmap.
  • dustinhollywood's VIDX post frames CUE as the prep layer for script, brief, shot alignment, and bulk generation, including the claim that users will be able to generate hundreds of shots in one click.
  • The product teasers in a Stages UI update thread and an editing tools preview show the same shift from pure generation toward a combined planning, editing, and analysis workspace.

You can watch the phone demo flip through generated boards, browse the compact CUE chat mockup, and inspect the new clip-properties panel with glass presets that look more like an NLE plug-in than a prompt box. A light mode preview also surfaced a long model list that includes Veo, Kling, Seedance 2, LTX, Mochi, and others, while a multi-shot screen suggests Stages is still building around fast coverage generation rather than single-image prompting. The only official web surface that is easy to confirm from the evidence right now is the Stages AI X account.

CUE multimodal vision

Stages says phase one of CUE's multimodal vision is complete. The attached screen shows a popup chat analyzing an uploaded .mov, extracting a representative frame, then returning a dense breakdown of scene content, mood, cinematography, and visible subjects.

The next step, per that same CUE update, is wiring that chat flow into the existing media analyzer so users can analyze any video, pull any frame, and query it inline instead of leaving the conversation surface.
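Stages has not published how the analyzer chooses which frames to surface, but the "pull any frame" step has a standard shape: sample candidate frame indices from the clip, then analyze each one. A minimal sketch of that sampling logic, with function and parameter names that are assumptions for illustration rather than anything from CUE:

```python
# Hypothetical sketch of frame-level retrieval as described for the CUE
# media analyzer; the names and logic here are assumptions, not Stages' API.

def sample_frame_indices(total_frames: int, n_samples: int) -> list[int]:
    """Pick evenly spaced candidate frames, one per segment midpoint."""
    if total_frames <= 0 or n_samples <= 0:
        return []
    n = min(n_samples, total_frames)
    step = total_frames / n
    return [int(step * i + step / 2) for i in range(n)]

# A 10-second clip at 24 fps has 240 frames; sample 5 candidates.
print(sample_frame_indices(240, 5))  # → [24, 72, 120, 168, 216]
```

In a real pipeline each index would be decoded (e.g. with a video library) and passed to the vision model; midpoint sampling just avoids biasing toward black leader frames at a clip's head and tail.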

That matters because the UI teasers now point to one agent surface doing three different jobs: storyboarding, shot generation and editing, and video analysis.

One-click storyboarding

The storyboard teaser is the clearest product reveal in the batch. The image in dustinhollywood's post shows an "ECLIPTIC SEASON 1" board with 80-plus shots, episode labels, handwritten beat notes, and a bottom strip of key character stills.

Between the board screenshot and the short phone video, the feature looks less like auto-layout and more like an attempt to turn generated shots into an editable preproduction document. The visible board components are concrete:

  1. episode sections, including Ep. 3 and Ep. 4, in the storyboard image
  2. numbered shot panels running past 80, in the same board
  3. handwritten production notes such as "END ACT ONE" and FX annotations, written directly on the panels
  4. key character and moment stills collected at the bottom, in the stills strip

That is a bigger ambition than "generate coverage." It pushes Stages toward the place where script breakdown, look development, and shot planning all touch the same canvas.

Shot generation and editing stack

Stages is still pitching scale. In the VIDX post, Dustin says CUE can turn loaded images or video into dialogue by script or brief, prep shots to match motion and style, then either add them to a timeline or generate "500 shots in one click."

The surrounding teasers fill in what that stack looks like.

Taken together, the workflow Stages is teasing is not text-to-video in the narrow sense. It is shot orchestration with generation, continuity, transitions, and finishing controls in one stack.

Rollout clues

The product philosophy is unusually explicit. In dustinhollywood's post, he says the target is "creative automation" that removes repetitive labor while keeping the artist in control, with agents handling heavy lifting and tools passing data between each other.

That framing shows up in the release hints.

One last useful clue sits in the light mode preview, which exposes a model picker listing Veo, Kling, Grok Imagine, WAN, MiniMax/Hailuo, Luma Labs, VIDU, Seedance 2, LTX, and Mochi. If that picker reflects the shipping product, Stages is not building around a single in-house generator. It is building a creative control layer over a growing vendor stack.

Discussion across the web

Where this story is being discussed, in original context.

On X · 3 threads:

  • TL;DR: 1 post
  • Shot generation and editing stack: 2 posts
  • Rollout clues: 1 post