BytePlus launches Seedance 2.0 API with multimodal inputs and scene extension
BytePlus has launched the Seedance 2.0 API, and early creator tests show support for image, video, audio, and text inputs, plus scene extension, voice-synced delivery, and steadier physics. The move takes Seedance from app-only access to repeatable production pipelines and custom workflows.

TL;DR
- BytePlus has moved Seedance 2.0 into an official API workflow, and the official ModelArk tutorial says the model now supports text, image, audio, and video inputs, plus video editing and extension, which lines up with ProperPrompter's launch thread and his scene-extension demo.
- BytePlus says Seedance 2.0 is built for harder motion and interaction shots, with better physics accuracy, realism, and controllability, according to the Volcano Engine launch post, and creator tests in ProperPrompter's examples and egeberkina's timeline prompt demo push exactly on those claims.
- The API matters because it turns Seedance from a UI-only toy into a repeatable pipeline: the official docs describe asynchronous task creation and polling, while minchoi's OpenArt walkthrough and Meshy's model selector show the same model spreading across creator tools.
- Prompting style is shifting from one-line descriptors to structured direction: egeberkina's prompts use second-by-second timelines, and AIwithSynthia's beat-synced shot list breaks a 15-second clip into 15 planned shots.
You can read the official tutorial, skim the paper abstract, and even see OpenRouter treat video as just another routed model in its video generation announcement. The weirdly useful part is how much of the control surface is now exposed in public examples: scene extension, timeline-based prompting, and reference-plus-prompt workflows in OpenArt all show the model behaving less like a text box and more like a shot planner.
BytePlus opens the API
BytePlus' tutorial says Seedance 2.0 and Seedance 2.0 Fast share the same core feature set: text-to-video, first-frame and first-and-last-frame image-to-video, multimodal reference-to-video, video editing, video extension, audio generation, and draft mode. The difference is quality versus speed, with the standard model positioned for best output and Fast positioned for cheaper, quicker runs.
The docs also spell out the production shape of the product. Generation is asynchronous, task status is polled every 30 seconds, and inputs are expected as publicly accessible asset URLs, which is a much more pipeline-friendly setup than the app-only workflows that dominated Seedance chatter a week ago.
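For anyone wiring this into a pipeline, the shape is familiar: create a task, then poll. Here is a minimal Python sketch of that loop, with the caveat that the endpoint paths, payload fields, and auth scheme below are placeholders for illustration; the real ModelArk values live in the BytePlus docs.

```python
import time

import requests

API_BASE = "https://modelark.example.com/api/v3"  # placeholder host, not the real endpoint
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}


def create_video_task(prompt: str, model: str = "seedance-2.0") -> str:
    """Submit an asynchronous text-to-video task and return its task ID.

    Path and field names are illustrative, not the documented schema.
    """
    resp = requests.post(
        f"{API_BASE}/video/tasks",
        headers=HEADERS,
        json={"model": model, "prompt": prompt},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["task_id"]


def poll_until_done(task_id: str, interval_s: int = 30) -> dict:
    """Poll task status on the 30-second cadence the docs describe."""
    while True:
        resp = requests.get(f"{API_BASE}/video/tasks/{task_id}", headers=HEADERS, timeout=30)
        resp.raise_for_status()
        task = resp.json()
        if task["status"] in ("succeeded", "failed"):
            return task
        time.sleep(interval_s)


task_id = create_video_task("A chipmunk sprints across a mossy log, camera tracking left.")
result = poll_until_done(task_id)
print(result.get("video_url"))
```

The asynchronous shape is the point: submit, get a task ID back immediately, and let a worker or cron job collect the finished clip, which is what makes batch and pipeline use practical.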
Scene extension and multimodal reference
The easiest feature to understand is extension. In the official tutorial, BytePlus lists video extension as a first-class capability, and ProperPrompter's example shows a short clip expanding outward into a wider scene while preserving the original character and motion.
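As a rough sketch of how an extension request could look as a task payload, with the caveat that `mode`, `video_url`, and `duration_s` are assumed field names rather than confirmed ModelArk schema:

```python
# Hypothetical video-extension request body; field names are illustrative.
# The one documented constraint that carries over: the input clip must be
# a publicly accessible asset URL.
extension_task = {
    "model": "seedance-2.0",
    "mode": "video_extension",  # assumed task-type discriminator
    "video_url": "https://cdn.example.com/clips/source_shot.mp4",
    "prompt": (
        "Pull the camera back to reveal a wider forest clearing while "
        "the character keeps its original run cycle and direction."
    ),
    "duration_s": 5,  # assumed length of the added footage
}
```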
The more interesting capability is reference stacking. The docs say Seedance 2.0 can combine image, video, audio, and text, while ProperPrompter's thread shows a character turnaround image merged with a separate chipmunk clip, and egeberkina's voice test reports that an uploaded voice file stayed synced to the referenced character. The Seedance 2.0 paper page describes the model as a unified multimodal audio-video system, which helps explain why these examples feel less bolted together than earlier image-to-video stacks.
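A hypothetical reference-stacking payload makes that shape concrete. Again, the field names here are assumptions, not the documented schema; what does come straight from the docs is that every asset arrives as a publicly accessible URL:

```python
# Hypothetical multimodal reference-to-video payload; parameter names are
# illustrative stand-ins for whatever the BytePlus tutorial specifies.
reference_task = {
    "model": "seedance-2.0",
    "mode": "reference_to_video",  # assumed task type
    "references": [
        {"type": "image", "url": "https://cdn.example.com/refs/character_turnaround.png"},
        {"type": "video", "url": "https://cdn.example.com/refs/motion_source.mp4"},
        {"type": "audio", "url": "https://cdn.example.com/refs/voice_line.wav"},
    ],
    "prompt": (
        "Merge the turnaround character into the source clip's motion, "
        "lip-syncing the uploaded voice line."
    ),
}
```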
Timeline prompts are the real control surface
A lot of the best public Seedance prompts are not descriptive paragraphs. They are shot lists.
Four patterns keep showing up; a minimal timeline-prompt sketch follows the list:
- Second-by-second timelines: egeberkina structures clips as 0 to 3 seconds, 3 to 6 seconds, 6 to 10 seconds, then fills each beat with camera motion, action, and sound cues.
- Beat-synced shot lists: AIwithSynthia maps a 15-second romance clip into 15 discrete shots with environment changes and a mood arc.
- Reference locks: minchoi's OpenArt prompt explicitly locks facial features while allowing wardrobe and scene changes.
- Motion-first language: in Freepik's workflow thread, the team says image prompts should describe the scene, but video prompts should describe movement, like a camera tracking left as a character turns.
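Here is a small Python sketch of building a timeline prompt in that style. The structure mirrors the creator examples above, but the beat contents are invented for illustration, not copied from any creator's thread:

```python
# Timeline-style prompt: per-beat time ranges, each with camera motion,
# action, and sound cues. Beats below are illustrative placeholders.
beats = [
    ("0-3s", "slow dolly-in on a rain-soaked street", "neon signs flicker on", "low synth pad"),
    ("3-6s", "camera tracks left", "the character turns toward the lens", "footsteps in puddles"),
    ("6-10s", "crane up to a rooftop wide shot", "she pulls up her hood and walks away", "music swells"),
]

timeline_prompt = "\n".join(
    f"{t}: camera: {cam}; action: {act}; sound: {snd}"
    for t, cam, act, snd in beats
)
print(timeline_prompt)
```

Structuring the prompt this way is what lets a generator honor distinct beats instead of blending one long description into a single averaged shot.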
That is the quiet upgrade here. The model is being treated like a tiny previs system, not just a generator for isolated pretty shots.
Where Seedance shows up now
The API launch landed at the same moment Seedance started appearing everywhere else. minchoi's OpenArt post calls out global availability on OpenArt, Meshy added Seedance 2.0 to Image to Video, and stevibe noted that OpenRouter's new video API supports it too. OpenRouter's own launch post confirms Seedance 2.0 was in the day-one model set.
The official launch post adds one more piece that did not show up much in the creator threads: BytePlus says it built portrait and copyright safety checks around the workflow, including face verification, portrait authorization, and a library of more than 10,000 preset virtual human assets for compliant generation. For teams trying to turn Seedance clips into an actual content pipeline, that back-office layer is part of the product now too.