Seedance 2.0 benchmarks 15s dialogue shots at about $4.50
David Comfort tested Seedance 2.0 with Seed Audio for three-character lip-synced continuous shots at about $4.50 per 15-second video. His notes recommend one blocking event per shot, close-ups derived from the master wide, and lock clauses to keep handoffs stable.

TL;DR
- The cost post priced a 15-second 720p Seedance 2.0 dialogue shot at about $4.50 for video, plus about $0.07 for a staged still and $0.10 for conversation audio; fal's Seedance API docs list 720p generation at $0.3034 per second, which lands at $4.55 for 15 seconds.
- The workflow thread used one Seed Audio file to drive two or three speaking mouths in one continuous Seedance shot, with no shot/reverse-shot stitching or separate lip-sync pass per character.
- The two-day test found an annoying split: Seed Audio with reference voices preserved cloned dialogue but wiped sound effects and ambience, so the workflow ran dialogue and sound design as separate passes.
- The blocking test found a one-event movement budget per shot, while the handoff test got a physical vial pass to survive by using one decisive motion and a hold-steady clause.
- The 4K cost post put a 15-second Seedance 2.0 4K clip at $47, a separate budget tier from the 720p dialogue workflow.
fal's Seedance 2.0 docs list 4 to 15 second outputs, 480p through 4K resolution, and a generate_audio option for synchronized sound, ambience, and lip-synced speech at the same video-generation cost. fal's Seed Audio docs accept @Audio1, @Audio2, and @Audio3 reference clips, while BytePlus' Seedance page says Seedance 2.0 Mini and 4K are now available. The weird field report is in the workflow thread: reference voices can be the thing that kills the rest of the mix.
The 15-second dialogue stack
The rough stack came from the workflow thread: generate one conversation file in Seed Audio, stage the actors in one still, then feed both into Seedance 2.0 as a reference-to-video shot.
DavidmComfort's reported recipe:
- Generate the whole conversation as one Seed Audio file, with speaker parentheticals and reference voices.
- Run cheap transcript QC before video generation.
- Create one staged still with the full geography locked.
- In the Seedance prompt, write the exact dialogue again and name who speaks each line.
- Keep the clip inside Seedance's 15-second ceiling.
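The recipe's constraints can be checked before any paid generation. The sketch below is an illustration, not part of Comfort's workflow: `Line`, `preflight`, and the sample dialogue are hypothetical names and placeholder text, and the 4 to 15 second range is the one fal's Seedance 2.0 docs list.

```python
from dataclasses import dataclass

MIN_SECONDS = 4    # shortest output listed in fal's Seedance 2.0 docs
MAX_SECONDS = 15   # longest output listed in fal's Seedance 2.0 docs
MAX_SPEAKERS = 3   # the workflow drove two or three speaking mouths per shot


@dataclass
class Line:
    speaker: str
    text: str


def preflight(lines: list[Line], duration_s: float) -> list[str]:
    """Return a list of problems; an empty list means the plan looks usable."""
    problems = []
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append(
            f"duration {duration_s}s is outside {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    speakers = {line.speaker for line in lines if line.speaker.strip()}
    if len(speakers) > MAX_SPEAKERS:
        problems.append(f"{len(speakers)} speakers exceeds {MAX_SPEAKERS}")
    for number, line in enumerate(lines, start=1):
        if not line.speaker.strip():
            problems.append(f"line {number} has no named speaker")
        if not line.text.strip():
            problems.append(f"line {number} has no dialogue text")
    return problems


# Placeholder dialogue, invented for the example.
plan = [Line("Kael", "We leave at dawn."), Line("Liora", "Not without the vial.")]
print(preflight(plan, 15))
print(preflight(plan, 20))
```

The first call returns an empty list; the second flags the 20-second duration, which is the kind of error worth catching before spending on video.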
The cost post priced the per-shot 720p stack at about $4.50 for video, $0.07 for the staged still, and $0.10 for conversation audio. The whole two-day investigation, including failed experiments and retries, came to about $65.
The $4.50 video number lines up with fal's Seedance pricing docs, which list 720p generation with audio at $0.3034 per second.
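The arithmetic is short enough to check by hand, but a small sketch makes the components explicit. It assumes fal's listed per-second rate applies to the whole clip and reuses the still and audio figures from the cost post; the function names are made up for the example.

```python
from decimal import Decimal

RATE_720P_PER_SECOND = Decimal("0.3034")  # fal's listed 720p rate
STILL_COST = Decimal("0.07")              # staged still, per the cost post
AUDIO_COST = Decimal("0.10")              # conversation audio, per the cost post
CENT = Decimal("0.01")


def video_cost(seconds: int) -> Decimal:
    """Video-only cost at the per-second rate, rounded to cents."""
    return (RATE_720P_PER_SECOND * seconds).quantize(CENT)


def shot_cost(seconds: int) -> Decimal:
    """Video plus one staged still plus one conversation audio file."""
    total = RATE_720P_PER_SECOND * seconds + STILL_COST + AUDIO_COST
    return total.quantize(CENT)


print(video_cost(15))  # 4.55
print(shot_cost(15))   # 4.72
```

At the listed rate, a full 15-second shot with its still and audio comes to roughly $4.72, slightly above the rounded $4.50 video figure.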
The audio split
The Magnific launch post advertised Seed Audio 1.0 for voice, music, and SFX from a single prompt, with scene control over environment, voice, emotion, music, and sound effects. The same post said it supports multi-character dialogue and up to three reference audios or an image, and is available through Magnific's Voice Generator, MCP, and Spaces.
The caveat came from the workflow thread: when reference voice clips were attached, Seed Audio returned only dialogue across five attempts. The same prompt without reference voices returned dialogue, ambience, and timed foley in one generation.
Comfort's split was mechanical:
- Seed Audio with references for dry cloned dialogue.
- Seedance 2.0 generate_audio for scene ambience.
- A second Seed Audio pass when sound effects or music needed to be designed outside the video model.
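The split can be written down as a small routing decision. This is a sketch of the reported behavior, not an API call: the pass labels are invented strings, and the single-pass branch assumes the no-reference result from the thread holds in general.

```python
def plan_audio_passes(
    use_reference_voices: bool,
    needs_ambience: bool,
    needs_sfx_or_music: bool,
) -> list[str]:
    """Return the audio passes to run, in order (labels are hypothetical)."""
    if not use_reference_voices:
        # Without references, the thread reported dialogue, ambience,
        # and timed foley coming back from one generation.
        return ["seed_audio_single_pass"]

    # With references, the thread reported dialogue only, so the
    # remaining layers move to separate passes.
    passes = ["seed_audio_dialogue_with_references"]
    if needs_ambience:
        passes.append("seedance_generate_audio_ambience")
    if needs_sfx_or_music:
        passes.append("seed_audio_sound_design")
    return passes


print(plan_audio_passes(True, True, True))
print(plan_audio_passes(False, True, True))
```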
fal's Seed Audio guide says Seed Audio 1.0 can generate multi-character dialogue, sound effects, and background music in a single pass, supports English and Chinese, accepts up to three 30-second reference clips, and can generate up to two minutes per pass. fal's API page adds that reference audio inputs are called inside prompts as @Audio1, @Audio2, and @Audio3.
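Those limits lend themselves to a quick check before a prompt is sent. The sketch below only builds a prompt string and does not call fal. The `@AudioN` tags come from fal's API page, while the line layout (`Speaker (@AudioN): text`), the function names, and the sample dialogue are assumptions for illustration.

```python
MAX_REFERENCES = 3          # up to three reference clips, per fal's guide
MAX_REFERENCE_SECONDS = 30  # each reference clip up to 30 seconds


def assign_reference_tags(clip_seconds: dict[str, float]) -> dict[str, str]:
    """Map each speaker to @Audio1..@Audio3 in the order given."""
    if len(clip_seconds) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} reference clips are accepted")
    for speaker, seconds in clip_seconds.items():
        if seconds > MAX_REFERENCE_SECONDS:
            raise ValueError(f"{speaker}'s clip is longer than {MAX_REFERENCE_SECONDS}s")
    return {
        speaker: f"@Audio{index}"
        for index, speaker in enumerate(clip_seconds, start=1)
    }


def dialogue_prompt(lines: list[tuple[str, str]], tags: dict[str, str]) -> str:
    """Render one line per utterance; the exact layout is an assumption."""
    return "\n".join(f"{speaker} ({tags[speaker]}): {text}" for speaker, text in lines)


tags = assign_reference_tags({"Kael": 12.0, "Liora": 20.5, "Tobin": 8.0})
# Placeholder dialogue, invented for the example.
print(dialogue_prompt([("Kael", "Hand it over."), ("Liora", "Take it.")], tags))
```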
The try-now reply pointed users to Magnific's product link, the offer clarification said one offer was for new users, the timing reply called the release right on time, and the teaser reply hinted at another surprise without naming it.
Camera travel
The cleanest camera note in the travel rule is that the 180-degree rule governs cuts, not continuous camera movement. The model got into trouble when prompted to teleport to a new angle, not when prompted to travel there.
The axis note made the three-character version specific: with three actors, the relevant axis is between whoever is speaking now. When dialogue hands off to a new pair, the axis rotates with the conversation.
The tested camera move in the arc post ran as one continuous shot with zero scene cuts, moving from wide to two-shot to close-up and arriving on the final speaker on cue. The caveat: a requested arc of less than 30 degrees came back as an arc plus a push, so restraint had to be written into the framing language.
The arc prompt shows the format: source frame first, character positions by frame side, exact spoken lines, then a camera instruction tied to the final line.
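That ordering can be turned into a small template. The sketch assumes nothing about Seedance's prompt parser: the sentence wording, the `build_shot_prompt` name, and the sample values are illustrative, and only the four-part order follows the arc prompt.

```python
def build_shot_prompt(
    source_frame: str,
    positions: dict[str, str],
    lines: list[tuple[str, str]],
    camera_move: str,
) -> str:
    """Assemble a prompt: frame, positions, spoken lines, then camera."""
    if not lines:
        raise ValueError("at least one spoken line is required")
    final_speaker, final_text = lines[-1]

    parts = [f"Source frame: {source_frame}."]
    parts += [f"{name} is at {side}." for name, side in positions.items()]
    parts += [f'{speaker} says: "{text}"' for speaker, text in lines]
    # The camera instruction is tied to the final line, as in the arc prompt.
    parts.append(
        f'Camera: {camera_move}, arriving on {final_speaker} as they say "{final_text}"'
    )
    return "\n".join(parts)


# Placeholder values, invented for the example.
prompt = build_shot_prompt(
    source_frame="the staged wide still",
    positions={"Kael": "frame-left", "Liora": "frame-right"},
    lines=[("Kael", "Hand it over."), ("Liora", "Take it.")],
    camera_move="one slow continuous arc with no cuts",
)
print(prompt)
```

Keeping the camera clause last and anchored to a quoted line mirrors the reported result, where the camera arrived on the final speaker on cue.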
Blocking budget
The blocking test found a hard movement budget: one blocking event per shot. A prompt with one character crossing to the table and another countering a step executed the cross and dropped the counter.
The working pattern was narrow:
- One camera move.
- One blocking event.
- Everyone else performs in place.
- Movement happens in silence when possible.
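The budget is easy to enforce before a prompt is written. A minimal sketch, assuming a shot plan is just two lists of plain-language instructions; `ShotPlan` and `over_budget` are hypothetical names, and treating the first item in each list as the one that survives is an assumption drawn from the single reported test.

```python
from dataclasses import dataclass, field


@dataclass
class ShotPlan:
    camera_moves: list[str] = field(default_factory=list)
    blocking_events: list[str] = field(default_factory=list)


def over_budget(plan: ShotPlan) -> list[str]:
    """Return every instruction beyond one camera move and one blocking event."""
    return plan.camera_moves[1:] + plan.blocking_events[1:]


plan = ShotPlan(
    camera_moves=["camera locked at eye level"],
    blocking_events=["Liora crosses to the table", "Tobin counters a step"],
)
print(over_budget(plan))  # ['Tobin counters a step']
```

Anything the check returns is a candidate for its own shot rather than a second clause in the same prompt.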
The cross prompt put Liora's movement in the beat of silence after Kael's first line, with the camera locked at eye level. The note attached to the prompt says Tobin's counter was the clause Seedance dropped.
Hand-to-hand object pass
Hands were the stress test. The handoff phrasing got a close-up object pass to survive by reducing it to one action and one lock: Liora presses the vial into Kael's palm, his fingers close, then both hands hold steady.
The handoff render prompt kept the camera locked, kept Tobin just off frame-left, and described the vial as a single rigid object after the transfer. The useful phrase is boring, which is probably why it worked.
Master-derived coverage
The close-up mistake came from generating a fresh text-to-image close-up, which produced wrong faces, the wrong room, and a missing third character. The fix was to derive coverage from the master wide through image-to-image, while naming the absent character's off-frame position.
The derived close prompt used a tighter two-shot from the same camera position, identical faces, same chamber, same fires, same stone table edge, and Tobin just off frame-left.
The off-frame speaker test found that a speaker can keep talking after the camera leaves him behind. The voice continued, a hand stayed at frame edge, and the model did not invent a duplicate or cut back.
The staging cheat sheet reduced the setup to theater blocking:
- Three characters form a triangle, not a flat line.
- Profile face-off reads as confrontation.
- Three-quarter stagger reads warmer.
- The actor nearer the camera reads dominant.
- The actor who moves owns the beat.
- Absent characters live just off-frame where the viewer already placed them.
4K, Mini, and platform routes
The 720p dialogue stack is not the only Seedance 2.0 route creators are testing. The 4K cost post priced a 15-second 4K clip at $47 and used an early-2000s camcorder prompt with autofocus hunting, exposure pumping, school buses, lockers, and an abrupt cut to black.
The Mini pricing post said Seedance 2.0 Mini was available on that platform starting at $0.07 per second. BytePlus' product page also names Seedance 2.0 Mini and 4K as available variants.
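Putting the three reported figures on the same 15-second footing shows how far apart the tiers sit. The sketch mixes sources, so treat it as a rough comparison: the 720p rate is fal's, the 4K number is a reported clip price rather than a listed rate, and the Mini figure is a "starting at" rate from a different platform.

```python
from decimal import Decimal

CLIP_SECONDS = 15
CENT = Decimal("0.01")

rate_720p = Decimal("0.3034")   # fal's listed 720p per-second rate
clip_cost_4k = Decimal("47")    # reported price of one 15-second 4K clip
mini_floor = Decimal("0.07")    # "starting at" per-second rate for Mini

costs = {
    "720p (fal rate)": (rate_720p * CLIP_SECONDS).quantize(CENT),
    "4K (reported clip price)": clip_cost_4k.quantize(CENT),
    "Mini (starting rate)": (mini_floor * CLIP_SECONDS).quantize(CENT),
}
implied_4k_rate = (clip_cost_4k / CLIP_SECONDS).quantize(CENT)

for tier, cost in costs.items():
    print(f"{tier}: ${cost} per {CLIP_SECONDS}s clip")
print(f"implied 4K rate: ${implied_4k_rate} per second")
```

On these numbers, the 4K clip works out to about $3.13 per second, roughly ten times the 720p rate, while the Mini floor puts a 15-second clip at about $1.05.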
Platform wrappers are turning the stack into buttons. The BeatBandit integration post said BeatBandit defines or generates voices for each character, creates Seed Audio clips per shot, drives Seedance with those clips, writes the prompts, and attaches the relevant files. The BeatBandit reply added that Seedance can combine video and audio references directly, with the model adjusting one or both if sync is off.
The CapCut 4K post said Seedance 2.0 4K was used inside CapCut Video Studio for a weekly Nexus episode while a feature film was starting production. The Magnific character-sheet post showed a front, side, and back character sheet carrying helmet texture into motion, and the Magnific lip-sync workflow post paired Kling Motion Control with Seedance 2.0 for dynamic lip sync.