Workflow · May 14, 2026

PJ Accetturo reports Kavan's Chronicles of Bone workflow uses 360 set maps and black-video lip sync

PJ Accetturo broke down Kavan's Chronicles of Bone process across Magnific and Seedance, including black-video voice templates, 360 set maps, and foley-first post. It matters because character, set, lip-sync, and action consistency are being treated as repeatable production steps.

TL;DR

PJaccetturo's asset-bible breakdown says Kavan builds character sheets before generating any video, with wide, medium, and close-up variants plus locked identifiers like white eyes or scars.
According to PJaccetturo's black-video voice post, Kavan keeps voices consistent by turning clean audio into black-screen reference videos, then pairing that voice print with typed dialogue for emotion.
PJaccetturo's set-map post reports that each location gets four master angles, a clean fly-through, and an upscale-and-screenshot loop that turns moving shots into reusable location references.
PJaccetturo's directing template and PJaccetturo's eyeline post frame production as repeatable direction work: fixed cinematic suffixes, reference images for blocking, and placeholder bodies to lock gaze direction.
PJaccetturo's post-production post and PJaccetturo's sound-design post say hard action beats often take 20 to 50 generations, which get clipped into usable fragments and stitched together, with custom music layered over foley-first sound.

You can watch PJ Accetturo's 90-second recap, scan the full directing prompt, and see how the 360-degree room map turns continuity into a reference problem instead of a guessing game. The oddest trick is still the black-screen lip-sync hack, while kaigani's roundup shows The Chronicles of Bone already circulating in AI video curation outside the original thread.

Asset bibles

PJ Accetturo's thread turns Kavan's process into a pre-production system, and the useful part is how much work happens before motion starts.

The character pack has three required views:

Wides for blocking.
Mediums for interaction.
Close-ups for textures and markings.

According to PJaccetturo's Phase 1 post, Kavan also writes explicit visual identifiers into prompts so the model has fewer chances to drift between shots. The most concrete detail is physical reference capture: that same post says he photographed himself wearing a hand-carved mask for the Last Lost Boy, then used the image as an image-to-image base.
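One way to picture the character pack is as a small record that refuses to move on until all three views exist, and that restates the locked identifiers in every shot prompt. The sketch below assumes that structure; the class, the "Rook" nickname, and the example traits are hypothetical, not taken from Kavan's files.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The three required views from the character pack (names chosen for illustration).
REQUIRED_VIEWS = ("wide", "medium", "close-up")


@dataclass
class CharacterSheet:
    nickname: str
    identifiers: list[str]  # locked traits restated in every prompt, e.g. "white eyes"
    views: dict[str, str] = field(default_factory=dict)  # view name -> reference image path

    def missing_views(self) -> list[str]:
        """Views that still need a reference image before any video is generated."""
        return [v for v in REQUIRED_VIEWS if v not in self.views]

    def lock_prompt(self, shot_prompt: str) -> str:
        """Append the locked identifiers so every shot repeats them verbatim."""
        traits = ", ".join(self.identifiers)
        return f"{shot_prompt.strip()} {self.nickname}: {traits}."


if __name__ == "__main__":
    # Hypothetical character and traits, for illustration only.
    rook = CharacterSheet("Rook", ["white eyes", "scarred jaw"])
    rook.views["wide"] = "refs/rook_wide.png"
    print(rook.missing_views())
    print(rook.lock_prompt("Rook crosses the courtyard."))
```

Appending the identifiers mechanically, instead of retyping them per shot, is the point: the model sees the same trait list every time, which is what the thread means by giving it fewer chances to drift.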

Black-video voice prints

The voice workflow is the sharpest technical trick in the thread.

According to PJaccetturo's audio post, the stack has five moving parts:

Seedance generates the first voice pass.
The keeper clip becomes a reusable voice print.
Audio gets wrapped in a solid black video instead of staying as an MP3.
The reference clip stays clean, with no background noise.
The typed line carries emotion, while the black video carries identity.
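The thread does not say which tool Kavan uses to wrap audio in black video. A common way to do it is ffmpeg's lavfi `color` source, and the sketch below only builds that command. The file names, 1280x720 size, 24 fps, and the optional length cap are assumptions.

```python
from __future__ import annotations

import shlex


def black_video_cmd(audio_path: str, out_path: str, size: str = "1280x720",
                    fps: int = 24, max_seconds: float | None = None) -> list[str]:
    """Build an ffmpeg command that wraps a clean voice clip in a solid black video."""
    cmd = [
        "ffmpeg", "-y",
        # lavfi's color source generates an endless stream of black frames
        "-f", "lavfi", "-i", f"color=c=black:s={size}:r={fps}",
        "-i", audio_path,
        "-map", "0:v", "-map", "1:a",  # picture from the black source, sound from the clip
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",  # stop when the audio ends instead of running forever
    ]
    if max_seconds is not None:
        cmd += ["-t", str(max_seconds)]  # optional hard cap on output length
    cmd.append(out_path)
    return cmd


if __name__ == "__main__":
    # Print instead of running, so the sketch works without ffmpeg installed.
    print(shlex.join(black_video_cmd("voice_clean.wav", "voice_ref.mp4", max_seconds=5)))
```

To render the file, pass the returned list to `subprocess.run(..., check=True)` on a machine with ffmpeg. The clean-audio requirement still applies: the command copies whatever noise is in the source clip.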

PJaccetturo's Phase 2 post also claims Seedance caps reference videos at 15 seconds total, so multi-character scenes get trimmed to roughly 3 to 5 seconds per character. That constraint makes the workflow feel less like open-ended prompting and more like casting with strict shot budgets.
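The shot budget is simple arithmetic. The 15-second cap and the 3-to-5-second range come from the post. Splitting the cap equally and failing loudly when a scene has too many speakers are assumptions made for this sketch.

```python
from __future__ import annotations


def split_reference_budget(characters: list[str], cap: float = 15.0,
                           min_each: float = 3.0, max_each: float = 5.0) -> dict[str, float]:
    """Give each character an equal slice of the reference-video cap.

    Each slice is clamped to max_each. Raises ValueError if the cap cannot
    give every character at least min_each seconds.
    """
    if not characters:
        return {}
    share = min(max_each, cap / len(characters))
    if share < min_each:
        raise ValueError(
            f"{len(characters)} characters need {len(characters) * min_each:.1f}s "
            f"of reference at {min_each:.1f}s each, but the cap is {cap:.1f}s"
        )
    return {name: share for name in characters}


if __name__ == "__main__":
    # Hypothetical cast of four: each voice print gets 3.75 seconds.
    print(split_reference_budget(["Rook", "Mara", "Ash", "Vey"]))
```

Under these assumptions, three speakers get the full 5 seconds each and five speakers get exactly 3. A sixth speaker breaks the budget, which suggests splitting the scene.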

360 set maps

Kavan handles environment consistency the same way he handles characters: build references once, reuse them everywhere.

The location workflow in PJaccetturo's set-map post is unusually specific:

Generate four master angles for each set: north, south, east, west.
Create a clean 1080p fly-through before adding characters.
Upscale the best frames.
Screenshot those frames.
Re-upload them as high-detail location references for later shots.
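The five steps have a strict order, so they can be tracked as a checklist that always names the next unfinished step. The sketch below assumes that structure. The class and field names are hypothetical, and the step labels paraphrase the list above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MASTER_ANGLES = ("north", "south", "east", "west")


@dataclass
class LocationBible:
    name: str
    master_angles: dict[str, str] = field(default_factory=dict)  # angle -> image path
    flythrough: str | None = None  # clean 1080p pass, generated before characters
    references: list[str] = field(default_factory=list)  # upscaled, screenshotted frames

    def next_step(self) -> str:
        """Return the next unfinished step, in the order the post lists them."""
        for angle in MASTER_ANGLES:
            if angle not in self.master_angles:
                return f"generate {angle} master angle"
        if self.flythrough is None:
            return "generate clean 1080p fly-through (no characters)"
        if not self.references:
            return "upscale best frames, screenshot them, re-upload as references"
        return "ready: attach references to later shots"


if __name__ == "__main__":
    crypt = LocationBible("crypt")  # hypothetical set name
    print(crypt.next_step())
```

Keeping the fly-through as its own field makes the "before adding characters" rule explicit. No set reaches the reference stage until an empty pass exists.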

That is the part most creators will probably save. PJaccetturo's post describes continuity as a world-building asset pipeline, not a prompt-writing problem.

Directing templates

The production prompts are built around a fixed cinematic suffix and vary only at the camera-move level.

According to PJaccetturo's directing post, the reusable ending is some variation of "Masterful style fantasy film. The lighting and cinematography are masterful and expertly accomplished but slow and deliberate," while handheld, drone, or dolly swaps supply the motion difference from shot to shot.
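A fixed-suffix template is easy to make mechanical. In the sketch below, the suffix is the wording quoted in the post. The camera-move phrasings and the action-then-move-then-suffix order are assumptions, since the post only says handheld, drone, or dolly swaps supply the motion.

```python
SUFFIX = (
    "Masterful style fantasy film. The lighting and cinematography are "
    "masterful and expertly accomplished but slow and deliberate."
)

# Hypothetical phrasings for the three camera moves named in the post.
CAMERA_MOVES = {
    "handheld": "Handheld camera, subtle shake.",
    "drone": "Drone shot, slow aerial glide.",
    "dolly": "Slow dolly push-in.",
}


def build_shot_prompt(action: str, move: str) -> str:
    """Compose action + camera move + the fixed cinematic suffix."""
    if move not in CAMERA_MOVES:
        raise ValueError(f"unknown camera move {move!r}; expected one of {sorted(CAMERA_MOVES)}")
    return f"{action.strip()} {CAMERA_MOVES[move]} {SUFFIX}"


if __name__ == "__main__":
    print(build_shot_prompt("Rook lifts the lantern.", "dolly"))
```

Because only the middle segment changes, two prompts for adjacent shots differ only in action and camera move, which keeps the look consistent.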

PJaccetturo's blocking post adds two smaller but very practical controls:

Generate a placeholder body first when a character needs to look at someone or something specific.
Use short nicknames for characters so image and voice references stay easier for the model to tag.
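Both controls can be expressed as simple checks: one flags nicknames that are long, multi-word, or duplicated, and one orders a shot so the placeholder body comes first whenever there is an eyeline target. The function names, the 8-character limit, and the example names are hypothetical. The post does not give a length rule.

```python
from __future__ import annotations


def check_nicknames(nicknames: dict[str, str], max_len: int = 8) -> list[str]:
    """Flag nicknames that are long, multi-word, or shared by two characters.

    max_len is an arbitrary illustrative threshold, not a documented limit.
    """
    problems: list[str] = []
    seen: dict[str, str] = {}
    for full_name, nick in nicknames.items():
        if " " in nick or len(nick) > max_len:
            problems.append(f"{full_name}: nickname {nick!r} is not short and single-word")
        key = nick.lower()
        if key in seen:
            problems.append(f"{full_name}: nickname {nick!r} collides with {seen[key]}")
        else:
            seen[key] = full_name
    return problems


def plan_shot(subject: str, look_at: str | None = None) -> list[str]:
    """Order generation steps; an eyeline target gets a placeholder body first."""
    steps: list[str] = []
    if look_at is not None:
        steps.append(f"generate placeholder body for {look_at} at the eyeline target")
        steps.append(f"generate {subject} looking at the placeholder ({look_at})")
    else:
        steps.append(f"generate {subject}")
    return steps


if __name__ == "__main__":
    print(check_nicknames({"The Last Lost Boy": "Boy", "Queen Mara": "Mara"}))
    print(plan_shot("Boy", look_at="Mara"))
```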

Action prompts and edit mentality

The thread's action advice is blunt: prompt the impact, not the forbidden noun.

According to PJaccetturo's safety-filter post, Kavan writes cinematic violence prompts around verbs rather than words like blood or gore. The example in that post contrasts action phrasing such as "stabs in chest" and "bursts out of his eyeball" with the filter-triggering vocabulary he avoids.

That same production mindset carries into editing. PJaccetturo's post-production post says Kavan treats generations like live-action takes, often running 20 to 50 versions of a hard action beat, then pulling out the few usable seconds and stitching them in Premiere or Resolve.
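Before anything goes into Premiere or Resolve, the selection pass amounts to a list of in and out points across many takes, laid end to end. The sketch below models only that planning step. The `Selection` type, take names, and the half-second minimum are hypothetical, and it does not export to any editor format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    take: str     # e.g. "take_07", one of the 20 to 50 generations
    start: float  # in-point in seconds
    end: float    # out-point in seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def assemble(selections: list[Selection],
             min_len: float = 0.5) -> tuple[list[tuple[float, Selection]], float]:
    """Lay usable selections end to end; drop slivers shorter than min_len seconds.

    Returns (timeline, total): each kept selection paired with its start time
    on the assembled cut, plus the cut's total length.
    """
    timeline: list[tuple[float, Selection]] = []
    cursor = 0.0
    for sel in selections:
        if sel.end <= sel.start:
            raise ValueError(f"{sel.take}: out-point must follow in-point")
        if sel.duration < min_len:
            continue
        timeline.append((cursor, sel))
        cursor += sel.duration
    return timeline, cursor


if __name__ == "__main__":
    picks = [Selection("take_07", 2.0, 3.5), Selection("take_33", 4.0, 6.0)]
    cut, total = assemble(picks)
    for at, sel in cut:
        print(f"{at:5.2f}s  {sel.take}  [{sel.start:.2f}-{sel.end:.2f}]")
    print(f"total {total:.2f}s")
```

The minimum length stands in for the editor's judgment about which fragments are usable. In practice that choice is made by eye, not by a threshold.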

Foley-first post

The last step is less glamorous and probably more important than the thread headline suggests.

According to PJaccetturo's sound post, Kavan prompts for natural sound only, keeps AI music out of the generation, and uses the clean output to capture fire crackle, footsteps, and sword impacts. Music and atmospheric layers get added later in post.
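The foley-first rule can be enforced at prompt time: ask for natural sound, and reject any generation prompt that mentions music so the score stays a post-production layer. The clause wording and the list of music terms below are hypothetical. The post does not quote the exact phrasing Kavan uses.

```python
import re

# Hypothetical wording for the natural-sound request.
NATURAL_SOUND_CLAUSE = "Natural sound only, no music."

# Words that would pull music into the generation instead of leaving it for post.
MUSIC_TERMS = re.compile(r"\b(music|score|soundtrack|song)\b", re.IGNORECASE)


def foley_first_prompt(shot_prompt: str) -> str:
    """Append the natural-sound request; refuse prompts that ask for music."""
    hit = MUSIC_TERMS.search(shot_prompt)
    if hit:
        raise ValueError(f"found {hit.group(0)!r}: add music in post, not in the generation")
    return f"{shot_prompt.strip()} {NATURAL_SOUND_CLAUSE}"


if __name__ == "__main__":
    print(foley_first_prompt("Sparks fly as swords clash beside the fire."))
```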

That detail carries more weight next to kaigani's roundup, which included The Chronicles of Bone in a weekly top-five AI video list. The workflow in this thread is not about getting one good clip; it is about building enough reusable picture, voice, set, and sound components that a series can keep its look from scene to scene.