Dreamverse pairs Hao AI Lab's FastVideo stack with an interface for editing video scenes in a faster-than-playback loop, using quantization and fused kernels to keep generation latency below viewing time. The stack is worth studying if you are building real-time multimodal generation or multi-user video serving.

Dreamverse is a prototype interface on top of FastVideo that aims to make video generation interactive instead of asynchronous. Hao AI Lab’s launch thread frames the change against current systems that “take minutes” for a 5-second 1080p clip, while Dreamverse is presented as a live loop where users can keep steering the same scene as outputs come back.
The loop is deliberately short: “Generate a clip → watch it → edit,” and the workflow post gives concrete examples such as “Slow the camera” and “Change the background.” That matters because the system is not described as one-shot prompt generation; it is positioned as scene iteration with continuity across revisions. The public demo is available via the Dreamverse app, and Hao’s blog post describes this as “vibe directing” rather than prompt-and-wait generation.
Hao attributes the speed to a new real-time inference stack inside FastVideo. In the team’s technical thread, the named ingredients are fast attention backends, 4-bit quantization, fused kernels, and “optimized multi-user serving,” which is the most deployment-relevant detail in the announcement because it suggests the work is not only about a single offline benchmark run.
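The 4-bit quantization ingredient can be illustrated with a minimal symmetric per-tensor int4 weight quantizer. This is a generic sketch of the technique, not FastVideo's actual implementation; the function names and the per-tensor granularity are assumptions for illustration.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-tensor int4 quantization: map floats to integers in [-8, 7]."""
    scale = np.abs(w).max() / 7.0  # 7 = largest magnitude a signed 4-bit value covers symmetrically
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a float approximation of the original weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)
# Rounding error is bounded by half a quantization step.
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-6)  # → True
```

In practice, production stacks quantize per-channel or per-group and often keep activations in higher precision; the point here is only the memory-bandwidth win of storing weights at 4 bits.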
The practical bar here is unusual: generation has to stay below playback time so the “creative loop stays alive,” in Hao’s technical thread phrasing. That makes Dreamverse interesting beyond video UX. If the claim holds under load, the same stack design points toward real-time multimodal apps where responsiveness matters more than maximizing per-clip quality, especially for serving setups that need iterative edits instead of long queued renders.
(1/N) We're launching Dreamverse. Most AI video models take minutes to generate a 5 s 1080p clip. In 4.5 seconds, we can generate 30 s 1080p clips on a single GPU. Our videos generate faster than you can watch them: stop waiting on prompts and start directing scenes live.
(3/N) Under the hood, this runs on our new real-time inference stack in FastVideo (our open-source video model post-training/inference framework): • fast attention backends • 4-bit quantization • fused kernels • optimized multi-user serving • and much more 🤫 Fast enough