3DreamBooth is a new multi-view reference method for subject-driven video that claims about 50% better 3D geometric fidelity than 2D baselines. It matters for product shots, virtual production, and character turnarounds where camera moves usually break identity.

3DreamBooth trains a subject as a 3D entity instead of a flat image token. In the project summary, the researchers describe a setup that learns spatial geometry from a limited set of multi-view photos, then injects that geometry into generation so shape and texture stay stable across viewpoint changes.
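To make the two-stage idea concrete, here is a minimal, hypothetical PyTorch sketch: fit a small learnable 3D feature grid to encoded multi-view photos, then render pose-conditioned features a generator could consume. Every name here (SubjectVoxels, fit_subject, the grid resolution) is an illustrative assumption, not the released 3DreamBooth code, and the "renderer" is a crude affine warp standing in for whatever differentiable rendering the method actually uses.

```python
# Hypothetical sketch only: names and the voxel-grid "renderer" are
# illustrative assumptions, not the 3DreamBooth release.
import torch
import torch.nn as nn

class SubjectVoxels(nn.Module):
    """A small learnable 3D feature grid standing in for the subject."""
    def __init__(self, channels: int = 16, res: int = 8):
        super().__init__()
        self.grid = nn.Parameter(0.01 * torch.randn(1, channels, res, res, res))

    def render(self, pose: torch.Tensor) -> torch.Tensor:
        """Crude stand-in for differentiable rendering: warp the grid by
        each camera pose, then project depth away to a 2D feature map."""
        theta = pose[:, :3, :]                      # (B, 3, 4) affine part
        b = pose.shape[0]
        _, c, d, h, w = self.grid.shape
        sample = nn.functional.affine_grid(
            theta, (b, c, d, h, w), align_corners=False)
        warped = nn.functional.grid_sample(
            self.grid.expand(b, -1, -1, -1, -1), sample, align_corners=False)
        return warped.mean(dim=2)                   # (B, C, H, W)

def fit_subject(subject: SubjectVoxels, targets: torch.Tensor,
                poses: torch.Tensor, steps: int = 200) -> SubjectVoxels:
    """Stage 1: optimize the 3D grid so its renders match encoded photos."""
    opt = torch.optim.Adam(subject.parameters(), lr=1e-2)
    for _ in range(steps):
        loss = nn.functional.mse_loss(subject.render(poses), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return subject

# Toy usage: 4 dummy camera poses and stand-in encoded reference views.
subject = fit_subject(
    SubjectVoxels(),
    targets=torch.randn(4, 16, 8, 8),
    poses=torch.eye(3, 4).unsqueeze(0).repeat(4, 1, 1),
    steps=10,
)
# Stage 2 (not shown): at generation time, render(pose_t) for each frame's
# camera and inject those features into the video model as conditioning.
```

The point of the sketch is the separation of stages: identity lives in a pose-queryable 3D object, so a viewpoint change re-renders the same geometry instead of re-guessing the subject per frame.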
The same summary says the method was built on Hunyuan Video and also demonstrated on WanVideo 2.1 at 720p, which matters for creators already working across different video stacks. The linked write-up frames the main win as fewer identity collapses during camera moves, rotations, and product-style multi-angle shots.
The creative angle is straightforward: this is aimed at the ugly failure case in subject-driven video where a prop, product, or character looks right front-on, then drifts off-model as soon as the shot turns. The launch thread points to virtual production, e-commerce visualization, and VR/AR as the clearest fits because those workflows depend on believable multi-angle continuity.
A supporting repost describes the system as enabling multi-view generation by treating subjects as 3D objects rather than 2D references. That does not prove production readiness, but it does make 3DreamBooth more relevant for turntables, hero product shots, and character look-dev than a typical image-conditioned video demo.
The announcement itself, from researchers at Yonsei and Sungkyunkwan Universities, sums up the release: 3D subject-driven video generation from multi-view reference images that creates view-consistent videos maintaining identity across angles, learns spatial geometry via one-frame optimization, and outperforms 2D baselines by about 50% on 3D geometric fidelity.