Tencent releases HY-World 2.0 with persistent 3D world export
Tencent released HY-World 2.0 with WorldMirror 2.0 code and weights for turning text, images, or video into persistent 3D scenes. The output includes navigable geometry and camera data instead of disposable video frames.

TL;DR
- Tencent's HY-World 2.0 repo frames the project as a multimodal 3D world model, and _akhaliq's paper post points to the technical report behind it.
- The released piece that matters right now is WorldMirror 2.0, which, per hasantoxr's WorldMirror note and the official README, predicts depth, normals, camera parameters, point clouds, and 3D Gaussian Splatting attributes in a single forward pass.
- HY-World 2.0 is built as a four-stage pipeline, and hasantoxr's architecture post matches the tech report on the stage order: HY-Pano 2.0, WorldNav, WorldStereo 2.0, then WorldMirror 2.0.
- The creative shift is that HY-World 2.0 outputs persistent 3D assets, not just disposable video, a claim that hasantoxr's persistent-worlds post makes directly and the official model card backs with mesh and 3DGS export language.
- Only part of the stack is open today: according to hasantoxr's release-status post and the official Hugging Face page, WorldMirror 2.0 code and weights are live now, while HY-Pano 2.0, WorldNav, WorldStereo 2.0, and the full generation inference code are still marked coming soon.
You can browse the repo, jump straight to the model card, and read the full technical report. The official page also says the scenes can be imported into Blender, Unity, Unreal, and Isaac Sim, while the public demo on Tencent's scene-to-3D site already shows first-person and third-person exploration.
WorldMirror 2.0
The easiest way to read this release is as a very usable 3D reconstruction drop hiding inside a bigger world-model announcement.
The official README says WorldMirror 2.0 handles world reconstruction from multi-view images or video, and does it with one feed-forward model instead of a stitched pipeline of separate estimators. The output bundle is unusually complete for creator workflows:
- depth maps
- surface normals
- camera parameters
- 3D point clouds
- 3D Gaussian Splatting attributes
That list comes from both hasantoxr's WorldMirror note and Tencent's README. The older HunyuanWorld-Mirror repo also describes the model as a universal geometric predictor with Gradio demo support and local inference instructions.
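To make the bundle concrete, here is a minimal sketch of a container for those five outputs. The field names, shapes, and the class itself are illustrative assumptions, not WorldMirror 2.0's actual API:

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical container mirroring the five output types the README lists.
# Shapes are assumptions: V views, HxW resolution, N points, G Gaussians.
@dataclass
class WorldMirrorOutputs:
    depth: np.ndarray       # (V, H, W) per-view depth maps
    normals: np.ndarray     # (V, H, W, 3) per-view surface normals
    intrinsics: np.ndarray  # (V, 3, 3) camera intrinsics
    extrinsics: np.ndarray  # (V, 4, 4) world-to-camera poses
    points: np.ndarray      # (N, 3) fused 3D point cloud
    gaussians: np.ndarray   # (G, D) 3DGS attributes (means, scales, colors, ...)
```

The point of the sketch is the breadth: a single forward pass covering all five fields replaces what would otherwise be separate depth, normal, pose, and splatting estimators.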
Four-stage pipeline
HY-World 2.0 does not try to hallucinate a whole 3D world in one shot. The paper and hasantoxr's architecture post both break it into four named stages.
- HY-Pano 2.0: generates the panoramic seed scene.
- WorldNav: plans trajectories through that scene.
- WorldStereo 2.0: expands missing geometry with a memory-aware keyframe process.
- WorldMirror 2.0: composes the final world representation and 3DGS outputs.
The technical report adds the key design idea: each stage owns one job, from panorama fidelity to trajectory planning to world composition. That modularity is the interesting part for creative tooling, because it implies Tencent is treating world generation more like a production pipeline than a single magic model.
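The stage separation can be sketched as plain function composition. The stage functions below are stand-ins for models that are mostly not yet released; their inputs and outputs are assumptions, only the stage names and order come from the report:

```python
# Illustrative composition of the four named stages; every interface
# here is a placeholder, not a real HY-World 2.0 entry point.
def hy_pano(prompt: str) -> dict:
    # Stage 1: generate the panoramic seed scene.
    return {"panorama": f"seed scene from: {prompt}"}

def world_nav(scene: dict) -> dict:
    # Stage 2: plan a camera trajectory through the seed scene.
    return {**scene, "trajectory": ["pose_0", "pose_1"]}

def world_stereo(scene: dict) -> dict:
    # Stage 3: expand geometry along the trajectory via keyframes.
    return {**scene, "expanded_keyframes": len(scene["trajectory"])}

def world_mirror(scene: dict) -> dict:
    # Stage 4: compose the final world representation and 3DGS outputs.
    return {**scene, "representation": "mesh + 3DGS"}

world = world_mirror(world_stereo(world_nav(hy_pano("a rainy alley"))))
print(world["representation"])  # mesh + 3DGS
```

Because each stage only consumes the previous stage's output, any one of them can in principle be swapped or upgraded independently, which is what makes the pipeline framing interesting for tooling.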
Persistent 3D assets
The repo's strongest pitch is not benchmark language. It is the format of the output.
Tencent's README explicitly contrasts HY-World 2.0 with video world models such as Genie 3 and Cosmos. Instead of pixel sequences that end when playback ends, the system produces meshes and Gaussian splats that the company says can be imported into Blender, Unity, Unreal Engine, and Isaac Sim.
That matters because the system is trying to cross three categories at once:
- text or single image to world generation
- multi-view images or video to world reconstruction
- interactive exploration inside the rendered scene
The model card lists text, single-view images, multi-view images, and videos as supported inputs. _akhaliq's paper post includes a demo clip that shows exactly the kind of wireframe-to-rendered-world transitions Tencent is pushing.
WorldStereo 2.0's navigation numbers
Tencent has not opened WorldStereo 2.0 yet, but it is the part of the stack tied most directly to navigability.
According to hasantoxr's WorldStereo metrics post, rotation error dropped from 0.762 degrees to 0.492 degrees, while translation error fell from 1.245 meters to 0.968 meters. In plain terms, Tencent is claiming tighter virtual camera motion, which is a prerequisite for scenes you can move through without the geometry turning to mush.
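For scale, those numbers work out to roughly a 35% cut in rotation error and a 22% cut in translation error. The arithmetic, using only the figures from the metrics post:

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before

# Figures from hasantoxr's WorldStereo 2.0 metrics post.
rot = relative_reduction(0.762, 0.492)    # rotation error, degrees
trans = relative_reduction(1.245, 0.968)  # translation error, meters
print(f"rotation error down {rot:.1%}, translation error down {trans:.1%}")
# rotation error down 35.4%, translation error down 22.2%
```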
The technical report describes WorldStereo 2.0 as a keyframe-based world expansion model with consistent memory. That is a very different bet from the short-clip video systems creative teams have been poking at for previz.
What is open now
The open-source story is narrower than the launch headline.
Tencent's Hugging Face page says the technical report and partial code shipped on April 15, and that WorldMirror 2.0 inference code and weights are available now. The same page marks the full HY-World 2.0 generation inference code, HY-Pano 2.0 weights and code, WorldNav code, and WorldStereo 2.0 weights and inference code as coming soon.
The install story is already fairly practical. The older HunyuanWorld-Mirror README recommends CUDA 12.4, Python 3.10, PyTorch 2.4.0, and gsplat for rendering, and offers both a hosted demo and a local Gradio app.
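A setup matching those versions might look like the following. The environment name and exact pip invocations are assumptions; check the repo's README for the commands Tencent actually ships:

```shell
# Sketch of an environment matching the versions the HunyuanWorld-Mirror
# README recommends (CUDA 12.4, Python 3.10, PyTorch 2.4.0, gsplat).
# Environment name and index URL are illustrative, not from the repo.
conda create -n worldmirror python=3.10 -y
conda activate worldmirror
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip install gsplat
```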
License and demo status
Tencent is shipping the model under its own community terms, not a standard Apache or MIT license.
The HunyuanWorld-Mirror license says the agreement excludes the European Union, the United Kingdom, and South Korea. That matches the vibe of hasantoxr's release-status post, which called it Apache-adjacent rather than Apache.
The official scene-to-3D product page, which the HY-World 2.0 README links as a free try, also advertises first-person and third-person exploration. Tencent's README adds one small reality check right in the badge text: the demo is "Very Crowded Now, Be Patient."