Qwen-Image-2.0-Pro launches at #9 on Arena with multilingual text rendering
Alibaba launched Qwen-Image-2.0-Pro on ModelScope and via API with better prompt adherence, multilingual typography, and steadier style quality. The model is aimed at text-heavy jobs like UI mockups and posters, so test it for layout-heavy generation.

TL;DR
- Alibaba shipped Qwen-Image-2.0-Pro on April 25 via ModelScope and API, and Alibaba_Qwen's launch post framed the release around image quality, multilingual text rendering, instruction following, and more consistent output across styles.
- The most concrete product claim in Alibaba_Qwen's feature thread is better text-heavy generation, with cleaner multilingual typography and mixed-language layouts aimed at posters, UI mockups, and ads.
- According to Alibaba_Qwen's launch post, Qwen-Image-2.0-Pro debuted at #9 worldwide on Arena's text-to-image rankings, and arena's repost amplified that positioning the same day.
- Alibaba_Qwen's examples also claim tighter prompt adherence on multi-object scenes, spatial relationships, and attribute binding, plus more even quality across photorealistic and stylized outputs.

Qwen's launch thread is short, but the payload is pretty specific. The team is pushing this as a model for layout-heavy image work, not just prettier samples: Alibaba_Qwen's multilingual poster example focuses on mixed-language typography, while the subway-phone example is there to show prompt fidelity on a fake screenshot.
Multilingual text rendering
The multilingual rendering claim is the clearest differentiator in the launch materials. Alibaba_Qwen's feature thread says the model improves glyph accuracy, typography consistency, and layout cleanliness in complex compositions, including mixed-language outputs.
That matters mostly because the examples are not generic beauty shots. The attached poster sample in Alibaba_Qwen's feature thread is explicitly aimed at posters, ads, and UI mockups, which is where text rendering failures usually break the image.
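A rough way to check the glyph-accuracy claim on your own prompts is to compare the text you asked for against whatever an OCR pass reads back from the generated image. The sketch below covers only the scoring step, using character error rate (edit distance divided by the length of the intended string). It is not how Alibaba or Arena evaluate the model; the function names are hypothetical, and the OCR output is assumed to come from a tool of your choice.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Fold compatibility forms (e.g. full-width digits) and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def char_error_rate(expected: str, recognized: str) -> float:
    """Character error rate of OCR'd text against the string in the prompt."""
    ref = normalize(expected)
    hyp = normalize(recognized)
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical pair: text requested in the prompt vs. text read back by OCR.
    print(char_error_rate("夏日特卖", "夏日特买"))
```

Scoring per character rather than per word keeps the metric usable for scripts without word spacing, which matters for the mixed-language layouts the launch thread emphasizes. OCR errors will inflate the score, so it is better suited to comparing models on the same prompts than to reading as an absolute number.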
Instruction following
Qwen split prompt adherence into its own thread item: Alibaba_Qwen's launch post says the model handles complex compositions better, especially multiple objects, spatial relationships, and attribute binding. The subway image example is presented as a generated phone screenshot rather than a captured photo.
The rest of the thread pairs that with a separate realism claim. In the same feature thread, Alibaba says texture detail, lighting coherence, and material realism improved across both photorealistic and stylized outputs.
Arena ranking and access
The launch post says Qwen-Image-2.0-Pro opened at #9 worldwide on Arena's text-to-image board, and arena's repost helped circulate that benchmark claim beyond Qwen's own account.
Availability was immediate on two surfaces, per Alibaba_Qwen's launch post: ModelScope for direct use and an API endpoint for developers. The same thread also claims more balanced quality across artistic styles, pitching the release as a steadier generator rather than one that only shines in a narrow aesthetic band.