DiffusionGemma
An experimental open model that explores an exceptionally fast approach to text generation
DiffusionGemma is an experimental open-weights generative model family from Google DeepMind, based on the 26B A4B Mixture-of-Experts Gemma 4 architecture. Instead of emitting one token at a time, it uses discrete text diffusion to generate blocks of tokens in parallel. It accepts text, image, and video inputs, produces text output, and supports a context length of up to 256K tokens.
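DiffusionGemma's exact sampler is not described on this page. The toy sketch below only illustrates the general idea of block-wise discrete diffusion decoding as it is commonly described for masked diffusion language models. Each block starts fully masked. Every refinement pass predicts all masked positions at once, and the most confident predictions are committed. Blocks are decoded left to right. The names `decode_block`, `generate`, and `toy_denoiser`, the confidence-based commit rule, and the fixed step schedule are all hypothetical, and the "denoiser" is a stand-in that already knows a fixed sentence rather than a trained model.

```python
from typing import Callable, List, Optional, Tuple

# None marks a position that has not been committed yet ("masked").
Token = Optional[str]
# Hypothetical denoiser interface: given the committed prefix and the current,
# partially masked block, return a (token, confidence) guess for every position.
Denoiser = Callable[[List[str], List[Token]], List[Tuple[str, float]]]


def decode_block(prefix: List[str], block_size: int, steps: int,
                 denoiser: Denoiser) -> List[str]:
    """Fill one fully masked block in at most `steps` parallel refinement passes."""
    block: List[Token] = [None] * block_size
    per_step = -(-block_size // steps)  # ceil(block_size / steps) commits per pass
    for _ in range(steps):
        masked = [i for i, t in enumerate(block) if t is None]
        if not masked:
            break
        guesses = denoiser(prefix, block)  # one pass predicts every position at once
        # Commit the most confident guesses; the rest stay masked for the next pass.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:per_step]:
            block[i] = guesses[i][0]
    # steps * per_step >= block_size, so every position is committed by now.
    return [t for t in block if t is not None]


def generate(prompt: List[str], num_blocks: int, block_size: int, steps: int,
             denoiser: Denoiser) -> List[str]:
    """Decode blocks left to right; each block conditions on everything before it."""
    out = list(prompt)
    for _ in range(num_blocks):
        out.extend(decode_block(out, block_size, steps, denoiser))
    return out


# Toy stand-in for a trained model: it "knows" a fixed sentence and is more
# confident about positions nearer the start of the block.
TARGET = "the quick brown fox jumps over the lazy dog".split()


def toy_denoiser(prefix: List[str], block: List[Token]) -> List[Tuple[str, float]]:
    start = len(prefix)
    return [(TARGET[start + i], 1.0 / (1 + i)) for i in range(len(block))]


print(" ".join(generate(["the"], num_blocks=2, block_size=4, steps=2,
                        denoiser=toy_denoiser)))
# prints: the quick brown fox jumps over the lazy dog
```

Here each 4-token block is finished in 2 denoiser passes instead of 4. A real model would also be able to revise low-confidence positions and would tune the block size and step count to trade speed against quality.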
Pricing
No public pricing is listed in first-party sources. Official Google and Google DeepMind materials, including the announcement page, state no per-token, per-image, or subscription rate for DiffusionGemma. The model appears to be distributed as open weights rather than as a priced hosted API, so pricing is recorded as non-normalized.
Model Intelligence
Recent stories
Google released DiffusionGemma, a 26B-A4B diffusion text model, under the Apache 2.0 license. Google claims up to 4x faster output because the model generates text in blocks rather than one token at a time. The release gives local and hosted inference stacks a new decoding path to test.
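One way to see why block generation can cut latency is to count sequential forward passes. The sketch below is a back-of-the-envelope model under stated assumptions. An autoregressive decoder needs one pass per token. A block diffusion decoder is assumed to need a fixed number of refinement passes per block. The function names and the block size and step values are hypothetical and are not DiffusionGemma's actual settings. The sketch does not derive the announced 4x figure.

```python
import math


def autoregressive_passes(n_tokens: int) -> int:
    """One sequential forward pass per generated token."""
    return n_tokens


def block_diffusion_passes(n_tokens: int, block_size: int, steps_per_block: int) -> int:
    """Blocks are decoded in order; each block is assumed to take a fixed
    number of parallel refinement passes, regardless of its size."""
    return math.ceil(n_tokens / block_size) * steps_per_block


# Hypothetical settings, chosen only for illustration.
n_tokens, block_size, steps = 256, 32, 16
ar = autoregressive_passes(n_tokens)
bd = block_diffusion_passes(n_tokens, block_size, steps)
print(ar, bd, ar / bd)  # prints: 256 128 2.0
```

Fewer sequential passes is only an upper bound on the speedup. Each diffusion pass processes a whole block, so per-pass cost, hardware utilization, and the number of steps needed for acceptable quality determine the real gain.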
Google's new diffusion text model picked up same-day runtime support. vLLM added native serving for diffusion language models, Unsloth published GGUF files, and llama.cpp gained local setup guidance. That shortens the path from release to local and hosted evaluation.