DiffusionGemma

An experimental open model that explores an exceptionally fast approach to text generation

DiffusionGemma is an experimental open-weights generative model family from Google DeepMind based on the 26B A4B Mixture-of-Experts Gemma 4 architecture. It uses discrete text diffusion to generate blocks of tokens in parallel, accepts text, image, and video inputs, and produces text output with a context length of up to 256K tokens.
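
As a rough illustration of why generating blocks of tokens in parallel can reduce decoding work, the sketch below counts model forward passes for one-token-at-a-time decoding versus block-wise refinement. It is a back-of-the-envelope model only: the function names, the block size, and the number of refinement steps per block are hypothetical values chosen for the example, not DiffusionGemma's actual settings, and a pass count is not a wall-clock speedup.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Token-by-token decoding: one forward pass per generated token."""
    return num_tokens


def block_diffusion_passes(num_tokens: int, block_size: int, denoise_steps: int) -> int:
    """Block-wise decoding: each block of `block_size` tokens is refined
    in parallel over `denoise_steps` forward passes.

    Both parameters are illustrative assumptions, not published settings.
    """
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * denoise_steps


if __name__ == "__main__":
    n = 1024  # tokens to generate (arbitrary example length)
    ar = autoregressive_passes(n)
    bd = block_diffusion_passes(n, block_size=16, denoise_steps=8)
    print(f"autoregressive passes: {ar}")
    print(f"block-wise passes:     {bd}")
    print(f"pass-count ratio:      {ar / bd:.1f}")
```

The ratio depends entirely on the chosen block size and step count: larger blocks or fewer refinement steps lower the pass count, while each pass over a block may cost more than a single-token pass.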

Pricing

Official site · Jul 2, 2026, 7:01 AM

Official Google and Google DeepMind materials publish no usage price for DiffusionGemma. The official announcement page states no per-token, per-image, or subscription rate, so pricing is recorded as non-normalized. The model appears to be offered as open weights rather than as a priced hosted API.

Model Intelligence

Context window

256,000 tokens

Benchmarkable

Model level

family

Recent stories

2 linked stories

release · PRIMARY · 2026-06-10

Google releases DiffusionGemma 26B-A4B with 4x faster block-based text decoding

Google released DiffusionGemma, a 26B-A4B diffusion text model under the Apache 2.0 license, claiming up to 4x faster output by generating text in blocks instead of one token at a time. The release matters for local and hosted stacks that want to test a new decoding path.

release · PRIMARY · 2026-06-10

vLLM, Unsloth, and llama.cpp add DiffusionGemma support after launch

Google's new diffusion text model picked up same-day runtime support: vLLM added native diffusion-LM serving, Unsloth shipped GGUFs, and llama.cpp got local setup guidance. That shortens the path from release to local and hosted evaluation.