Our most capable open models
Google DeepMind's Gemma is a family of open models. The official Gemma pages describe them as Google's most capable open models and, with Gemma 3, as a family of lightweight models with multimodal understanding, built for applications that run across cloud servers, laptops, and phones.
Google's official documentation describes Gemma as an open-weight model family that is freely available for use. The official pages reviewed publish no standalone price for Gemma itself, whether per-token or subscription; any Google Cloud infrastructure used to self-deploy or serve the models is billed separately.
Google DeepMind released Gemma 4 in E2B, E4B, 26B A4B, and 31B variants, with multimodal input, native tool use, and Apache 2.0 licensing. Day-0 support in Ollama, vLLM, SGLang, and Hugging Face brings the models into local and single-GPU workflows immediately.
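For readers who want to try the local and single-GPU paths named above, the commands below sketch the Ollama and vLLM routes. The Gemma 4 tag and Hugging Face model ID shown are assumptions patterned on the existing Gemma 3 naming, not confirmed identifiers; check the registries once the release lands.

```shell
# Pull and chat with a model through Ollama. The tag "gemma4:e4b" is an
# assumed name modeled on the current gemma3 tags, not a confirmed one.
ollama pull gemma4:e4b
ollama run gemma4:e4b "Summarize the Apache 2.0 license in one sentence."

# Serve an OpenAI-compatible endpoint on a single GPU with vLLM.
# The model ID "google/gemma-4-e4b-it" is likewise an assumption.
vllm serve google/gemma-4-e4b-it --max-model-len 8192
```

Both tools expose the model behind a local HTTP API, so existing OpenAI-client code can point at the endpoint without changes.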
A bot-authored pull request in Google's LiteRT-LM repository references Gemma 4 and Android AICore NPU support, while multiple posts claim the largest variant has roughly 120B total and 15B active parameters. Engineers targeting on-device inference should wait for a formal model card before locking in plans.