Google DeepMind released Gemma 4 in E2B, E4B, 26B A4B, and 31B variants with multimodal input, native tool use, and Apache 2.0 licensing. Day-0 support in Ollama, vLLM, SGLang, and Hugging Face brings the models to local and single-GPU workflows immediately.
A bot-authored Google LiteRT-LM pull request references Gemma 4 and AICore NPU support, while multiple posts claim the largest variant has roughly 120B total and 15B active parameters. Engineers targeting on-device inference should wait for a formal model card before locking in plans.