Google DeepMind released Gemma 4 as four Apache 2.0 models, from the mobile-scale E2B and E4B up to a 31B dense model and a 26B MoE. Day-0 support in Ollama, vLLM, and SGLang, plus immediate Arena rankings, makes the family ready for local and hybrid agent stacks.

You can read Google's launch post, skim the official model page, and jump straight into the Hugging Face integration post. The weirdly practical part is how complete the launch looks on day one: Ollama commands, a vLLM container recipe, SGLang parser flags, and even an Android Studio local-agent writeup.
Google split the family into two edge models and two workstation models. The naming maps cleanly to deployment targets.
Google's official pages position Gemma 4 as the open counterpart to Gemini-era research, with the launch post calling it the company's most capable open family so far and the model page splitting the line between mobile efficiency and PC-class reasoning.
The product pitch is less about raw chat and more about local agents. Google says Gemma 4 supports native function calling, structured JSON output, native system instructions, multimodal reasoning, and long action histories inside a 256K window on the larger models.
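To make the function-calling pitch concrete, here is a minimal client-side sketch of one tool-call round trip, assuming the OpenAI-style tool schema that servers like vLLM accept. The tool name, its arguments, and the stub backend are all hypothetical, not anything Google ships:

```python
import json

# Hypothetical tool schema in the OpenAI-style format that vLLM's
# OpenAI-compatible server accepts. Nothing here is Gemma-specific;
# the model only sees this schema and emits a matching call.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_db",
        "description": "Search a product database by keyword.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def dispatch(tool_call: dict) -> str:
    """Execute one parsed tool call and return a JSON result string."""
    args = json.loads(tool_call["arguments"])
    if tool_call["name"] == "search_db":
        # Stub backend; a real agent would query an actual database here.
        return json.dumps({"hits": [f"result for {args['query']}"]})
    raise ValueError(f"unknown tool {tool_call['name']}")

# Simulated model output: the shape a tool-call parser hands back.
call = {"name": "search_db", "arguments": '{"query": "usb-c hub"}'}
print(dispatch(call))  # → {"hits": ["result for usb-c hub"]}
```

The agent loop is just this dispatch step repeated: send the schema plus history, parse the returned call, execute it, and feed the JSON result back as the next turn.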
The launch materials repeatedly frame the family around planning and tool use: searching databases, navigating apps, triggering APIs, and offline code generation. The Google Developers edge post makes that even plainer by describing multi-step planning and autonomous action directly on-device.
Hugging Face's integration post adds a couple of engineer-facing details that did not show up as prominently in the tweet thread: support across transformers, llama.cpp, MLX, transformers.js, Mistral.rs, and local-agent integrations, plus architecture notes like per-layer embeddings and a shared KV cache.
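The shared-KV-cache note matters mostly for memory at long context. A back-of-the-envelope sketch, using made-up layer counts and head dimensions rather than published Gemma 4 numbers, shows why letting groups of layers share one KV cache shrinks the footprint:

```python
def kv_cache_mib(num_layers, kv_heads, head_dim, seq_len,
                 unique_kv_layers=None, bytes_per_elem=2):
    """KV-cache size in MiB: 2 (keys + values) per caching layer.

    If only `unique_kv_layers` of the layers store their own KV
    (the rest reuse a shared cache), the total scales down with it.
    Defaults to fp16 (2 bytes per element).
    """
    layers = unique_kv_layers or num_layers
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 2**20

# Hypothetical small-model dimensions, not published Gemma 4 numbers.
full = kv_cache_mib(30, 4, 128, 32_768)
shared = kv_cache_mib(30, 4, 128, 32_768, unique_kv_layers=10)
print(round(full), round(shared))  # → 1920 640
```

With these toy numbers, caching KV in only a third of the layers cuts a 32K-context cache from roughly 1.9 GiB to 640 MiB, which is the difference between fitting on a phone and not.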
Google's own benchmark card pushes the family across reasoning, coding, tool use, multimodal, and long-context tests. The useful bit is how broad the table is.
Arena placed the 31B at #3 among open models and #27 overall, while the 26B A4B landed at #6 open and #39 overall (Arena text leaderboard). Artificial Analysis independently reported 85.7% on GPQA Diamond for the 31B reasoning model, just behind Qwen3.5 27B reasoning, and called out its lower token usage at roughly 1.2M output tokens on that eval (Artificial Analysis GPQA results).
The HN thread also filled in some early operator color. One commenter reported the 26B A4B running at roughly 40 tokens per second in a code-agent harness, while another said E4B scored 15 out of 25 on a small SQL benchmark and E2B scored 12 out of 25 in 4-bit quantized form (HN comments on code-agent speed and on the SQL benchmark).
This launch had the sort of integration coverage open-model users usually wait a week for.
Ollama exposes all four sizes as ollama run gemma4:e2b, gemma4:e4b, gemma4:26b, and gemma4:31b. SGLang's launch tweet showed explicit --reasoning-parser gemma4 and --tool-call-parser gemma4 flags for Gemma 4 reasoning and tool calls (SGLang day-0 support). vLLM followed with a quick-start container example targeting google/gemma-4-31B-it (vLLM support). Ollama said Gemma 4 requires 0.20+, with a pre-release available later the same day through its GitHub releases page (Ollama pre-release note).
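Assembled from the launch notes, a day-0 local setup might look like the following. This is an untested sketch: the parser values come from the SGLang launch tweet, and the model path follows the vLLM example; check each project's docs before relying on the exact flags:

```shell
# Pull and chat with the small edge model locally (needs Ollama 0.20+).
ollama run gemma4:e4b

# Serve the 31B with SGLang, enabling the Gemma 4 parsers the
# launch tweet showed for reasoning traces and tool calls.
python -m sglang.launch_server \
  --model-path google/gemma-4-31B-it \
  --reasoning-parser gemma4 \
  --tool-call-parser gemma4
```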
The last interesting wrinkle is how hard Google pushed the mobile stack around this release. Phil Schmid's roundup linked Gemma 4 to Android Studio, LiteRT, AI Edge Gallery, Vertex Model Garden, ADK, Cloud Run, GKE Agent Sandbox, MaxText, and vLLM on TPUs (missed-details thread).
The Android Studio announcement says developers can use Gemma 4 as a local agent for Android app development offline. The LiteRT-LM overview adds the implementation angle: Android, iOS, web, desktop, and Raspberry Pi support, GPU and NPU acceleration, constrained-decoding function calling, and an AI Edge Gallery app that runs entirely offline.
That makes the E2B and E4B models more than tiny demos. Google is already shipping them as first-class parts of its on-device developer stack.
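The constrained-decoding idea behind LiteRT-LM's function calling can be illustrated in miniature: at each step, mask out every token that cannot extend the output toward a valid target string, so the model physically cannot emit an invalid function name. This toy greedy decoder (invented vocabulary and scores, not the real LiteRT-LM implementation) shows the mechanism:

```python
def constrained_decode(allowed, vocab, scores):
    """Greedy decoding restricted to prefixes of the allowed strings.

    Assumes every allowed string can be spelled from the vocab;
    a real implementation works over token-level grammars or JSON
    schemas rather than a flat set of names.
    """
    out = ""
    while out not in allowed:
        best = None
        for tok in vocab:
            cand = out + tok
            # Keep only tokens that still lead toward a valid output.
            if any(a.startswith(cand) for a in allowed):
                if best is None or scores[tok] > scores[best]:
                    best = tok
        out += best
    return out

allowed = {"search_db", "open_app"}
vocab = ["search", "open", "_db", "_app", "delete"]
scores = {"search": 0.1, "open": 0.9, "_db": 0.5, "_app": 0.4, "delete": 0.99}
print(constrained_decode(allowed, vocab, scores))  # → open_app
```

Note that "delete" has the highest raw score but is never emitted, because no allowed function name starts with it; that is exactly the guarantee constrained decoding buys for on-device tool calls.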
Discussion around "Google releases Gemma 4 open models" (1.3k upvotes · 386 comments)
Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵
Gemma 4 is here! 4⃣Our most capable, agentic open model, built on the same research as Gemini 3. ✨ Reasoning. Multimodal. Four sizes (2B to 31B). Base + Instruct. Released under Apache 2.0. Runs on your phone, laptop, or servers. 🧵↓
Build autonomous agents that plan, navigate apps, and execute multi-step tasks – like searching databases or triggering APIs – with native tool use. With up to 256K context, it can analyze full codebases and retain complex action histories without losing focus.
🎉 Congrats on the Gemma 4 launch from @googlegemma, day-0 support is now live in SGLang! Gemma 4 is a multimodal family (4 sizes: E2B, E4B, 26B A4B, and 31B) with both Dense and MoE architectures, built for everything from mobile to server-scale: 👁️ Rich multimodal
Meet Gemma 4! Purpose-built for advanced reasoning and agentic workflows on the hardware you own, and released under an Apache 2.0 license. We listened to invaluable community feedback in developing these models. Here is what makes Gemma 4 our most capable open models yet: 👇
Dig into the Text Arena scores and filter for open source models at: arena.ai/leaderboard/te…
Start building with Gemma 4 now in @GoogleAIStudio. You can also download the model weights from @HuggingFace, @Kaggle, or @Ollama. Find out more → goo.gle/41IC3lY
.@GoogleDeepMind Gemma 4 is here with state-of-the-art models targeting edge and workstations. Requires Ollama 0.20+ that is rolling out. 4 models: 4B Effective (E4B) ollama run gemma4:e4b 2B Effective (E2B) ollama run gemma4:e2b 26B (4B active MoE) ollama run gemma4:26b
Start experimenting with Gemma 4 now in @GoogleAIStudio or download the model weights from @HuggingFace, @Kaggle and @Ollama. Learn more → goo.gle/48ef4TB
Things you might have missed from the Gemma 4 launch today! ⬇️ - You can use Gemma 4 as your agent for building Android apps in Android Studio, offline! android-developers.googleblog.com/2026/04/androi… - You can use LiteRT to load Gemma in Android and iOS. ai.google.dev/edge/litert-lm… - You can download