Google DeepMind released Gemma 4 as four Apache 2.0 models, from the mobile-scale E2B and E4B up to a 31B dense model and a 26B MoE. Day-0 support in Ollama, vLLM, and SGLang, plus immediate Arena rankings, makes the family ready for local and hybrid agent stacks.

You can read Google's launch post, skim the official model page, and jump straight into the Hugging Face integration post. The weirdly practical part is how complete the launch looks on day one: Ollama commands, a vLLM container recipe, SGLang parser flags, and even an Android Studio local-agent writeup.
Google split the family into two edge models and two workstation models. The naming maps cleanly to deployment targets.
Google's official pages position Gemma 4 as the open counterpart to Gemini-era research, with the launch post calling it the company's most capable open family so far and the model page splitting the line between mobile efficiency and PC-class reasoning.
The product pitch is less about raw chat and more about local agents. Google says Gemma 4 supports native function calling, structured JSON output, native system instructions, multimodal reasoning, and long action histories inside a 256K window on the larger models.
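To make the function-calling pitch concrete, here is a minimal client-side sketch of one tool-call round trip, assuming the OpenAI-style tool schema that servers like vLLM accept. The tool name, its arguments, and the stub backend are all hypothetical, not anything Google ships:

```python
import json

# Hypothetical tool schema in the OpenAI-style format that vLLM's
# OpenAI-compatible server accepts. Nothing here is Gemma-specific;
# the model only sees this schema and emits a matching call.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_db",
        "description": "Search a product database by keyword.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def dispatch(tool_call: dict) -> str:
    """Execute one parsed tool call and return a JSON result string."""
    args = json.loads(tool_call["arguments"])
    if tool_call["name"] == "search_db":
        # Stub backend; a real agent would query an actual database here.
        return json.dumps({"hits": [f"result for {args['query']}"]})
    raise ValueError(f"unknown tool {tool_call['name']}")

# Simulated model output: the shape a tool-call parser hands back.
call = {"name": "search_db", "arguments": '{"query": "usb-c hub"}'}
print(dispatch(call))  # → {"hits": ["result for usb-c hub"]}
```

The agent loop is just this dispatch step repeated: send the schema plus history, parse the returned call, execute it, and feed the JSON result back as the next turn.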
The launch materials repeatedly frame the family around planning and tool use: searching databases, navigating apps, triggering APIs, and offline code generation. The Google Developers edge post makes that even plainer by describing multi-step planning and autonomous action directly on-device.
Hugging Face's integration post adds a couple of engineer-facing details that did not show up as prominently in the tweet thread: support across transformers, llama.cpp, MLX, transformers.js, Mistral.rs, and local-agent integrations, plus architecture notes like per-layer embeddings and a shared KV cache.
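The shared-KV-cache note matters mostly for memory at long context. A back-of-the-envelope sketch, using made-up layer counts and head dimensions rather than published Gemma 4 numbers, shows why letting groups of layers share one KV cache shrinks the footprint:

```python
def kv_cache_mib(num_layers, kv_heads, head_dim, seq_len,
                 unique_kv_layers=None, bytes_per_elem=2):
    """KV-cache size in MiB: 2 (keys + values) per caching layer.

    If only `unique_kv_layers` of the layers store their own KV
    (the rest reuse a shared cache), the total scales down with it.
    Defaults to fp16 (2 bytes per element).
    """
    layers = unique_kv_layers or num_layers
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 2**20

# Hypothetical small-model dimensions, not published Gemma 4 numbers.
full = kv_cache_mib(30, 4, 128, 32_768)
shared = kv_cache_mib(30, 4, 128, 32_768, unique_kv_layers=10)
print(round(full), round(shared))  # → 1920 640
```

With these toy numbers, caching KV in only a third of the layers cuts a 32K-context cache from roughly 1.9 GiB to 640 MiB, which is the difference between fitting on a phone and not.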
Google's own benchmark card pushes the family across reasoning, coding, tool use, multimodal, and long-context tests. The useful bit is how broad the table is.
Arena placed the 31B at #3 among open models and #27 overall, while the 26B A4B landed at #6 open and #39 overall (Arena text leaderboard). Artificial Analysis independently reported 85.7% on GPQA Diamond for the 31B reasoning model, just behind Qwen3.5 27B reasoning, and called out its lower token usage at roughly 1.2M output tokens on that eval (Artificial Analysis GPQA results).
The HN thread also filled in some early operator color. One commenter reported the 26B A4B running at roughly 40 tokens per second in a code-agent harness, while another said E4B scored 15 out of 25 on a small SQL benchmark and E2B scored 12 out of 25 in 4-bit quantized form (HN comments on code-agent speed and on the SQL benchmark).
This launch had the sort of integration coverage open-model users usually wait a week for.
Ollama exposes all four sizes as ollama run gemma4:e2b, gemma4:e4b, gemma4:26b, and gemma4:31b. SGLang's launch tweet showed explicit --reasoning-parser gemma4 and --tool-call-parser gemma4 flags for Gemma 4 reasoning and tool calls (SGLang day-0 support). vLLM followed with a quick-start container example targeting google/gemma-4-31B-it (vLLM support). Ollama said Gemma 4 requires 0.20+, with a pre-release available later the same day through its GitHub releases page (Ollama pre-release note).
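Assembled from the launch notes, a day-0 local setup might look like the following. This is an untested sketch: the parser values come from the SGLang launch tweet, and the model path follows the vLLM example; check each project's docs before relying on the exact flags:

```shell
# Pull and chat with the small edge model locally (needs Ollama 0.20+).
ollama run gemma4:e4b

# Serve the 31B with SGLang, enabling the Gemma 4 parsers the
# launch tweet showed for reasoning traces and tool calls.
python -m sglang.launch_server \
  --model-path google/gemma-4-31B-it \
  --reasoning-parser gemma4 \
  --tool-call-parser gemma4
```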
The last interesting wrinkle is how hard Google pushed the mobile stack around this release. Phil Schmid's roundup linked Gemma 4 to Android Studio, LiteRT, AI Edge Gallery, Vertex Model Garden, ADK, Cloud Run, GKE Agent Sandbox, MaxText, and vLLM on TPUs (missed-details thread).
The Android Studio announcement says developers can use Gemma 4 as a local agent for Android app development offline. The LiteRT-LM overview adds the implementation angle: Android, iOS, web, desktop, and Raspberry Pi support, GPU and NPU acceleration, constrained-decoding function calling, and an AI Edge Gallery app that runs entirely offline.
That makes the E2B and E4B models more than tiny demos. Google is already shipping them as first-class parts of its on-device developer stack.
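The constrained-decoding idea behind LiteRT-LM's function calling can be illustrated in miniature: at each step, mask out every token that cannot extend the output toward a valid target string, so the model physically cannot emit an invalid function name. This toy greedy decoder (invented vocabulary and scores, not the real LiteRT-LM implementation) shows the mechanism:

```python
def constrained_decode(allowed, vocab, scores):
    """Greedy decoding restricted to prefixes of the allowed strings.

    Assumes every allowed string can be spelled from the vocab;
    a real implementation works over token-level grammars or JSON
    schemas rather than a flat set of names.
    """
    out = ""
    while out not in allowed:
        best = None
        for tok in vocab:
            cand = out + tok
            # Keep only tokens that still lead toward a valid output.
            if any(a.startswith(cand) for a in allowed):
                if best is None or scores[tok] > scores[best]:
                    best = tok
        out += best
    return out

allowed = {"search_db", "open_app"}
vocab = ["search", "open", "_db", "_app", "delete"]
scores = {"search": 0.1, "open": 0.9, "_db": 0.5, "_app": 0.4, "delete": 0.99}
print(constrained_decode(allowed, vocab, scores))  # → open_app
```

Note that "delete" has the highest raw score but is never emitted, because no allowed function name starts with it; that is exactly the guarantee constrained decoding buys for on-device tool calls.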
Discussion around "Google releases Gemma 4 open models" (1.3k upvotes · 386 comments)
Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵
Gemma 4 is here! 4⃣Our most capable, agentic open model, built on the same research as Gemini 3. ✨ Reasoning. Multimodal. Four sizes (2B to 31B). Base + Instruct. Released under Apache 2.0. Runs on your phone, laptop, or servers. 🧵↓
Build autonomous agents that plan, navigate apps, and execute multi-step tasks – like searching databases or triggering APIs – with native tool use. With up to 256K context, it can analyze full codebases and retain complex action histories without losing focus.
🎉 Congrats on the Gemma 4 launch from @googlegemma, day-0 support is now live in SGLang! Gemma 4 is a multimodal family (4 sizes: E2B, E4B, 26B A4B, and 31B) with both Dense and MoE architectures, built for everything from mobile to server-scale: 👁️ Rich multimodal
Meet Gemma 4! Purpose-built for advanced reasoning and agentic workflows on the hardware you own, and released under an Apache 2.0 license. We listened to invaluable community feedback in developing these models. Here is what makes Gemma 4 our most capable open models yet: 👇
Dig into the Text Arena scores and filter for open source models at: arena.ai/leaderboard/te…
Start building with Gemma 4 now in @GoogleAIStudio. You can also download the model weights from @HuggingFace, @Kaggle, or @Ollama. Find out more → goo.gle/41IC3lY
.@GoogleDeepMind Gemma 4 is here with state-of-the-art models targeting edge and workstations. Requires Ollama 0.20+ that is rolling out. 4 models: 4B Effective (E4B) ollama run gemma4:e4b 2B Effective (E2B) ollama run gemma4:e2b 26B (4B active MoE) ollama run gemma4:26b
Start experimenting with Gemma 4 now in @GoogleAIStudio or download the model weights from @HuggingFace, @Kaggle and @Ollama. Learn more → goo.gle/48ef4TB
Things you might have missed from the Gemma 4 launch today! ⬇️ - You can use Gemma 4 as your agent for building Android apps in Android Studio, offline! android-developers.googleblog.com/2026/04/androi… - You can use LiteRT to load Gemma in Android and iOS. ai.google.dev/edge/litert-lm… - You can download