Gemma

Open models for developers

Google DeepMind's open model family for developers, including multimodal releases.

Pricing

Model profile · Current snapshot

Input (per 1M tokens): $0.11
Output (per 1M tokens): $0.25
Blended (per 1M tokens): $0.145

Output TPS

TTFT (s)
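The page does not say how the blended price is derived. As a rough sketch, a 3:1 input-to-output token weighting (an assumption, not something stated here) reproduces the listed $0.145 from the $0.11 input and $0.25 output prices. The function names below are hypothetical and only illustrate the arithmetic.

```python
# Listed prices in USD per 1M tokens (taken from the pricing figures above).
INPUT_PER_M = 0.11
OUTPUT_PER_M = 0.25


def blended_price(input_per_m: float, output_per_m: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens.

    The default 3:1 input:output weighting is an assumption; it happens to
    match the blended figure shown on this page.
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_per_m + output_parts * output_per_m) / total_parts


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-1M-token prices."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000


blended = blended_price(INPUT_PER_M, OUTPUT_PER_M)
print(f"Blended price per 1M tokens: ${blended:.3f}")

# Example: a prompt of 200,000 tokens with a 2,000-token reply.
print(f"Example request cost: ${request_cost(200_000, 2_000):.4f}")
```

A workload with a different input-to-output mix will see a different effective rate, so `request_cost` is usually the more useful of the two for budgeting.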

Model Intelligence

Arena ranking

Benchmarkable

Model level: family

Intelligence Index: 4.8
Math Index: 20.7
MMLU Pro: 0.67
GPQA: 0.43
HLE: 0.05
LiveCodeBench: 0.14
SciCode: 0.21
MATH-500: 0.88
AIME: 0.25
AIME 2025: 0.21
IFBench: 0.32
LCR: 0.06
TerminalBench Hard: 0.04
TAU2: 0.11

Recent stories

2 linked stories

release · PRIMARY · 2026-06-03

Gemma 4 12B releases with 256K context and unified audio-vision input

Google’s new Gemma 4 12B ships as an encoder-free open model for text, image, audio, and video tasks with a 256K context window. Early GGUF ports and local benchmarks make it a plausible on-device multimodal option for creator tooling and experimentation.

release · PRIMARY · 2026-04-02

Google DeepMind releases Gemma 4 under Apache 2.0 with 31B Dense, 26B MoE, and 256K context

Google DeepMind shipped four Gemma 4 models with multimodal input, including 31B Dense, 26B MoE, and two edge variants available through AI Studio, Hugging Face, Kaggle, and Ollama. Early community tests say local performance and usable context windows still vary by runtime, quantization, and GPU memory.