AI Primer

IBM releases Granite 4.1 30B/8B/3B open models under Apache 2.0

IBM released Granite 4.1 as three open instruct models, with third parties quickly surfacing token-efficiency and deployment access. The update matters for teams evaluating smaller open models for agent workloads where output-token burn and openness both affect production cost.


TL;DR

You can read IBM's official launch post, skim the Granite 4.1 docs, and jump straight to the Hugging Face 8B model card. The odd little tell is that third parties spent more time on token burn than raw score, while IBM used the same drop to ship speech, vision, embeddings, and Guardian updates in one bundle.

Dense language models

IBM's docs describe Granite 4.1 as a dense family with 3B, 8B, and 30B sizes, each offered in base and instruct variants, plus optional FP8 quantization for cheaper deployment (Granite 4.1 docs). The 8B model card adds that IBM reworked post-training with supervised fine-tuning and reinforcement learning alignment, and lists 12 supported languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese (Hugging Face 8B model card).

The official positioning is narrow and pretty clear:

  • 3B: edge and resource-constrained deployments, per the docs
  • 8B: general-purpose enterprise applications, per the docs
  • 30B: higher-capacity model for more complex tasks, per the docs

Artificial Analysis scored the three instruct models at 15, 12, and 9 on its Intelligence Index for 30B, 8B, and 3B respectively, according to ArtificialAnlys' launch thread. That leaves Granite 4.1 below peers like Qwen3.5 and Gemma on raw composite score, but ahead of its own 4.0 predecessors in the 30B and 3B slots, with ArtificialAnlys' evaluation breakdown highlighting the biggest 30B gains in tool use and agentic tasks.

Output-token profile

The most distinctive Granite 4.1 claim is not benchmark leadership. It is output efficiency.

According to ArtificialAnlys' launch thread, Granite 4.1 8B used roughly 4 million output tokens for the Intelligence Index, compared with 13 million for Ministral 3 8B, 8 million for Gemma 4 E4B, and 78 million for Qwen3.5 9B. The same pattern showed up across the family, with the 30B model at 4.6 million tokens and the 3B model at 2.7 million.

That framing matters because Artificial Analysis is effectively describing score per token budget, not just score. IBM's own post leans into the same lane, calling out improvements in tool calling and instruction following for real-world systems rather than a chase for top-end reasoning scores (official launch post).
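
The score-per-token-budget point is easy to make concrete. The sketch below takes the output-token totals cited in ArtificialAnlys' launch thread and prices them at a flat, assumed $0.10 per 1M output tokens; real rates vary by host, so this only illustrates why token burn, not just score, drives evaluation and agent-workload cost.

```python
# Output-token totals (millions) cited for running the Intelligence Index,
# per ArtificialAnlys' launch thread. The flat price is an assumption for
# illustration, not a quoted rate for every model.
PRICE_PER_M_OUTPUT = 0.10  # USD per 1M output tokens (assumed flat rate)

output_tokens_m = {
    "Granite 4.1 8B": 4,
    "Gemma 4 E4B": 8,
    "Ministral 3 8B": 13,
    "Qwen3.5 9B": 78,
}

for model, tokens_m in output_tokens_m.items():
    cost = tokens_m * PRICE_PER_M_OUTPUT
    print(f"{model}: {tokens_m}M output tokens -> ${cost:.2f}")
```

At the same per-token price, the cited numbers put Qwen3.5 9B's eval run at roughly 20x the output spend of Granite 4.1 8B, which is the gap the launch thread was pointing at.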

The per-benchmark chart in ArtificialAnlys' evaluation breakdown makes the shape of the release easier to read: the 30B model's gains cluster in tool use and agentic tasks rather than top-end reasoning.

Openness and access

On openness, Artificial Analysis gave all three Granite 4.1 models a 61 on its Openness Index, ahead of Qwen3.5, Gemma 4, and Mistral Small 4, according to ArtificialAnlys' openness follow-up. The cited reason was not just Apache 2.0 weights, but extra disclosure around pre-training data, post-training data, and methodology.

The practical rollout was immediate:

  • W&B Inference listed Granite 4.1 8B on day one at $0.05 per 1M input tokens and $0.10 per 1M output tokens, with 131K context, according to wandb's launch post
  • Replicate linked live pages for both Granite 4.1 language and Granite Speech 4.1 on launch day, according to replicate's links post
  • Replicate's model page says the 8B model is aimed at summarization, extraction, question answering, RAG, code tasks, function calling, and multilingual dialogue (Replicate model page)
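
The W&B rates translate directly into a back-of-envelope budget. The sketch below uses the listed launch prices ($0.05 per 1M input tokens, $0.10 per 1M output tokens); the request volume and token sizes are made-up workload assumptions, not figures from any source above.

```python
# W&B Inference launch rates for Granite 4.1 8B, per wandb's launch post.
INPUT_PRICE = 0.05 / 1_000_000   # USD per input token
OUTPUT_PRICE = 0.10 / 1_000_000  # USD per output token

# Assumed agent workload -- illustrative numbers only.
requests_per_day = 50_000
avg_input_tokens = 2_000   # prompt plus retrieved context
avg_output_tokens = 300    # completion length

daily_cost = requests_per_day * (
    avg_input_tokens * INPUT_PRICE + avg_output_tokens * OUTPUT_PRICE
)
print(f"${daily_cost:.2f}/day")  # $6.50/day under these assumptions
```

Note that at this price ratio the input side dominates for long-context RAG prompts, while the output-efficiency story from the previous section matters most for chatty multi-step agents.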

That makes Granite 4.1 a familiar IBM move: permissive weights, more documentation than most peers, and fast placement on hosted inference surfaces instead of a Hugging Face-only drop.

Speech and vision

Granite 4.1 was not only a text-model refresh. IBM's launch post says the same release bundled speech, vision, embeddings, and Guardian updates, with Granite speech pitched on transcription accuracy and Granite vision on chart and table extraction (official launch post).

Replicate's speech listing adds a concrete detail the main language-model chatter skipped: Granite Speech 4.1 2B supports multilingual ASR and bidirectional speech translation across English, French, German, Spanish, Portuguese, and Japanese, with Apache 2.0 licensing and variants for speaker attribution and higher-throughput non-autoregressive decoding (Granite Speech 4.1 2B page). Separately, vLLM's v0.20.0 thread and the matching vLLM release notes show Granite 4.1 Vision already landed as built-in support in vLLM v0.20.0, so the ecosystem pickup extended beyond the 8B text model that got most of the launch-day attention.
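
If the vLLM support works the way it has for earlier Granite releases, self-hosting should reduce to one command via vLLM's OpenAI-compatible server. The Hugging Face repo ID below follows IBM's usual Granite naming convention and is an assumption, not confirmed by the launch materials.

```shell
# Hypothetical: serve Granite 4.1 8B Instruct locally with vLLM.
# The model ID is assumed from IBM's naming convention; check the
# Hugging Face org page for the actual repo name before running.
vllm serve ibm-granite/granite-4.1-8b-instruct \
  --max-model-len 131072   # matches the 131K context listed by W&B
```

The server exposes an OpenAI-compatible /v1/chat/completions endpoint on port 8000 by default, so existing agent stacks can point at it without client changes.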
