IBM releases Granite Embedding R2 with 32,768-token context and +11.8 MMTEB retrieval gain

IBM released 97M and 311M multilingual Granite Embedding R2 models under Apache 2.0, replacing XLM-RoBERTa with ModernBERT and extending context length from 512 to 32,768 tokens. The 311M model posts a +11.8 gain on MMTEB retrieval and ships with ONNX, OpenVINO, vLLM, and GGUF support.


TL;DR

  • IBM shipped two Apache 2.0 multilingual embedding models, the 97M granite-embedding-97m-multilingual-r2 and 311M granite-embedding-311m-multilingual-r2, according to tomaarsen's release thread and the 311M model card.
  • The architectural jump is a swap from XLM-RoBERTa to ModernBERT, which tomaarsen's architecture note says expands context from 512 to 32,768 tokens, while IBM's 97M model card confirms the same 32,768-token window on the smaller model.
  • On MMTEB Retrieval, the 311M model reached 64.0, a +11.8 point gain over R1, and tomaarsen's benchmark post also reports a +14.2 point average gain across retrieval, code, long-document, and reasoning benchmarks.
  • Coverage is broad: tomaarsen's feature list says the models support 200+ languages, give 52 languages extra retrieval-pair training, and add code retrieval across nine programming languages.
  • IBM also shipped the usual production hooks on day one, with tomaarsen's backend list calling out ONNX, OpenVINO, vLLM, and GGUF, while tomaarsen's Sentence Transformers example shows direct use through Sentence Transformers and downstream frameworks.

The interesting bit here is how much of the story sits in the model cards, not the headline. You can read the 311M card, compare it with the 97M card, and trace the encoder swap back to Hugging Face's ModernBERT writeup. The deployment angle is similarly practical: Sentence Transformers already documents ONNX and OpenVINO backends, which makes IBM's drop-in example more useful than most release-thread code snippets.

Models

R2 comes in two sizes:

  • granite-embedding-97m-multilingual-r2, the 97M-parameter model
  • granite-embedding-311m-multilingual-r2, the 311M-parameter model

Both are Apache 2.0. According to the model cards, the training mix uses permissive, enterprise-friendly data plus IBM-collected and IBM-generated datasets.

ModernBERT

The main upgrade is the encoder swap. According to tomaarsen's architecture note, R2 replaces XLM-RoBERTa with ModernBERT and stretches the window from 512 to 32,768 tokens.

That puts these models into a different slice of embedding work than the old Granite release. Hugging Face's ModernBERT post introduced the architecture as a faster, longer-context encoder family, and tomaarsen's follow-up post says IBM did more than a simple finetune by expanding the multilingual vocabulary on top of that base.
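The practical effect of the longer window is that whole documents can be embedded in one pass instead of being chunked at 512 tokens. A minimal sketch of what that looks like through Sentence Transformers is below; the Hugging Face repo id (the ibm-granite org prefix) and the sample text are assumptions, not taken from the release thread.

```python
# Minimal sketch: embedding a document far longer than the old 512-token limit.
# The repo id assumes the model lives under the ibm-granite org on Hugging Face.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ibm-granite/granite-embedding-311m-multilingual-r2")
print(model.max_seq_length)  # configured context window; the model card says 32,768

# A long synthetic "document" that would have been truncated under R1.
long_report = " ".join(["Quarterly revenue grew across all regions and segments."] * 2000)

embedding = model.encode(long_report)  # one vector for the whole document
print(embedding.shape)
```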

Benchmarks

The 311M model is where IBM's gains look sharpest:

  • MMTEB Retrieval: 64.0, a +11.8 point gain over R1
  • +14.2 point average gain across retrieval, code, long-document, and reasoning benchmarks

The rest of the benchmark-adjacent feature set is unusually deployment-minded:

  • 200+ supported languages
  • 52 languages with enhanced retrieval-pair training
  • Code retrieval for Python, Go, Java, JavaScript, PHP, Ruby, SQL, C, and C++
  • Matryoshka truncation to 128 dimensions with roughly a 2 point loss (see the sketch after this list)
  • A 262K tokenizer borrowed from Gemma 3
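
The Matryoshka point is the most directly usable of these. A hedged sketch of how it maps onto Sentence Transformers' truncate_dim argument follows; the repo id is assumed (ibm-granite org prefix) and the documents are illustrative.

```python
# Sketch of Matryoshka-style truncation via Sentence Transformers' truncate_dim.
# Repo id assumes the ibm-granite org on Hugging Face.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "ibm-granite/granite-embedding-311m-multilingual-r2",
    truncate_dim=128,  # keep only the first 128 dimensions; the release cites ~2 points of quality loss
)

docs = [
    "def quicksort(arr): ...",                 # code retrieval is part of the advertised feature set
    "SELECT * FROM users WHERE id = 1;",
]
embeddings = model.encode(docs)
print(embeddings.shape)  # (2, 128) after truncation
```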

Deployment

IBM shipped the models in formats people can actually plug in. tomaarsen's feature list names ONNX, OpenVINO, vLLM, and GGUF support out of the box.

The integration path is also short. In tomaarsen's Sentence Transformers example, the 311M model loads directly through SentenceTransformer(..., backend="onnx"), and the Sentence Transformers backend docs show the same ONNX and OpenVINO pathways. Tomaarsen adds in that example that this makes the release a drop-in for LangChain, LlamaIndex, Haystack, and PydanticAI.
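
A minimal version of that pathway, assuming the ibm-granite repo id and an installed ONNX extra (pip install "sentence-transformers[onnx]"); the query and documents are illustrative, not from the release thread.

```python
# Loading with the ONNX backend, along the lines of the release-thread example.
# Repo id assumes the ibm-granite org on Hugging Face.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "ibm-granite/granite-embedding-311m-multilingual-r2",
    backend="onnx",
)

query = "How do I rotate API keys?"
docs = [
    "Key rotation is handled in the security console.",
    "Ce modèle prend en charge plus de 200 langues.",  # multilingual coverage is part of the pitch
]

q_emb = model.encode(query)
d_emb = model.encode(docs)
print(model.similarity(q_emb, d_emb))  # similarity scores, query vs. each document
```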
