IBM releases Granite Embedding R2 with 32,768-token context and +11.8 MMTEB retrieval gain
IBM released 97M and 311M multilingual Granite Embedding R2 models under Apache 2.0, replacing XLM-RoBERTa with ModernBERT and extending context length from 512 to 32,768 tokens. The 311M model posts a +11.8 gain on MMTEB retrieval and ships with ONNX, OpenVINO, vLLM, and GGUF support.

TL;DR
- IBM shipped two Apache 2.0 multilingual embedding models, the 97M granite-embedding-97m-multilingual-r2 and the 311M granite-embedding-311m-multilingual-r2, according to tomaarsen's release thread and the 311M model card.
- The architectural jump is a swap from XLM-RoBERTa to ModernBERT, which tomaarsen's architecture note says expands context from 512 to 32,768 tokens, while IBM's 97M model card confirms the same 32,768-token window on the smaller model.
- On MMTEB Retrieval, the 311M model reached 64.0, a +11.8 point gain over R1, and tomaarsen's benchmark post also reports a +14.2 point average gain across retrieval, code, long-document, and reasoning benchmarks.
- Coverage is broad: tomaarsen's feature list says the models support 200+ languages, give 52 languages extra retrieval-pair training, and add code retrieval across nine programming languages.
- IBM also shipped the usual production hooks on day one, with tomaarsen's backend list calling out ONNX, OpenVINO, vLLM, and GGUF, while tomaarsen's Sentence Transformers example shows direct use through Sentence Transformers and downstream frameworks.
The interesting bit here is how much of the story sits in the model cards, not the headline. You can read the 311M card, compare it with the 97M card, and trace the encoder swap back to Hugging Face's ModernBERT writeup. The deployment angle is similarly practical: Sentence Transformers already documents ONNX and OpenVINO backends, which makes IBM's drop-in example more useful than most release-thread code snippets.
Models
R2 comes in two sizes:
- granite-embedding-97m-multilingual-r2, 384 dimensions, per tomaarsen's release thread and the 97M model card
- granite-embedding-311m-multilingual-r2, 768 dimensions, per tomaarsen's release thread and the 311M model card
Both are Apache 2.0. In the model cards, IBM says the training mix uses permissive, enterprise-friendly data plus IBM-collected and IBM-generated datasets.
ModernBERT
The main upgrade is the encoder swap. According to tomaarsen's architecture note, R2 replaces XLM-RoBERTa with ModernBERT and stretches the window from 512 to 32,768 tokens.
That puts these models into a different slice of embedding work than the old Granite release. Hugging Face's ModernBERT post introduced the architecture as a faster, longer-context encoder family, and tomaarsen's follow-up post says IBM did more than a simple finetune by expanding the multilingual vocabulary on top of that base.
Benchmarks
The 311M model is where IBM's gains look sharpest:
- MMTEB Retrieval: 64.0, +11.8 points over R1, per tomaarsen's benchmark post
- Average across retrieval, code, long-document, and reasoning benchmarks: +14.2 points, per tomaarsen's benchmark post
- MMTEB Retrieval for the 97M model: 59.6, per the 97M model card
The rest of the benchmark-adjacent feature set is unusually deployment-minded:
- 200+ supported languages
- 52 languages with enhanced retrieval-pair training
- Code retrieval for Python, Go, Java, JavaScript, PHP, Ruby, SQL, C, and C++
- Matryoshka truncation to 128 dimensions with roughly a 2 point loss
- A 262K tokenizer borrowed from Gemma 3
Deployment
IBM shipped the models in formats people can actually plug in. tomaarsen's feature list names ONNX, OpenVINO, vLLM, and GGUF support out of the box.
The integration path is also short. In tomaarsen's Sentence Transformers example, the 311M model loads directly through SentenceTransformer(..., backend="onnx"), and the Sentence Transformers backend docs show the same ONNX and OpenVINO pathways. Tomaarsen adds that this makes the release a drop-in for LangChain, LlamaIndex, Haystack, and PydanticAI, per the same Sentence Transformers example.
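A load-and-query flow along those lines looks roughly like this with the standard Sentence Transformers API. The Hugging Face repo id is an assumption inferred from the release naming, the ONNX backend needs the library's optional ONNX extras installed, and the first run downloads weights from the Hub:

```python
from sentence_transformers import SentenceTransformer

# Repo id assumed from IBM's release naming; not confirmed in this article.
model = SentenceTransformer(
    "ibm-granite/granite-embedding-311m-multilingual-r2",
    backend="onnx",  # or "openvino"; the default backend is "torch"
)

queries = ["What is the capital of France?"]
docs = ["Paris is the capital and largest city of France."]

q_emb = model.encode(queries)
d_emb = model.encode(docs)

# similarity() returns a queries x docs cosine-similarity matrix.
print(model.similarity(q_emb, d_emb))
```

Because downstream frameworks like LangChain and LlamaIndex wrap this same SentenceTransformer interface, swapping in the Granite R2 models is mostly a matter of changing the model id.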