
Sentence Transformers releases v5.3.0 with new InfoNCE variants and hardness weighting

Sentence Transformers v5.3.0 adds configurable contrastive loss directions, hardness weighting, new regularization losses, and Transformers v5 compatibility. Try it to test richer retrieval training losses without rewriting your stack.


TL;DR

  • Sentence Transformers v5.3.0 expands MultipleNegativesRankingLoss with configurable InfoNCE directions and partitioning, so the same training API can now express standard, symmetric, and GTE-style contrastive setups, according to the release thread.
  • The release also adds optional hardness weighting for in-batch negatives, which the maintainer's notes describe as a stronger training signal for harder examples and say it also works with CachedMNRL.
  • Two new losses landed: GlobalOrthogonalRegularizationLoss for reducing unrelated embedding similarity and CachedSpladeLoss for memory-efficient SPLADE training, as detailed in the v5.3 notes.
  • On the plumbing side, v5.3.0 brings full Transformers v5 compatibility, a faster hashed no-duplicates sampler, and swaps requests for optional httpx, per the changelog thread.

What changed in contrastive training?

The biggest API change is in MultipleNegativesRankingLoss. The maintainer says v5.3.0 adds new directions and partition_mode parameters, letting you choose interactions like query_to_doc, doc_to_query, query_to_query, and doc_to_doc instead of being locked to one InfoNCE formulation. The example configuration shared in the thread uses a joint partition with all four directions enabled, which makes the update more than a paper-level tweak: it is a direct training config change.
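
Based on the parameter names called out in the thread, that configuration could look roughly like the sketch below; the accepted values and defaults are assumptions here, so check the release notes before copying it:

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# Sketch only: directions / partition_mode are the names quoted in the release
# thread; the exact value format is an assumption. This mirrors the "joint
# partition, all four directions" example from the thread.
loss = losses.MultipleNegativesRankingLoss(
    model,
    directions=["query_to_doc", "doc_to_query", "query_to_query", "doc_to_doc"],
    partition_mode="joint",
)
```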

The same loss now supports hardness weighting through hardness_mode and hardness_strength. In the thread, the author says it “up-weights harder negatives in the softmax” and that the feature also works with CachedMNRL. For embedding teams already training retrievers on in-batch negatives, that means richer contrastive objectives without rewriting trainers or data pipelines.
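
A hedged sketch of switching that on, reusing the hardness_mode and hardness_strength names from the thread; the mode string and scaling value below are illustrative assumptions, not documented defaults:

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# Assumed values: hardness_mode / hardness_strength are the names from the
# release thread, the "exp" mode and 1.0 strength are placeholders.
loss = losses.MultipleNegativesRankingLoss(
    model,
    hardness_mode="exp",       # up-weight harder in-batch negatives in the softmax
    hardness_strength=1.0,     # assumed scaling factor for the weighting
)

# Per the thread, the same options also work with the gradient-cached variant.
cached_loss = losses.CachedMultipleNegativesRankingLoss(
    model,
    mini_batch_size=32,
    hardness_mode="exp",
    hardness_strength=1.0,
)
```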

What else shipped for embedding and retrieval stacks?

v5.3.0 also adds two new objectives aimed at retrieval workloads. GlobalOrthogonalRegularizationLoss penalizes high similarity among unrelated embeddings, and the maintainer says it can be combined with InfoNCE while sharing embeddings in a single forward pass. CachedSpladeLoss is described as a gradient-cached SPLADE loss that enables larger batch sizes “without extra GPU memory,” which is the most deployment-relevant part of the release for sparse retrieval training.
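
As a rough illustration only, the regularizer could be combined with InfoNCE along these lines. The class name comes from the v5.3 notes, but the constructor arguments and the naive summing wrapper are assumptions rather than the documented API (the notes suggest the released version shares embeddings in one forward pass instead of recomputing them):

```python
import torch
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# Class name from the v5.3 notes; the keyword-free constructor is an assumption.
gor = losses.GlobalOrthogonalRegularizationLoss(model)
mnrl = losses.MultipleNegativesRankingLoss(model)


class CombinedLoss(torch.nn.Module):
    """Hypothetical wrapper: InfoNCE plus a weighted orthogonality regularizer.

    This naive version runs both losses sequentially and is only meant to show
    intent; it is not the shared-forward-pass combination from the release.
    """

    def __init__(self, main_loss, reg_loss, reg_weight=0.1):
        super().__init__()
        self.main_loss = main_loss
        self.reg_loss = reg_loss
        self.reg_weight = reg_weight

    def forward(self, sentence_features, labels):
        main = self.main_loss(sentence_features, labels)
        reg = self.reg_loss(sentence_features, labels)
        return main + self.reg_weight * reg


loss = CombinedLoss(mnrl, gor)
```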

The rest of the release is operational cleanup: a faster NoDuplicatesBatchSampler using hashing, a GroupByLabelBatchSampler fix for triplet losses, full compatibility with recent Transformers v5, and requests replaced by optional httpx. In a follow-up reply, the maintainer also pointed users to a ready-made sentence-transformers dataset catalog, which gives teams a quicker path to exercising the new losses on tagged Hugging Face datasets.
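
For teams that want to exercise the new losses quickly, a standard training run against one of the sentence-transformers datasets on the Hub looks like this; the dataset choice below is our example, not one the maintainer singled out:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# A tagged triplet dataset from the sentence-transformers org on the Hub.
train_dataset = load_dataset("sentence-transformers/all-nli", "triplet", split="train")

model = SentenceTransformer("all-MiniLM-L6-v2")
loss = losses.MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```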
