AI Primer
release

Sentence Transformers 5.5.0 adds train-sentence-transformers skill with one-shot 0.8856 NDCG@10

Sentence Transformers 5.5.0 adds an agent skill for fine-tuning embeddings, rerankers, and sparse encoders from Claude Code, Codex, Cursor, and Gemini CLI. The author reports a one-shot German embedding run rising from 0.6720 to 0.8856 NDCG@10 on a local PC.


TL;DR

You can jump straight to the release notes, inspect the one-shot German model, and browse tomaarsen's training checklist for the parts the skill now covers, from hard-negative mining to Matryoshka training. tomaarsen's model-upload note also claims the agent completed the training and Hub upload without manual README or upload steps.

train-sentence-transformers

The headline feature is a Hugging Face skill package that turns agentic coding shells into Sentence Transformers training wrappers. The install surface is tiny (a single hf skills add train-sentence-transformers command), and the prompt surface is equally blunt: describe the model you want, then let the agent assemble the run.

tomaarsen's training checklist says the skill ships guidance across all three supported model families. The covered pieces are:

  • base model selection
  • loss and evaluator choice
  • hard-negative mining
  • distillation
  • LoRA
  • Matryoshka training
  • multilingual setups
  • static embeddings
  • template training scripts the agent can adapt

That is the interesting bit. This is not a single canned finetune command but a packaged set of training heuristics and scripts aimed at the fiddly parts people usually dig up from old notebooks or example repos.
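One checklist item, Matryoshka training, optimizes embeddings so that truncated prefixes still rank well. A minimal illustration of the truncate-and-renormalize step at inference time, in NumPy (the helper name is mine, not from the library):

```python
import numpy as np

def truncate_matryoshka(embeddings, dim):
    """Keep the first `dim` components of each embedding and re-normalize.

    Matryoshka-trained models are optimized so these truncated prefixes
    still rank well, trading a little quality for memory and speed.
    (Illustrative helper; the name is not from the library.)
    """
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Toy 2x8 embedding matrix truncated to its first 4 dimensions.
emb = np.arange(1, 17, dtype=np.float64).reshape(2, 8)
small = truncate_matryoshka(emb, 4)
print(small.shape)  # (2, 4)
```

The payoff is that a single trained model can serve several embedding sizes, with the smaller ones feeding cheaper indexes.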

German retrieval run

The only concrete result in the evidence pool is a German retrieval run that tomaarsen's benchmark post says Claude Code completed on the author's own PC in about 30 minutes. The reported metric moved from 0.6720 to 0.8856 NDCG@10.
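For context on the metric, NDCG@10 rewards placing relevant documents near the top of the ranking. A minimal plain-Python sketch with binary relevance (not the evaluator the library uses):

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain over the top-k ranked results:
    # each relevance is discounted by log2(rank + 1).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    # Normalize by the DCG of the ideal (relevance-sorted) ordering,
    # so a perfect ranking scores 1.0.
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# One relevant document retrieved at rank 3 out of 10:
print(round(ndcg_at_k([0, 0, 1, 0, 0, 0, 0, 0, 0, 0], k=10), 4))  # 0.5
```

A jump from 0.6720 to 0.8856 therefore means relevant documents moved substantially closer to the top of the returned lists, averaged over the evaluation queries.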

The follow-up matters almost as much as the score. tomaarsen's model-upload note says the resulting model was uploaded in a fully one-shot flow, with no manual README editing or upload step, and that the run can still be interrupted for edits, extra experiments, or a SLURM handoff.

New losses

The release also adds two new objective functions:

  • EmbedDistillLoss for SentenceTransformer: tomaarsen's EmbedDistillLoss post says it matches student embeddings directly against precomputed teacher embeddings, instead of distilling teacher scores as MarginMSELoss does. It also supports an optional learnable projection when teacher and student embedding dimensions differ.
  • ADRMSELoss for CrossEncoder: tomaarsen's ADRMSELoss post describes it as a listwise learning-to-rank loss from the Rank-DistiLLM paper, aimed at reranker training from an LLM's document ordering.
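The idea behind direct embedding distillation can be sketched in a few lines of NumPy. This is a conceptual illustration only, not the library implementation: the real EmbedDistillLoss learns the projection during training, whereas here it is a fixed matrix, and the function name is mine.

```python
import numpy as np

def embed_distill_mse(student, teacher, projection=None):
    """MSE between student embeddings and precomputed teacher embeddings.

    Conceptual sketch only: the real EmbedDistillLoss trains the optional
    projection jointly with the student; here it is a fixed matrix that
    maps the student dimension onto the teacher dimension.
    """
    if projection is not None:
        student = student @ projection  # (batch, d_student) -> (batch, d_teacher)
    return float(np.mean((student - teacher) ** 2))

rng = np.random.default_rng(0)
student = rng.normal(size=(4, 384))        # smaller student embedding size
teacher = rng.normal(size=(4, 768))        # larger teacher embedding size
proj = rng.normal(size=(384, 768)) * 0.01  # stand-in for the learnable projection
loss = embed_distill_mse(student, teacher, proj)
print(loss >= 0.0)  # True
```

The contrast with MarginMSELoss is where the target lives: score distillation matches the teacher's query-document similarity scores, while this matches the embedding vectors themselves.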

For people training retrieval stacks rather than just serving them, that makes 5.5.0 more than an agent-skill release.

processing_kwargs

A smaller API change in tomaarsen's processing_kwargs note gives encode() and predict() a per-call processing_kwargs override. That means max length, image resolution, or video FPS can change for one invocation without rebuilding the model object.
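The override pattern itself is simple to picture. The sketch below is illustrative only and does not mirror the library internals; every name except processing_kwargs is mine.

```python
def encode_with_overrides(defaults, processing_kwargs=None):
    """Sketch of the per-call override pattern behind processing_kwargs.

    The model keeps its default processing settings; a per-call dict wins
    for this invocation only, without mutating the stored defaults.
    (Illustrative only, not the Sentence Transformers internals.)
    """
    effective = {**defaults, **(processing_kwargs or {})}
    return effective

model_defaults = {"max_length": 512, "video_fps": 2}
one_call = encode_with_overrides(model_defaults, {"max_length": 128})
print(one_call["max_length"], model_defaults["max_length"])  # 128 512
```

The point of the pattern is the second printed value: the model-level defaults survive the call untouched, so the next encode() behaves as before.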

The fixes list, drawn from tomaarsen's fixes roundup and his DeepSpeed and loading fixes, is mundane in the good way:

  • CLS pooling now picks the first real token with left-padding tokenizers, rather than silently taking a PAD token.
  • AdaptiveLayerLoss and CrossEncoder losses now train under DistributedDataParallel and torch.compile.
  • model.config now delegates to the underlying Transformers config, which tomaarsen says improves DeepSpeed ZeRO behavior.
  • Loading from local paths and private trust_remote_code repos is more robust.
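The left-padding fix in particular is easy to picture. A toy sketch of the corrected index selection, where 1 marks a real token and 0 marks padding (not the library code):

```python
def cls_token_index(attention_mask):
    """Index of the first real (non-padding) token in one sequence.

    With right padding this is always 0, so naive CLS pooling works; with
    left padding the sequence starts later, and taking index 0 silently
    pools a PAD token instead. (Sketch of the fixed behavior only.)
    """
    return attention_mask.index(1)

right_padded = [1, 1, 1, 0, 0]
left_padded = [0, 0, 1, 1, 1]
print(cls_token_index(right_padded), cls_token_index(left_padded))  # 0 2
```

Bugs of this shape are nasty precisely because nothing crashes; the model just pools a meaningless padding vector and scores quietly degrade.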

Those are the kinds of changes that disappear in launch summaries and show up later as fewer weird bug reports.
