issue-3490 Sentence transformer trainer support added for training MNRL… #3494
This PR adds a script in the examples folder that adopts the SentenceTransformerTrainer for training with the MultipleNegativesRankingLoss (MNRL); this was formerly done with model.fit. The new pipeline still accepts curated (hard) negatives, which improve training.
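For orientation, a minimal sketch of the Trainer-based setup (not the PR's actual script; the model name, dataset columns, and output path are illustrative):

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Each row: a query, a relevant passage, and one curated hard negative.
train_dataset = Dataset.from_dict({
    "anchor": ["what is python?"],
    "positive": ["Python is a programming language."],
    "negative": ["A python is a large snake."],
})

loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="models/mnrl-example",  # illustrative path
    num_train_epochs=1,
    per_device_train_batch_size=32,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```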
Key changes:
Custom collator with curated negatives – Implemented a collator that distributes pre-curated negatives across batches, enabling in-batch negative sampling compatible with MS MARCO-style training (see the collator sketch after this list).
Dynamic padding – Added dynamic padding in the collator, so each batch is padded only to its longest sequence, improving GPU utilization and avoiding unnecessary padding overhead (also shown in the collator sketch).
Sampler for multiple negatives per query – Integrated a no-duplicates batch sampler to handle cases where a single query has multiple negatives, ensuring proper in-batch negative handling without leaking duplicates into a batch (see the sampler sketch below).
Gradient accumulation – Used gradient accumulation to simulate a larger effective optimizer batch without exceeding GPU memory, which improves training stability; note that plain MNRL still draws its in-batch negatives from each micro-batch (see the training-arguments sketch below).
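A hedged sketch of the collator idea, not the PR's code: `CuratedNegativesCollator` and the `query`/`positive`/`negatives` column names are hypothetical, and the exact batch key layout SentenceTransformerTrainer expects is omitted. The sketch only shows the two mechanisms named above: spreading curated negatives batch-wide and padding dynamically.

```python
from dataclasses import dataclass
from typing import Any

from transformers import PreTrainedTokenizerBase


@dataclass
class CuratedNegativesCollator:
    """Hypothetical collator: spreads curated hard negatives across the
    batch and tokenizes with dynamic padding."""

    tokenizer: PreTrainedTokenizerBase

    def __call__(self, features: list[dict[str, Any]]) -> dict[str, Any]:
        queries = [f["query"] for f in features]
        positives = [f["positive"] for f in features]
        # Flatten each example's curated negatives into one pool: under MNRL,
        # every text below also acts as a negative for every other query in
        # the batch, so curated negatives are effectively shared batch-wide.
        negatives = [neg for f in features for neg in f["negatives"]]

        def tokenize(texts: list[str]) -> dict[str, Any]:
            # padding=True pads only to the longest sequence in this batch
            # (dynamic padding), not to a fixed max length.
            return self.tokenizer(
                texts, padding=True, truncation=True, return_tensors="pt"
            )

        return {
            "query": tokenize(queries),
            "positive": tokenize(positives),
            "negative": tokenize(negatives),
        }
```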
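The no-duplicates sampler is selectable through the training arguments; a sketch with illustrative values (the output path is hypothetical):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="models/mnrl-example",  # illustrative path
    per_device_train_batch_size=32,
    # When one query appears in several rows (one per curated negative),
    # NO_DUPLICATES keeps repeated texts out of the same batch, so MNRL never
    # treats a duplicate of a query's own positive as an in-batch negative.
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```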
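Gradient accumulation likewise goes through the standard arguments; a sketch with illustrative batch sizes:

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="models/mnrl-example",  # illustrative path
    # 8 micro-batches of 16 give an effective optimizer batch of 128 without
    # holding 128 examples in GPU memory at once. Note: with plain MNRL the
    # in-batch negative pool is still the 16-example micro-batch.
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,
)
```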
Why this matters:
Brings the training setup closer to standard IR evaluation protocols.
Improves efficiency and control by combining curated negatives with in-batch negatives.
Reduces memory pressure while preserving the benefits of larger batch sizes.