issue-3490 Sentence transformer trainer support added for training MNRL… #3494
This PR adds a script in the examples folder that adopts the SentenceTransformerTrainer for training with the MultipleNegativesRankingLoss (MNRL); this was formerly done with model.fit. The new pipeline still accepts curated (hard) negatives, which improve training.
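For orientation, a minimal sketch of the Trainer-based setup (not the PR's actual script; the model name, dataset columns, and output path are illustrative):

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Each row: a query, a relevant passage, and one curated hard negative.
train_dataset = Dataset.from_dict({
    "anchor": ["what is python?"],
    "positive": ["Python is a programming language."],
    "negative": ["A python is a large snake."],
})

loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="models/mnrl-example",  # illustrative path
    num_train_epochs=1,
    per_device_train_batch_size=32,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```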
Key changes:
Custom collator with curated negatives – Implemented a collator that distributes pre-curated negatives across batches, enabling in-batch negative sampling compatible with MS MARCO-style training (see the collator sketch after this list).
Dynamic padding – Added dynamic padding in the collator, so each batch is padded only to its longest sequence, improving GPU utilization and avoiding unnecessary padding overhead (also shown in the collator sketch).
Sampler for multiple negatives per query – Integrated a no-duplicates batch sampler to handle cases where a single query has multiple negatives, ensuring proper in-batch negative handling without leaking duplicates into a batch (see the sampler sketch below).
Gradient accumulation – Used gradient accumulation to simulate a larger effective optimizer batch without exceeding GPU memory, which improves training stability; note that plain MNRL still draws its in-batch negatives from each micro-batch (see the training-arguments sketch below).
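A hedged sketch of the collator idea, not the PR's code: `CuratedNegativesCollator` and the `query`/`positive`/`negatives` column names are hypothetical, and the exact batch key layout SentenceTransformerTrainer expects is omitted. The sketch only shows the two mechanisms named above: spreading curated negatives batch-wide and padding dynamically.

```python
from dataclasses import dataclass
from typing import Any

from transformers import PreTrainedTokenizerBase


@dataclass
class CuratedNegativesCollator:
    """Hypothetical collator: spreads curated hard negatives across the
    batch and tokenizes with dynamic padding."""

    tokenizer: PreTrainedTokenizerBase

    def __call__(self, features: list[dict[str, Any]]) -> dict[str, Any]:
        queries = [f["query"] for f in features]
        positives = [f["positive"] for f in features]
        # Flatten each example's curated negatives into one pool: under MNRL,
        # every text below also acts as a negative for every other query in
        # the batch, so curated negatives are effectively shared batch-wide.
        negatives = [neg for f in features for neg in f["negatives"]]

        def tokenize(texts: list[str]) -> dict[str, Any]:
            # padding=True pads only to the longest sequence in this batch
            # (dynamic padding), not to a fixed max length.
            return self.tokenizer(
                texts, padding=True, truncation=True, return_tensors="pt"
            )

        return {
            "query": tokenize(queries),
            "positive": tokenize(positives),
            "negative": tokenize(negatives),
        }
```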
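The no-duplicates sampler is selectable through the training arguments; a sketch with illustrative values (the output path is hypothetical):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="models/mnrl-example",  # illustrative path
    per_device_train_batch_size=32,
    # When one query appears in several rows (one per curated negative),
    # NO_DUPLICATES keeps repeated texts out of the same batch, so MNRL never
    # treats a duplicate of a query's own positive as an in-batch negative.
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```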
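Gradient accumulation likewise goes through the standard arguments; a sketch with illustrative batch sizes:

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="models/mnrl-example",  # illustrative path
    # 8 micro-batches of 16 give an effective optimizer batch of 128 without
    # holding 128 examples in GPU memory at once. Note: with plain MNRL the
    # in-batch negative pool is still the 16-example micro-batch.
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,
)
```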
Why this matters:
Brings the training setup closer to standard IR evaluation protocols.
Improves efficiency and control by combining curated negatives with in-batch negatives.
Reduces memory pressure while preserving the benefits of larger batch sizes.