Conversation

@pinareceaktan

This PR adds a script to the examples folder that adopts the Sentence Transformer Trainer for training with the MultipleNegativesRankingLoss (MNRL); this was formerly done with model.fit. The new pipeline still accepts curated negatives (hard negatives), which improves training.
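
For orientation, the Trainer-based flow looks roughly like the sketch below. This is a minimal illustration, not the exact script in this PR; the checkpoint name and the toy dataset are placeholders.

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("microsoft/mpnet-base")  # placeholder checkpoint

# MS MARCO-style triples: each row holds a query, a positive passage,
# and a curated hard negative.
train_dataset = Dataset.from_dict({
    "query": ["what is a cactus"],
    "positive": ["A cactus is a member of the plant family Cactaceae..."],
    "negative": ["The Sahara is the largest hot desert in the world..."],
})

loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```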

Key changes:

Custom collator with curated negatives – Implemented a collator that distributes pre-curated negatives across batches, enabling in-batch negative sampling compatible with MS MARCO-style training.
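
A minimal sketch of that idea follows, assuming rows that carry a query, a positive, and a list of pre-curated negatives; the class name and column names are illustrative, not the PR's exact code.

```python
from dataclasses import dataclass

from transformers import PreTrainedTokenizerBase


@dataclass
class CuratedNegativesCollator:
    """Flattens each row's pre-curated hard negatives into the batch,
    so they join the in-batch negative pool used by MNRL."""

    tokenizer: PreTrainedTokenizerBase
    max_length: int = 256

    def _encode(self, texts):
        # padding="longest" pads only to the longest text in *this* batch.
        return self.tokenizer(
            texts,
            padding="longest",
            truncation=True,
            max_length=self.max_length,
            return_tensors="pt",
        )

    def __call__(self, features):
        queries = [f["query"] for f in features]
        positives = [f["positive"] for f in features]
        # Distribute every row's curated negatives into one shared pool.
        negatives = [neg for f in features for neg in f["negatives"]]
        return {
            "query": self._encode(queries),
            "positive": self._encode(positives),
            "negative": self._encode(negatives),
        }
```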

Dynamic padding – Added dynamic padding in the collator to ensure efficient GPU utilization and avoid unnecessary padding overhead.
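
The padding behavior itself comes from the tokenizer; a quick illustration of the difference (the checkpoint name is just an example):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

texts = ["short query", "a considerably longer query about ms marco passages"]

dynamic = tok(texts, padding="longest", return_tensors="pt")
fixed = tok(texts, padding="max_length", max_length=256, return_tensors="pt")

# Dynamic padding tracks the longest item in each batch; fixed padding
# always pays for the full max_length, wasting compute on pad tokens.
print(dynamic["input_ids"].shape)  # e.g. torch.Size([2, 12])
print(fixed["input_ids"].shape)    # torch.Size([2, 256])
```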

Sampler for multiple negatives per query – Integrated a no-duplicates batch sampler to handle cases where a single query has multiple negatives. This ensures proper in-batch negative handling without leaking duplicates.
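
sentence-transformers v3 ships such a sampler; selecting it through the training arguments looks like this (output path and batch size are placeholders):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output/mnrl-msmarco",
    per_device_train_batch_size=32,
    # No text appears twice within a batch, so a repeated query or passage
    # can never be mis-scored as an in-batch negative of itself.
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```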

Gradient accumulation – Used gradient accumulation to simulate larger effective batch sizes without exceeding GPU memory, which improves training stability. Note that plain MNRL still computes its in-batch negatives per micro-batch, so accumulation smooths the gradient estimate rather than enlarging the negative pool each query is compared against.
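
In the training arguments this is a single knob; the values below are illustrative.

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/mnrl-msmarco",
    per_device_train_batch_size=16,  # what fits in GPU memory per step
    gradient_accumulation_steps=8,   # effective batch size = 16 * 8 = 128
)
```

If a genuinely larger in-batch negative pool is the goal, rather than just smoother gradients, CachedMultipleNegativesRankingLoss is the usual alternative.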

Why this matters:

Brings the training setup closer to standard IR evaluation protocols.
Improves efficiency and control by combining curated negatives with in-batch negatives.
Reduces memory pressure while preserving the benefits of larger batch sizes.

@pinareceaktan changed the title from "issue-3490 Sentence transformer trainer upport added for training MNR…" to "issue-3490 Sentence transformer trainer support added for training MNRL…" on Aug 21, 2025