
Regarding the behavior of max_seq_length in SFTTrainer #2400

Open
Taiki-azrs opened this issue Nov 27, 2024 · 4 comments
Labels
📚 documentation · 👶 good first issue · 🏋 SFT

Comments

@Taiki-azrs

The SFTTrainer documentation states:

SFTTrainer always pads by default the sequences to the max_seq_length argument of the SFTTrainer.
https://huggingface.co/docs/trl/main/en/sft_trainer#best-practices

However, looking at the actual code, the tokenizer appears to be called with padding=False, so it does not seem to pad sequences to the max_seq_length value.
https://github.com/huggingface/trl/blob/main/trl/trainer/sft_trainer.py#L420

How does SFTTrainer ensure that sequences are padded to the max_seq_length value?
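
For reference, here is a minimal sketch (outside of TRL, with an arbitrary tokenizer and max length) of the call pattern in question: with padding=False, short sequences keep their own length and only sequences longer than max_length are truncated.

```python
# Minimal illustration, not the TRL source: tokenizing with padding=False and
# truncation up to a max length cuts long inputs but never pads short ones.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # arbitrary example tokenizer
max_seq_length = 8  # arbitrary example value

short = tokenizer("Hello world", truncation=True, padding=False, max_length=max_seq_length)
long_ = tokenizer("a very long sentence " * 20, truncation=True, padding=False, max_length=max_seq_length)

print(len(short["input_ids"]))  # 2 -> shorter than max_seq_length, no padding added
print(len(long_["input_ids"]))  # 8 -> truncated to max_seq_length
```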

@qgallouedec
Member

You're right, the documentation is wrong. Would you like to contribute by correcting it?

qgallouedec added the 📚 documentation, 👶 good first issue, and 🏋 SFT labels on Dec 13, 2024
@umbilnm
Contributor

umbilnm commented Dec 25, 2024

Hi! I can correct it. Based on the discussion, it seems we could take one of two approaches:

  1. Completely remove this mention from the “Best Practices” section.
  2. Update the text to clarify that truncation (rather than padding) happens by default.

Could you let me know which approach is better?

@qgallouedec
Member

Probably option 2. What do you think?

@umbilnm
Contributor

umbilnm commented Dec 25, 2024

OK. Also, if max_seq_length isn't specified, the trainer sets it to min(1024, tokenizer.model_max_length) (not 2048), so the revised text may look like this:

SFTTrainer truncates sequences by default to the max_seq_length specified. If max_seq_length is not provided, the trainer sets it to the minimum of tokenizer.model_max_length and 1024. Ensure you verify this setting before training to avoid unintended behavior.
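
For clarity, a small sketch of that fallback (not the exact TRL code; the tokenizer name is just an example):

```python
# Sketch of the default described above: when max_seq_length is not given,
# fall back to the smaller of 1024 and the tokenizer's model_max_length.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # arbitrary example tokenizer

max_seq_length = None  # i.e. the user did not specify it
if max_seq_length is None:
    max_seq_length = min(1024, tokenizer.model_max_length)

print(max_seq_length)  # 1024 for gpt2, since its model_max_length is 1024
```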
