
Commit a77fb79

Кирилл Устинов authored and committed
fixed sft trainer docs
1 parent aed5da5 commit a77fb79


1 file changed (+1, -1 lines)


Diff for: docs/source/sft_trainer.mdx

@@ -624,7 +624,7 @@ To learn more about Liger-Kernel, visit their [official repository](https://gith
 
 Pay attention to the following best practices when training a model with that trainer:
 
-- [`SFTTrainer`] always pads by default the sequences to the `max_seq_length` argument of the [`SFTTrainer`]. If none is passed, the trainer will retrieve that value from the tokenizer. Some tokenizers do not provide a default value, so there is a check to retrieve the minimum between 2048 and that value. Make sure to check it before training.
+- [`SFTTrainer`] always truncates by default the sequences to the `max_seq_length` argument of the [`SFTTrainer`]. If none is passed, the trainer will retrieve that value from the tokenizer. Some tokenizers do not provide a default value, so there is a check to retrieve the minimum between 1024 and that value. Make sure to check it before training.
 - For training adapters in 8bit, you might need to tweak the arguments of the `prepare_model_for_kbit_training` method from PEFT, hence we advise users to use `prepare_in_int8_kwargs` field, or create the `PeftModel` outside the [`SFTTrainer`] and pass it.
 - For a more memory-efficient training using adapters, you can load the base model in 8bit, for that simply add `load_in_8bit` argument when creating the [`SFTTrainer`], or create a base model in 8bit outside the trainer and pass it.
 - If you create a model outside the trainer, make sure to not pass to the trainer any additional keyword arguments that are relative to `from_pretrained()` method.
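The changed bullet makes two claims: over-long sequences are truncated (not padded) to `max_seq_length`, and when no `max_seq_length` is passed the effective limit is the smaller of 1024 and the tokenizer's value. Below is a minimal plain-Python sketch of that rule as the updated text describes it. `resolve_max_seq_length`, `truncate`, and `DEFAULT_MAX_SEQ_LENGTH` are hypothetical names for illustration, not TRL's API. The `int(1e30)` input assumes the very large placeholder that Hugging Face tokenizers typically report as `model_max_length` when no limit is configured.

```python
from typing import List, Optional

# Hypothetical constant: the fallback cap named in the updated docs bullet.
DEFAULT_MAX_SEQ_LENGTH = 1024


def resolve_max_seq_length(
    max_seq_length: Optional[int], tokenizer_model_max_length: Optional[int]
) -> int:
    """Hypothetical helper: pick the effective sequence length limit."""
    if max_seq_length is not None:
        # An explicitly passed value is used as-is.
        return max_seq_length
    if tokenizer_model_max_length is None:
        # No tokenizer limit available: fall back to the cap.
        return DEFAULT_MAX_SEQ_LENGTH
    # Otherwise take the smaller of the tokenizer limit and the cap.
    return min(DEFAULT_MAX_SEQ_LENGTH, tokenizer_model_max_length)


def truncate(token_ids: List[int], max_seq_length: int) -> List[int]:
    """Cut sequences longer than max_seq_length; shorter ones are not padded."""
    return token_ids[:max_seq_length]


if __name__ == "__main__":
    print(resolve_max_seq_length(None, 512))        # 512
    print(resolve_max_seq_length(None, int(1e30)))  # 1024
    print(resolve_max_seq_length(2048, 512))        # 2048
    print(len(truncate(list(range(3000)), 1024)))   # 1024
    print(truncate([1, 2, 3], 1024))                # [1, 2, 3]
```

The last line illustrates the difference the commit fixes in the docs: a short sequence comes back unchanged rather than padded up to the limit.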
