🐛 Describe the bug
I found a bug in the prepare_learning() function of accelerate_sft_trainer.py. In the current implementation, self.total_steps is computed as the number of epochs (self.config.train.epochs) multiplied by the length of train_dataloader. However, that local train_dataloader is assigned before accelerator preparation, so its length does not reflect the actual number of training steps taken when multiple GPUs are used.
As a result, total_steps ends up larger than the true total number of training steps, so the run finishes its epochs before reaching total_steps and training appears to end prematurely.
To fix this, I recommend modifying prepare_learning() to calculate self.total_steps from self.train_dataloader instead, which correctly reflects the number of training steps after accelerator preparation. The suggested modification is to use len(self.train_dataloader) in place of len(train_dataloader) when computing self.total_steps.
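A minimal sketch of the difference, simulated in plain Python so it runs without accelerate or trlX installed. The helper `sharded_len` is hypothetical: it assumes the prepared dataloader splits batches evenly across processes and pads the last shard, which may not match every accelerate configuration. The numbers are illustrative only.

```python
import math


def sharded_len(num_batches: int, num_processes: int) -> int:
    """Hypothetical stand-in for len() of a dataloader after accelerator preparation.

    Assumption: batches are divided evenly across processes and the last
    shard is padded, so each process sees ceil(num_batches / num_processes).
    """
    return math.ceil(num_batches / num_processes)


epochs = 3                  # plays the role of self.config.train.epochs
unprepared_batches = 1000   # plays the role of len(train_dataloader)
num_gpus = 4

# Plays the role of len(self.train_dataloader) after preparation.
prepared_batches = sharded_len(unprepared_batches, num_gpus)

# Current behaviour: uses the length of the unprepared dataloader.
buggy_total_steps = epochs * unprepared_batches

# Proposed behaviour: uses the length of the prepared dataloader.
fixed_total_steps = epochs * prepared_batches

print(buggy_total_steps, fixed_total_steps)
```

Under these assumptions the unprepared length overestimates the per-process step count by roughly a factor of the number of GPUs, which is the mismatch described above.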
I hope this helps! Let me know if you have any questions or if there's anything else I can assist you with.
Which trlX version are you using?
newest
Additional system and package information
No response