[3/9] final_lr_ratio #17
Conversation
seems like there is some rebasing trouble?
fabianlim left a comment
final_lr_ratio should not be a required setting; if the scheduler is not linear it should not have any effect
@fabianlim agree with everything, updated
```python
num_warmup_steps = int(num_training_steps_for_scheduler * args.warmup_ratio)
if args.final_lr_ratio is not None and args.lr_scheduler_type == "linear":
    # Correct num_training_steps_for_scheduler to respect final_lr_ratio for a linear scheduler
    num_training_steps_for_scheduler = (
```
num_training_steps_for_scheduler is not used anywhere else except in get_scheduler, right?
Correct, it's just defined here, and maybe updated if the user specifies final_lr_ratio and is using a linear scheduler.
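For reference, a minimal sketch of how `num_training_steps_for_scheduler` might be stretched so that a transformers-style linear schedule with warmup ends at `learning_rate * final_lr_ratio` instead of 0. Names and values here are illustrative, not the PR's exact code:

```python
def stretch_training_steps(num_training_steps: int, num_warmup_steps: int,
                           final_lr_ratio: float) -> int:
    """Step count to hand to the scheduler so that, after `num_training_steps`
    real optimizer steps, the linear decay factor equals `final_lr_ratio`."""
    if not 0.0 <= final_lr_ratio < 1.0:
        raise ValueError("final_lr_ratio must be in [0, 1)")
    # Linear decay factor at step t (t >= warmup):
    #   f(t) = (T_sched - t) / (T_sched - num_warmup_steps)
    # Solve f(num_training_steps) = final_lr_ratio for T_sched:
    stretched = (num_training_steps - final_lr_ratio * num_warmup_steps) / (1.0 - final_lr_ratio)
    return int(round(stretched))


def linear_decay_factor(step: int, num_warmup_steps: int, scheduler_steps: int) -> float:
    """Same shape as the transformers linear schedule with warmup."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (scheduler_steps - step) / max(1, scheduler_steps - num_warmup_steps))


if __name__ == "__main__":
    T, W, r, lr = 1000, 100, 0.1, 2e-5
    T_sched = stretch_training_steps(T, W, r)          # 1100 for these values
    final_factor = linear_decay_factor(T, W, T_sched)  # 0.1
    print(f"scheduler steps: {T_sched}, final lr: {lr * final_factor:.2e}")
```

The stretched count would then be passed as `num_training_steps` to the scheduler, while training still stops after the real number of steps.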
Alters the SFT linear learning rate scheduler such that the final lr is configurable, rather than having it run down to zero. The final learning rate is `learning_rate * final_lr_ratio`.

NOTE: this changes the default behavior. The default is `--final_lr_ratio 0.1`, whereas previously `finetune.py` effectively had `--final_lr_ratio 0.0`. The 0.1 default is inspired by Qwen.

NOTE: only implemented for the `linear` scheduler. This PR disables other scheduler options.
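As a quick worked example of the formula above (values are illustrative, not from the PR):

```python
learning_rate = 2e-5
final_lr_ratio = 0.1                    # new default
print(learning_rate * final_lr_ratio)   # 2e-06; previously the linear schedule decayed to 0.0
```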