
Conversation

@garrett361 garrett361 (Owner) commented Jun 27, 2025

Alters the SFT linear learning rate scheduler so that the final lr is configurable, rather than always decaying to zero. The final learning rate is learning_rate * final_lr_ratio.

NOTE: this changes the default behavior: the default is now --final_lr_ratio 0.1, whereas previously finetune.py effectively ran with --final_lr_ratio 0.0. The 0.1 default is inspired by Qwen.

NOTE: only implemented for the linear scheduler. This PR disables other scheduler options.
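
For intuition, here is a minimal sketch of the horizon-stretching arithmetic a linear scheduler needs in order to bottom out at a nonzero lr (names here are illustrative, not necessarily those used in finetune.py):

```python
# Hugging Face's linear schedule decays the lr multiplier from 1 at the end
# of warmup (step W) to 0 at the scheduler horizon (step N):
#     multiplier(t) = (N - t) / (N - W)    for W <= t <= N
# If training actually stops at step T, requiring multiplier(T) == r
# (r = final_lr_ratio) and solving for N gives N = (T - r * W) / (1 - r).

def stretched_horizon(num_training_steps: int, num_warmup_steps: int, final_lr_ratio: float) -> int:
    """Scheduler horizon that leaves the lr at lr * final_lr_ratio at the last real step."""
    return int(
        (num_training_steps - final_lr_ratio * num_warmup_steps)
        / (1 - final_lr_ratio)
    )

# Example: 1000 training steps, 30 warmup steps, final_lr_ratio = 0.1
# -> horizon of int((1000 - 3) / 0.9) = 1107, and at step 1000 the
#    multiplier is (1107 - 1000) / (1107 - 30) ≈ 0.099, i.e. ~0.1.
```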

| # | PR | Title |
| --- | --- | --- |
| 1 | #15 | padding-free |
| 2 | #16 | clean_checkpoints_at_end |
| 3 | #17 | final_lr_ratio (this PR) |
| 4 | #18 | add_seed_and_date_to_run_name |
| 5 | #19 | additional_model_arguments |
| 6 | #20 | sync_each_batch=True grad acc |
| 7 | #21 | no grad acc averaging for sum losses |
| 8 | #22 | extra reporting |
| 9 | #23 | local_main_process_first when building dataset |

@garrett361 garrett361 changed the title from "[3/9] WIP: final_lr_ratio" to "[3/9] final_lr_ratio" Jun 27, 2025
@garrett361 garrett361 force-pushed the padding-free-squashing-2 branch from eb9e294 to 8a3148c on June 27, 2025 19:23
@garrett361 garrett361 force-pushed the padding-free-squashing-3 branch from d1a5006 to e231062 on June 27, 2025 19:23
@garrett361 garrett361 force-pushed the padding-free-squashing-2 branch from 8a3148c to 3b77ec7 on June 27, 2025 20:21
@garrett361 garrett361 force-pushed the padding-free-squashing-3 branch from e231062 to a2546c2 on June 27, 2025 20:21
@garrett361 garrett361 force-pushed the padding-free-squashing-2 branch from 3b77ec7 to 01e3cfd on June 27, 2025 20:48
@garrett361 garrett361 force-pushed the padding-free-squashing-3 branch from a2546c2 to bdc2c43 on June 27, 2025 20:48
@garrett361 garrett361 force-pushed the padding-free-squashing-2 branch from 01e3cfd to da5dbb1 on June 27, 2025 20:54
@garrett361 garrett361 force-pushed the padding-free-squashing-3 branch 3 times, most recently from 0548cfd to b6d7b83 on June 27, 2025 21:19
@fabianlim fabianlim (Collaborator) commented

Seems like there is some rebasing trouble?

@fabianlim fabianlim changed the base branch from padding-free-squashing-2 to padding-free-squashing June 28, 2025 00:26
@fabianlim fabianlim changed the base branch from padding-free-squashing to main June 28, 2025 00:28
@fabianlim fabianlim (Collaborator) left a comment

final_lr_ratio should not be a required setting; if the scheduler is not linear, it should not have any effect.

@garrett361 garrett361 force-pushed the padding-free-squashing-3 branch from b6d7b83 to 7a0671a Compare June 28, 2025 01:39
@garrett361 garrett361 (Owner Author) commented

@fabianlim agree with everything, updated

```python
num_warmup_steps = int(num_training_steps_for_scheduler * args.warmup_ratio)
if args.final_lr_ratio is not None and args.lr_scheduler_type == "linear":
    # Correct num_training_steps_for_scheduler to respect final_lr_ratio for a linear scheduler
    num_training_steps_for_scheduler = (
```
@fabianlim fabianlim (Collaborator) commented on the diff above

num_training_steps_for_scheduler is not used anywhere else except in get_scheduler, right?

@garrett361 garrett361 (Owner Author) replied

Correct, it's just defined here, and maybe updated if the user specifies final_lr_ratio and is using a linear scheduler.
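
For completeness, a hedged sketch of how the corrected value plausibly flows into get_scheduler. The completion of the truncated assignment above is inferred from the discussion, not copied from the PR, and optimizer and args are assumed to be in scope:

```python
from transformers import get_scheduler

num_warmup_steps = int(num_training_steps_for_scheduler * args.warmup_ratio)
if args.final_lr_ratio is not None and args.lr_scheduler_type == "linear":
    # Stretch the horizon so the linear schedule bottoms out at
    # learning_rate * final_lr_ratio instead of zero (see the derivation
    # after the PR description above).
    num_training_steps_for_scheduler = int(
        (num_training_steps_for_scheduler - args.final_lr_ratio * num_warmup_steps)
        / (1 - args.final_lr_ratio)
    )

# num_training_steps_for_scheduler is consumed only here, as noted above.
lr_scheduler = get_scheduler(
    name=args.lr_scheduler_type,
    optimizer=optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps_for_scheduler,
)
```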

@fabianlim fabianlim mentioned this pull request Jun 28, 2025
@fabianlim fabianlim merged commit 8f21b76 into main Jun 30, 2025
2 checks passed