[constant scheduler] fix: model won't be updated on first training step #1463
Merged
eric-haibin-lin merged 3 commits into verl-project:main on May 21, 2025
Conversation
- Fix LR scheduler step timing to properly record and apply the learning rate
- Correct the warmup scheduler implementation to maintain a constant rate after warmup
- Increase the learning rate in the test script for better checkpoint validation
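The first bullet is about ordering: if the learning rate is read *after* `lr_scheduler.step()`, the log records the next step's rate, not the one actually applied. A minimal pure-Python sketch of the off-by-one (the scheduler class and variable names here are hypothetical, not verl's actual code):

```python
class ConstantWithWarmup:
    """Toy scheduler: linear warmup for `warmup` steps, then a constant `base_lr`."""

    def __init__(self, base_lr, warmup):
        self.base_lr, self.warmup, self.step_num = base_lr, warmup, 0

    def current_lr(self):
        if self.step_num < self.warmup:
            return self.base_lr * self.step_num / max(1, self.warmup)
        return self.base_lr

    def step(self):
        self.step_num += 1


sched = ConstantWithWarmup(base_lr=1e-6, warmup=2)
lrs_logged_before, lrs_logged_after = [], []
for _ in range(3):
    lrs_logged_before.append(sched.current_lr())  # LR actually applied this step
    sched.step()
    lrs_logged_after.append(sched.current_lr())   # next step's LR: off by one

print(lrs_logged_before)
print(lrs_logged_after)
```

Logging before `step()` yields the warmup ramp as applied (`0.0, 5e-07, 1e-06`), while logging after `step()` shifts every value one step early, masking a zeroed first-step LR exactly as described in this PR.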
eric-haibin-lin previously approved these changes on May 15, 2025
Author (Collaborator):
Hi @eric-haibin-lin, could you re-review this? I resolved the conflicts a few days ago.
eric-haibin-lin approved these changes on May 21, 2025
cedricbeta pushed a commit to cedricbeta/verl that referenced this pull request on May 21, 2025
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request on May 22, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request on Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request on Dec 20, 2025
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request on Jan 22, 2026
What does this PR do?
I found that when using the FSDP checkpoint test introduced by #1288, both comparisons pass after one step of training: the merged FSDP checkpoint matches the verl-saved HF model, and it also matches the original HuggingFace model. The second comparison should fail after training, so the FSDP model is not being updated on the first training step.
However, the training log shows a learning rate of 1e-6 at the first step, which is odd. I found two issues in the existing code:

1. `get_constant_schedule_with_warmup` is wrong: when `num_warmup_steps=0`, it returns `0.0` at the first step (`current_step=0`) instead of `1.0`. This is inconsistent with the existing constant-LR definition: https://github.com/huggingface/transformers/blob/774dc274ac966f4bccbcd90d55bba23f6cca37ae/src/transformers/optimization.py#L72
2. The log saves the learning rate after `actor_lr_scheduler.step()`, which is incorrect: it records the next step's LR, hiding the problem with `get_constant_schedule_with_warmup`.

This PR fixes these issues and decreases the tolerance in the FSDP checkpoint test.
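A minimal sketch of issue 1. The exact verl code may differ, but a warmup check using `<=` (or any branch that is entered when `num_warmup_steps=0`) reproduces the reported symptom, while the strict `<` check used by transformers' `get_constant_schedule_with_warmup` returns the correct constant multiplier:

```python
def buggy_lr_lambda(current_step, num_warmup_steps):
    # One plausible way to hit the reported symptom (hypothetical, for
    # illustration): with num_warmup_steps=0 the warmup branch is entered at
    # step 0 and the multiplier becomes 0 / max(1, 0) == 0.0, so the first
    # optimizer update is scaled to zero and the model never moves.
    if current_step <= num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))
    return 1.0


def fixed_lr_lambda(current_step, num_warmup_steps):
    # Matches the transformers definition: the strict `<` means zero warmup
    # steps yields a constant multiplier of 1.0 from step 0 onward.
    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))
    return 1.0


print(buggy_lr_lambda(0, 0))  # 0.0 -> LR is zeroed on the first step
print(fixed_lr_lambda(0, 0))  # 1.0 -> constant LR, as intended
```

With a nonzero warmup both variants agree during the ramp; the divergence only shows up at `current_step == num_warmup_steps`, which is exactly the `num_warmup_steps=0` first-step case this PR fixes.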
Additional Info.
Checklist Before Submitting
- Add `[BREAKING]` to the PR title if it breaks any API.