[bug][algorithm] remove incorrect torch.no_grad() for kl in loss (use_kl_loss=True) #1353
Merged
erictang000 merged 1 commit into NovaSky-AI:main on Mar 20, 2026
Code Review
This pull request correctly addresses a bug where gradients were not flowing through the KL divergence term in the loss function when use_kl_loss=True. The fix involves removing the @torch.no_grad() decorator from the compute_approx_kl function. To maintain correctness in other parts of the code that rely on a gradient-free KL computation, specifically for KL-based reward penalties, a with torch.no_grad() context is added at the call site in apply_reward_kl_penalty. The changes are accurate and well-contained.
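The pattern described in the review can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function signatures, the simple log-ratio estimator, and the `kl_coef` parameter are assumptions made for the example; only the names `compute_approx_kl` and `apply_reward_kl_penalty` come from the review text.

```python
import torch


def compute_approx_kl(log_probs: torch.Tensor, ref_log_probs: torch.Tensor) -> torch.Tensor:
    # Assumed, simplified log-ratio estimator; the real signature is not shown here.
    # No @torch.no_grad() decorator, so gradients can flow when this feeds the loss.
    return log_probs - ref_log_probs


def apply_reward_kl_penalty(rewards, log_probs, ref_log_probs, kl_coef=0.1):
    # Hypothetical signature. The penalty only shapes rewards, so gradient
    # tracking is disabled at the call site instead of inside compute_approx_kl.
    with torch.no_grad():
        kl = compute_approx_kl(log_probs, ref_log_probs)
    return rewards - kl_coef * kl


log_probs = torch.tensor([-1.0, -2.0, -0.5], requires_grad=True)
ref_log_probs = torch.tensor([-1.2, -1.8, -0.7])
rewards = torch.zeros(3)

# Loss path (use_kl_loss=True): the KL term stays attached to the graph.
kl_loss = compute_approx_kl(log_probs, ref_log_probs).mean()

# Reward-penalty path: the result is detached from the graph.
penalized = apply_reward_kl_penalty(rewards, log_probs, ref_log_probs)
```

Moving `no_grad` to the call site keeps one shared KL function while letting each caller decide whether it needs gradients.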
@erictang000 You should be able to see the difference with a large KL penalty. You could also just print the KL loss tensor in the worker before and after the fix (after the fix it should have requires_grad=True).
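A standalone version of that check might look like the sketch below. It does not run inside the actual worker; `approx_kl_before` and `approx_kl_after` are hypothetical stand-ins for the KL function with and without the decorator, using an assumed log-ratio estimator.

```python
import torch


@torch.no_grad()
def approx_kl_before(log_probs, ref_log_probs):
    # Stand-in for the pre-fix behavior: the decorator detaches the output.
    return log_probs - ref_log_probs


def approx_kl_after(log_probs, ref_log_probs):
    # Stand-in for the post-fix behavior: the output stays in the autograd graph.
    return log_probs - ref_log_probs


log_probs = torch.tensor([-1.0, -2.0], requires_grad=True)
ref_log_probs = torch.tensor([-1.5, -1.5])

before = approx_kl_before(log_probs, ref_log_probs).mean()
after = approx_kl_after(log_probs, ref_log_probs).mean()

# Inspect the two loss tensors as suggested in the comment.
print("before:", before.requires_grad, before.grad_fn)
print("after: ", after.requires_grad, after.grad_fn)

# Only the post-fix tensor can be backpropagated through.
after.backward()
print("grad on log_probs:", log_probs.grad)
```

Calling `before.backward()` would instead raise an error, because the tensor has no `grad_fn`.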
SumanthRH approved these changes on Mar 20, 2026
devpatelio pushed a commit that referenced this pull request on Mar 20, 2026
…_kl_loss=True) (#1353) Fixes #1340
Fixes #1340