
[bug][algorithm] remove incorrect torch.no_grad() for kl in loss (use_kl_loss=True)#1353

Merged
erictang000 merged 1 commit into NovaSky-AI:main from erictang000:fix_kl_in_loss
Mar 20, 2026

Conversation

@erictang000 (Collaborator) commented Mar 19, 2026

Fixes #1340


@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request correctly addresses a bug where gradients were not flowing through the KL divergence term in the loss function when use_kl_loss=True. The fix involves removing the @torch.no_grad() decorator from the compute_approx_kl function. To maintain correctness in other parts of the code that rely on a gradient-free KL computation, specifically for KL-based reward penalties, a with torch.no_grad() context is added at the call site in apply_reward_kl_penalty. The changes are accurate and well-contained.
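The shape of the change can be sketched as follows. This is a minimal illustration, not the actual SkyRL source: the function names `compute_approx_kl` and `apply_reward_kl_penalty` come from the review comment above, but their signatures and the k3-style estimator body are assumptions.

```python
import torch

# Before the fix, compute_approx_kl carried a @torch.no_grad() decorator,
# which silently blocked gradients even when the KL estimate was used as a
# differentiable loss term (use_kl_loss=True).
#
# @torch.no_grad()   # <-- removed by this PR
def compute_approx_kl(log_probs: torch.Tensor,
                      ref_log_probs: torch.Tensor) -> torch.Tensor:
    """Approximate KL(pi || pi_ref); k3-style body shown for illustration."""
    log_ratio = ref_log_probs - log_probs
    return log_ratio.exp() - 1.0 - log_ratio


def apply_reward_kl_penalty(log_probs: torch.Tensor,
                            ref_log_probs: torch.Tensor,
                            rewards: torch.Tensor,
                            kl_coef: float = 0.1) -> torch.Tensor:
    """The reward-side KL penalty must stay gradient-free, so the
    no_grad context moves from the decorator to this call site."""
    with torch.no_grad():
        kl = compute_approx_kl(log_probs, ref_log_probs)
    return rewards - kl_coef * kl
```

With this split, the same KL estimator serves both uses: called directly it participates in autograd (KL loss), while the penalty path explicitly opts out of gradient tracking.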

@devin-ai-integration (Bot, Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

@erictang000 (Collaborator, Author)

GSM8K with and without the fix (the curves look pretty similar, but that is probably just because it's GSM8K):

[image: GSM8K training curves with and without the fix]

@SumanthRH (Member) commented Mar 20, 2026

@erictang000 you should be able to see the difference with a large KL penalty. You could also just print the KL loss tensor in the worker before and after the fix (after the fix it should have requires_grad=True).
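The sanity check suggested here can be done in a few lines. A quick sketch with hypothetical tensor names, standing in for the policy and reference log-probabilities inside the worker:

```python
import torch

# Stand-ins for the policy and reference log-probs in the training worker.
log_probs = torch.randn(4, 8, requires_grad=True)   # policy output
ref_log_probs = torch.randn(4, 8)                   # frozen reference model

# k3-style approximate KL, reduced to a scalar loss.
log_ratio = ref_log_probs - log_probs
kl_loss = (log_ratio.exp() - 1.0 - log_ratio).mean()

# With torch.no_grad() removed, the KL loss participates in autograd:
print(kl_loss.requires_grad)        # True
kl_loss.backward()
print(log_probs.grad is not None)   # True: gradients flow to the policy
```

Before the fix, the same check would print `False` and `backward()` would raise, since the tensor produced under `torch.no_grad()` has no grad history.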

@erictang000 (Collaborator, Author)

Here are the curves after setting kl_loss_coef=1.0:

[image: GSM8K training curves with kl_loss_coef=1.0]

@erictang000 erictang000 merged commit 9d7bca9 into NovaSky-AI:main Mar 20, 2026
5 of 6 checks passed
devpatelio pushed a commit that referenced this pull request Mar 20, 2026
…_kl_loss=True) (#1353)

Fixes #1340


Development

Successfully merging this pull request may close these issues.

Gradient disabled for KL loss computation

2 participants