[bug][algorithm] remove incorrect torch.no_grad() for kl in loss (use_kl_loss=True) by erictang000 · Pull Request #1353 · NovaSky-AI/SkyRL

erictang000 · 2026-03-19T22:53:14Z

gemini-code-assist

Code Review

This pull request correctly addresses a bug where gradients were not flowing through the KL divergence term in the loss function when use_kl_loss=True. The fix involves removing the @torch.no_grad() decorator from the compute_approx_kl function. To maintain correctness in other parts of the code that rely on a gradient-free KL computation, specifically for KL-based reward penalties, a with torch.no_grad() context is added at the call site in apply_reward_kl_penalty. The changes are accurate and well-contained.
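A minimal sketch of the pattern described in this review, not the actual SkyRL code. The function names `compute_approx_kl` and `apply_reward_kl_penalty` come from the review text, but the signatures, the simple `log_probs - ref_log_probs` estimator, and the `kl_coef` parameter are assumptions made for illustration.

```python
import torch


def compute_approx_kl(log_probs: torch.Tensor, ref_log_probs: torch.Tensor) -> torch.Tensor:
    """Per-token KL estimate (assumed form). No @torch.no_grad() decorator,
    so gradients can flow when this term is used inside the loss."""
    return log_probs - ref_log_probs


def apply_reward_kl_penalty(rewards, log_probs, ref_log_probs, kl_coef=0.1):
    """Reward shaping (hypothetical signature): the KL term is treated as a
    constant here, so gradient tracking is disabled at the call site."""
    with torch.no_grad():
        kl = compute_approx_kl(log_probs, ref_log_probs)
    return rewards - kl_coef * kl


# Toy inputs: policy log-probs require grad, reference log-probs do not.
log_probs = torch.tensor([-1.0, -2.0, -0.5], requires_grad=True)
ref_log_probs = torch.tensor([-1.2, -1.5, -0.7])
rewards = torch.tensor([0.0, 0.0, 1.0])

kl_for_loss = compute_approx_kl(log_probs, ref_log_probs)
shaped_rewards = apply_reward_kl_penalty(rewards, log_probs, ref_log_probs)

print(kl_for_loss.requires_grad)     # KL used in the loss is part of the graph
print(shaped_rewards.requires_grad)  # KL used as a reward penalty is not
```

Moving `torch.no_grad()` from the helper to the one call site that needs it keeps the reward path gradient-free while letting the loss path backpropagate through the KL term.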

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

erictang000 · 2026-03-20T00:17:11Z

GSM8K with and without the fix (looks pretty similar, but probably just because it's GSM8K):

SumanthRH · 2026-03-20T00:25:09Z

@erictang000 you should be able to see the diff with a large KL penalty. You could also just print the KL loss tensor in the worker before and after the fix (after the fix it should have requires_grad=True).
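The check suggested above could look roughly like the following. This is a standalone sketch rather than the SkyRL worker code: `kl_estimate` is a hypothetical stand-in for the KL helper, and the "before" behaviour is emulated by wrapping it in `torch.no_grad()`.

```python
import torch


def kl_estimate(log_probs, ref_log_probs):
    # Hypothetical stand-in for the KL helper (assumed simple estimator).
    return log_probs - ref_log_probs


# "Before the fix": the same function with torch.no_grad() applied as a decorator.
kl_estimate_no_grad = torch.no_grad()(kl_estimate)

log_probs = torch.tensor([-1.0, -2.0, -0.5], requires_grad=True)
ref_log_probs = torch.tensor([-1.2, -1.5, -0.7])

kl_before = kl_estimate_no_grad(log_probs, ref_log_probs).mean()
kl_after = kl_estimate(log_probs, ref_log_probs).mean()

# Inspect whether each KL loss tensor is attached to the autograd graph.
print("before:", kl_before.requires_grad, kl_before.grad_fn)
print("after: ", kl_after.requires_grad, kl_after.grad_fn)

# With a large coefficient, only the "after" tensor can contribute gradients.
kl_loss_coef = 1.0
(kl_loss_coef * kl_after).backward()
print(log_probs.grad)
```

Calling `.backward()` on the "before" tensor would raise an error on its own, since it has no `grad_fn`; added to a policy loss, it would silently contribute nothing to the gradient.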

erictang000 · 2026-03-20T06:23:11Z

here after setting kl_loss_coef=1.0

…_kl_loss=True) (#1353) Fixes #1340

cf97979

gemini-code-assist Bot reviewed Mar 19, 2026

devin-ai-integration Bot reviewed Mar 19, 2026

SumanthRH approved these changes Mar 20, 2026

erictang000 merged commit 9d7bca9 into NovaSky-AI:main Mar 20, 2026
5 of 6 checks passed

erictang000 mentioned this pull request Mar 20, 2026

Gradient disabled for KL loss computation #1340

Closed

erictang000 merged 1 commit into NovaSky-AI:main from erictang000:fix_kl_in_loss
