[misc] fix: fix megatron entropy by vermouth1992 · Pull Request #1672 · verl-project/verl

vermouth1992 · 2025-05-24T12:27:32Z

Checklist Before Starting

Search for similar PR(s).

What does this PR do?

In megatron-core, vocab_parallel_log_probs_from_logits is an in-place operator: it modifies the logits in place to save memory. As a result, vocab_parallel_entropy produces incorrect results if it is computed after vocab_parallel_log_probs_from_logits, because it then reads the overwritten buffer rather than the original logits. This PR swaps the order so that the entropy is computed first and the result is correct.
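The ordering problem can be illustrated with a minimal pure-Python sketch. The two functions below are hypothetical toy stand-ins, not the megatron-core or verl implementations: `toy_log_prob_inplace` is assumed to reuse its input list as scratch space (the exact values the real operator leaves in the buffer are not stated in this PR), and `toy_entropy` only reads its input.

```python
import math


def toy_log_prob_inplace(logits, label):
    """Hypothetical stand-in for an in-place log-prob operator.

    Returns log softmax(logits)[label], but overwrites `logits`
    with intermediate values instead of allocating a new buffer.
    """
    m = max(logits)
    for i in range(len(logits)):
        logits[i] = math.exp(logits[i] - m)  # buffer no longer holds logits
    return math.log(logits[label]) - math.log(sum(logits))


def toy_entropy(logits):
    """Hypothetical stand-in for an entropy operator; does not modify its input."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps)


original = [2.0, 1.0, 0.1]
reference = toy_entropy(list(original))  # entropy of the untouched logits

# Order before the fix: log-probs first, then entropy on the clobbered buffer.
buf = list(original)
log_prob_wrong = toy_log_prob_inplace(buf, 0)
entropy_wrong = toy_entropy(buf)

# Order after the fix: entropy first, then the in-place log-prob operator.
buf = list(original)
entropy_right = toy_entropy(buf)
log_prob_right = toy_log_prob_inplace(buf, 0)

print(f"reference={reference:.4f} wrong={entropy_wrong:.4f} right={entropy_right:.4f}")
```

In this sketch the log-prob value is the same in both orders; only the entropy changes, which matches the symptom described above. Cloning the logits before the in-place call would also avoid the problem, at the cost of the memory the in-place operator is meant to save.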

High-Level Design

Demonstrate the high-level design if this PR is complex.

Specific Changes

List the specific changes.

API

Demonstrate how the API changes if any.

Usage Example

Provide usage example(s) for easier usage.

# Add code snippet or script demonstrating how to use this

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

Additional Info.

Issue Number: Fixes issue # or discussion # if any.
Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks.
Add [BREAKING] to the PR title if it breaks any API.
Update the documentation about your changes in the docs.
Add CI test(s) if necessary.

fix megatron logits

461bfb5

vermouth1992 requested a review from ETOgaosion May 24, 2025 12:27

eric-haibin-lin approved these changes May 24, 2025

vermouth1992 merged commit 4532308 into main May 24, 2025
23 checks passed

vermouth1992 deleted the chi/fix/megatron_logits branch May 24, 2025 16:04

linxxx3 mentioned this pull request Jun 11, 2025

pr #1629 breaks autograd when entropy_coeff != 0 #1970

Open

vermouth1992 merged 1 commit into main from chi/fix/megatron_logits


2 participants
