
[algo] fix: correctly aggregate kl metrics in PPO actor#2259

Merged
vermouth1992 merged 3 commits intoverl-project:mainfrom
0x404:kl_fix
Jun 30, 2025

Conversation

@0x404 0x404 commented Jun 29, 2025

What does this PR do?

This PR fixes an issue in dp_actor where actor/kl_loss and actor/kl_coef were being continuously overwritten during the micro-batch processing loop.

Previously, the long-lived metrics dictionary was updated directly, causing the value for these metrics to reflect only the final micro-batch of any given step, rather than an aggregation of all micro-batches within that step.

This change refactors the logic so that all metrics are collected uniformly: kl_loss is now collected for each micro-batch, just like other metrics such as pg_loss.
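The fix amounts to appending each micro-batch's value and reducing once per step, rather than overwriting a single key in a long-lived dict. A minimal sketch (the metric names and values are illustrative, not verl's actual code):

```python
from collections import defaultdict
import statistics

# Collect every micro-batch's value instead of overwriting a long-lived
# metrics dict, which would keep only the last micro-batch's number.
micro_batch_metrics = defaultdict(list)

# Illustrative per-micro-batch losses for one optimizer step.
for kl_loss, pg_loss in [(0.2, 1.0), (0.4, 1.2), (0.6, 0.8)]:
    micro_batch_metrics["actor/kl_loss"].append(kl_loss)
    micro_batch_metrics["actor/pg_loss"].append(pg_loss)

# Reduce once per step: each logged metric now reflects all
# micro-batches in the step, not just the final one.
step_metrics = {k: statistics.fmean(v) for k, v in micro_batch_metrics.items()}
```

With the old overwrite behavior, `actor/kl_loss` would have logged 0.6 (the last micro-batch) instead of the step-level mean.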

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Comment on lines -1126 to +1128

-        self.critic_lr_scheduler.step()
         lr = self.critic_lr_scheduler.get_last_lr()[0]
         metrics["critic/lr"] = lr
+        self.critic_lr_scheduler.step()
Collaborator Author
Also fixes an issue where the previously recorded lr was actually the next step's lr instead of the current step's, as in #1463.
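The ordering issue can be illustrated with a tiny stand-in scheduler (a hypothetical class with torch-like methods, not verl or torch code): reading the learning rate before stepping records the current step's value.

```python
class TinyScheduler:
    """Minimal stand-in for an LR scheduler with torch-like methods."""

    def __init__(self, lr: float, gamma: float = 0.5):
        self._lr = lr
        self._gamma = gamma

    def get_last_lr(self):
        return [self._lr]

    def step(self):
        # Decay the lr for the next step.
        self._lr *= self._gamma


scheduler = TinyScheduler(lr=0.1)
metrics = {}
# Record the lr for the *current* step first; stepping before reading
# (the old order) would have logged the next step's decayed lr.
metrics["critic/lr"] = scheduler.get_last_lr()[0]
scheduler.step()
```

After the step, `get_last_lr()` already returns the decayed value, which is why the old read-after-step order logged the wrong lr.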

@vermouth1992
Collaborator

One more thing, could you also fix the metrics aggregation in this PR, like #2144? In #2144, DataProto was modified to support this. For a general data structure, that is inappropriate. The reasonable approach is to perform an allgather inside the worker and aggregate on rank zero, just like typical torchrun programs do. Do you mind fixing this as well? Thanks!

@0x404

0x404 commented Jun 29, 2025

> One more thing, could you also fix the metrics aggregation in this PR, like #2144? In #2144, DataProto was modified to support this. For a general data structure, that is inappropriate. The reasonable approach is to perform an allgather inside the worker and aggregate on rank zero, just like typical torchrun programs do. Do you mind fixing this as well? Thanks!

Sure, I would like to fix this. Should I open a separate PR, or push the fix into this one?

@vermouth1992
Collaborator

Just pushing the fix into this PR is fine, since it only fixes the aggregation problem.

# Note: we should convert worker's metrics to DataProto's non_tensor_batch
# so that metrics from different workers can be correctly collected by worker group
worker_metrics = DataProto(non_tensor_batch=to_numpy_metrics(metrics))

@0x404 0x404 Jun 29, 2025

Here we can return the worker's metrics as DataProto's non_tensor_batch and use the worker group's collect function to allgather the metrics from the different workers, thereby avoiding explicit allgather communication inside the workers.
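As a sketch of the idea (the to_numpy_metrics helper and reduce_gathered function here are illustrative stand-ins, not verl's actual implementation): each scalar metric is wrapped in a length-1 numpy array so the dict has the str-to-ndarray shape of a non_tensor_batch, and rank zero reduces the gathered per-worker values.

```python
import numpy as np

def to_numpy_metrics(metrics: dict) -> dict:
    # Wrap each scalar in a length-1 array so the dict matches a
    # non_tensor_batch-style mapping of str -> np.ndarray that the
    # worker group can gather.
    return {key: np.asarray([value]) for key, value in metrics.items()}

def reduce_gathered(gathered: list) -> dict:
    # After collect concatenates the per-worker arrays, average each
    # metric on rank zero (mean as one reasonable default reduction).
    keys = gathered[0].keys()
    return {
        k: float(np.mean(np.concatenate([g[k] for g in gathered])))
        for k in keys
    }
```

For example, `reduce_gathered([to_numpy_metrics({"actor/kl_loss": 0.2}), to_numpy_metrics({"actor/kl_loss": 0.4})])` would yield the cross-worker mean of 0.3.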

Collaborator

non_tensor_batch requires the batch size to be equal to the data's batch size. How can we ensure that the batch size of the metrics is the same as that of the data?

Collaborator Author

In the workers' update_xxx methods (update_actor, update_critic), the worker only returns metrics without batch data, so the DataProto doesn't have a batch_size.

Collaborator Author

Maybe a better solution is to add a generic field to DataProto, with a name like aggregated_data or aux_data, which is not constrained by the batch_size. Like meta_info, it would just be a simple Dict[str, np.array]. The difference between this field and meta_info is that we would aggregate this field in the worker group's collect_fn. It could be used to aggregate metrics, and also to aggregate other info. What do you think?

Collaborator

I think creating an aggregated_data field makes sense. If this is the approach, then we may need to first introduce another PR that will:

  1. Create the field
  2. Define the behavior of this field in each DataProto API
  3. Define the dispatch and collect_fn for each registered method
  4. Write a test case to protect the method

Then, we can use this field to implement automatic metrics aggregation. We need to clearly define the aggregation function so that it can be customized.
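Under those assumptions, the proposed field and its collect-time aggregation might look roughly like this (DataProtoSketch, aggregated_data, and collect_aggregated are hypothetical names from the discussion above, not an existing DataProto API):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict
import numpy as np

@dataclass
class DataProtoSketch:
    # Hypothetical field: unlike non_tensor_batch, it is not
    # constrained by the data's batch_size.
    aggregated_data: Dict[str, np.ndarray] = field(default_factory=dict)

def collect_aggregated(
    protos,
    agg: Callable[[np.ndarray], float] = np.mean,
) -> Dict[str, float]:
    # The worker group's collect_fn would concatenate each key's values
    # across workers, then reduce with a customizable aggregation
    # function (mean by default, but e.g. np.max could be passed).
    keys = protos[0].aggregated_data.keys()
    return {
        k: float(agg(np.concatenate([p.aggregated_data[k] for p in protos])))
        for k in keys
    }
```

Making `agg` a parameter is one way to satisfy the requirement that the aggregation function be clearly defined and customizable per metric.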

Collaborator Author

OK, I will revert this PR to 3d8bf7a and open a separate PR.

@vermouth1992 vermouth1992 merged commit 6d9ac2f into verl-project:main Jun 30, 2025
47 of 49 checks passed
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025
…#2259)

### What does this PR do?

This PR fixes an issue in dp_actor where `actor/kl_loss` and
`actor/kl_coef` were being continuously overwritten during the
micro-batch processing loop.

Previously, the long-lived `metrics` dictionary was updated directly,
causing the value for these metrics to reflect only the final
micro-batch of any given step, rather than an aggregation of all
micro-batches within that step.

This change refactors the logic so that all metrics are collected
uniformly: `kl_loss` is now collected for each micro-batch, just like
other metrics such as `pg_loss`.


> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jan 20, 2026
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026