
[algo] fix: correctly aggregate kl metrics in PPO actor#2259

Merged
vermouth1992 merged 3 commits intoverl-project:mainfrom
0x404:kl_fix
Jun 30, 2025

Conversation

@0x404 0x404 commented Jun 29, 2025

What does this PR do?

This PR fixes an issue in dp_actor where actor/kl_loss and actor/kl_coef were being continuously overwritten during the micro-batch processing loop.

Previously, the long-lived metrics dictionary was updated directly, causing the value for these metrics to reflect only the final micro-batch of any given step, rather than an aggregation of all micro-batches within that step.

This change refactors the logic so that all metrics are collected uniformly: kl_loss is now collected for each micro-batch, just like other metrics such as pg_loss.
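The fix amounts to appending each micro-batch's value and reducing once per step, rather than overwriting a single key in a long-lived dict. A minimal sketch (the metric names and values are illustrative, not verl's actual code):

```python
from collections import defaultdict
import statistics

# Collect every micro-batch's value instead of overwriting a long-lived
# metrics dict, which would keep only the last micro-batch's number.
micro_batch_metrics = defaultdict(list)

# Illustrative per-micro-batch losses for one optimizer step.
for kl_loss, pg_loss in [(0.2, 1.0), (0.4, 1.2), (0.6, 0.8)]:
    micro_batch_metrics["actor/kl_loss"].append(kl_loss)
    micro_batch_metrics["actor/pg_loss"].append(pg_loss)

# Reduce once per step: each logged metric now reflects all
# micro-batches in the step, not just the final one.
step_metrics = {k: statistics.fmean(v) for k, v in micro_batch_metrics.items()}
```

With the old overwrite behavior, `actor/kl_loss` would have logged 0.6 (the last micro-batch) instead of the step-level mean.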

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Comment on lines -1126 to +1128

-        self.critic_lr_scheduler.step()
         lr = self.critic_lr_scheduler.get_last_lr()[0]
         metrics["critic/lr"] = lr
+        self.critic_lr_scheduler.step()
Collaborator Author
Also fixes an issue where the previously recorded lr was actually the next step's lr instead of the current step's, as in #1463.
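The ordering issue can be illustrated with a tiny stand-in scheduler (a hypothetical class with torch-like methods, not verl or torch code): reading the learning rate before stepping records the current step's value.

```python
class TinyScheduler:
    """Minimal stand-in for an LR scheduler with torch-like methods."""

    def __init__(self, lr: float, gamma: float = 0.5):
        self._lr = lr
        self._gamma = gamma

    def get_last_lr(self):
        return [self._lr]

    def step(self):
        # Decay the lr for the next step.
        self._lr *= self._gamma


scheduler = TinyScheduler(lr=0.1)
metrics = {}
# Record the lr for the *current* step first; stepping before reading
# (the old order) would have logged the next step's decayed lr.
metrics["critic/lr"] = scheduler.get_last_lr()[0]
scheduler.step()
```

After the step, `get_last_lr()` already returns the decayed value, which is why the old read-after-step order logged the wrong lr.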

@vermouth1992
Collaborator

One more thing, could you also fix the metrics aggregation in this PR, like #2144? In #2144, DataProto was modified to support this. For a general data structure, that is inappropriate. The reasonable approach is to perform an allgather inside the worker and aggregate on rank zero, just like typical torchrun programs do. Do you mind fixing this as well? Thanks!

@0x404

0x404 commented Jun 29, 2025

> One more thing, could you also fix the metrics aggregation in this PR, like #2144? In #2144, DataProto was modified to support this. For a general data structure, that is inappropriate. The reasonable approach is to perform an allgather inside the worker and aggregate on rank zero, just like typical torchrun programs do. Do you mind fixing this as well? Thanks!

Sure, I would like to fix this. Should I open a separate PR, or push the fix into this one?

@vermouth1992
Collaborator

Just pushing the fix into this PR is fine, since it only fixes the aggregation problem.

# Note: we should convert worker's metrics to DataProto's non_tensor_batch
# so that metrics from different workers can be correctly collected by worker group
worker_metrics = DataProto(non_tensor_batch=to_numpy_metrics(metrics))

@0x404 0x404 Jun 29, 2025

Here we can return the worker's metrics as DataProto's non_tensor_batch and use the worker group's collect function to allgather the metrics from the different workers, thereby avoiding explicit allgather communication inside the workers.
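As a sketch of the idea (the to_numpy_metrics helper and reduce_gathered function here are illustrative stand-ins, not verl's actual implementation): each scalar metric is wrapped in a length-1 numpy array so the dict has the str-to-ndarray shape of a non_tensor_batch, and rank zero reduces the gathered per-worker values.

```python
import numpy as np

def to_numpy_metrics(metrics: dict) -> dict:
    # Wrap each scalar in a length-1 array so the dict matches a
    # non_tensor_batch-style mapping of str -> np.ndarray that the
    # worker group can gather.
    return {key: np.asarray([value]) for key, value in metrics.items()}

def reduce_gathered(gathered: list) -> dict:
    # After collect concatenates the per-worker arrays, average each
    # metric on rank zero (mean as one reasonable default reduction).
    keys = gathered[0].keys()
    return {
        k: float(np.mean(np.concatenate([g[k] for g in gathered])))
        for k in keys
    }
```

For example, `reduce_gathered([to_numpy_metrics({"actor/kl_loss": 0.2}), to_numpy_metrics({"actor/kl_loss": 0.4})])` would yield the cross-worker mean of 0.3.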

Collaborator

non_tensor_batch requires the batch size to be equal to the data's batch size. How can we ensure that the batch size of the metrics is the same as that of the data?

Collaborator Author

In the workers' update_xxx methods (update_actor, update_critic), the worker only returns metrics without batch data, so the DataProto doesn't have a batch_size.

Collaborator Author

Maybe a better solution is to add a generic field to DataProto, with a name like aggregated_data or aux_data, which is not constrained by the batch_size. Like meta_info, it would just be a simple Dict[str, np.array]. The difference between this field and meta_info is that we would aggregate this field in the worker group's collect_fn. It could be used to aggregate metrics, and also to aggregate other info. What do you think?

Collaborator

I think creating an aggregated_data field makes sense. If this is the approach, then we may need to first introduce another PR that will:

  1. Create the field
  2. Define the behavior of this field in each DataProto API
  3. Define the dispatch and collect_fn for each registered method
  4. Write a test case to protect the method

Then, we can use this field to implement automatic metrics aggregation. We need to clearly define the aggregation function so that it can be customized.
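Under those assumptions, the proposed field and its collect-time aggregation might look roughly like this (DataProtoSketch, aggregated_data, and collect_aggregated are hypothetical names from the discussion above, not an existing DataProto API):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict
import numpy as np

@dataclass
class DataProtoSketch:
    # Hypothetical field: unlike non_tensor_batch, it is not
    # constrained by the data's batch_size.
    aggregated_data: Dict[str, np.ndarray] = field(default_factory=dict)

def collect_aggregated(
    protos,
    agg: Callable[[np.ndarray], float] = np.mean,
) -> Dict[str, float]:
    # The worker group's collect_fn would concatenate each key's values
    # across workers, then reduce with a customizable aggregation
    # function (mean by default, but e.g. np.max could be passed).
    keys = protos[0].aggregated_data.keys()
    return {
        k: float(agg(np.concatenate([p.aggregated_data[k] for p in protos])))
        for k in keys
    }
```

Making `agg` a parameter is one way to satisfy the requirement that the aggregation function be clearly defined and customizable per metric.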

Collaborator Author

OK, I will revert this PR to 3d8bf7a and open a separate PR.

@vermouth1992 vermouth1992 merged commit 6d9ac2f into verl-project:main Jun 30, 2025
47 of 49 checks passed
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025
…#2259)

### What does this PR do?

This PR fixes an issue in dp_actor where `actor/kl_loss` and
`actor/kl_coef` were being continuously overwritten during the
micro-batch processing loop.

Previously, the long-lived `metrics` dictionary was updated directly,
causing the value for these metrics to reflect only the final
micro-batch of any given step, rather than an aggregation of all
micro-batches within that step.

This change refactors the logic so that all metrics are collected
uniformly: `kl_loss` is now collected for each micro-batch, just like
other metrics such as `pg_loss`.


> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jan 20, 2026
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026