[algo] fix: correctly aggregate kl metrics in PPO actor#2259
[algo] fix: correctly aggregate kl metrics in PPO actor#2259vermouth1992 merged 3 commits intoverl-project:mainfrom
Conversation
| self.critic_lr_scheduler.step() | ||
| lr = self.critic_lr_scheduler.get_last_lr()[0] | ||
| metrics["critic/lr"] = lr | ||
| self.critic_lr_scheduler.step() |
There was a problem hiding this comment.
also fix an issue where previous recorded lr is actually the next step's lr instead of the current step. as #1463
|
One more thing, could you also fix the metrics aggregation in this PR like this #2144? In #2144, DataProto was modified to support this. As a general data structure, this is inappropriate. The reasonable approach is to perform allgather inside the worker, and aggregate on rank zero. Just like what typically torchrun programs did. Do you mind also fix this as well? Thanks! |
Sure, I would like to fix this. Should I open a separate PR or push that fix into this? |
|
Just push the fix into this PR is fine as it just fixes aggregation problem |
| # Note: we should convert worker's metrics to DataProto's non_tensor_batch | ||
| # so that metrics from different workers can be correctly collected by worker group | ||
| worker_metrics = DataProto(non_tensor_batch=to_numpy_metrics(metrics)) | ||
|
|
There was a problem hiding this comment.
Here we can return the worker's metrics as DataProto's non_tensor_batch, and utilize the worker group's collect function to allgather different workers' metrics from all workers, thereby avoiding the use of allgather for communication within workers.
There was a problem hiding this comment.
non_tensor_batch requires the batch size to be equal to the data's batch size. How can we ensure that the batch size of metrics is the same of the data?
There was a problem hiding this comment.
In the workers' update_xxx (update_actor, update_critic), the worker only return metrics without batch data, thus the DataProto don't have batch_size.
There was a problem hiding this comment.
Maybe a better solution is to add a generic field in DataProto, maybe with a name like aggregated_data or aux_data, which is not constrained by the batch_size, like meta_info, just a simple Dict[str, np.array]. The difference between this field and meta_info is that we will aggregate this field in worker group's collect_fn. This field can be used to aggregate metrics, and can also be used for aggregating other info. What do you think?
There was a problem hiding this comment.
I think create a field of aggreated_data makes sense. If this is the approach, then we may need to first introduce another PR, that
- Create the field
- Define how the behavior of this field in each DataProto API
- Define the dispatch and collect_fn for each registered method
- Write a testcase to protect the method
Then, we can use this field to implement auto metrics aggregation. We need to clearly define the aggregation function so that it can be customized.
There was a problem hiding this comment.
OK, I will revert this PR to 3d8bf7a and open a separate PR.
…#2259) ### What does this PR do? This PR fix an issue in dp_actor where `actor/kl_loss` and `actor/kl_coef` were being continuously overwritten during the micro-batch processing loop. Previously, the long-lived `metrics` dictionary was updated directly, causing the value for these metrics to reflect only the final micro-batch of any given step, rather than an aggregation of all micro-batches within that step. This change refactors the logic to align the collection of all metrics, now `kl_loss` is collected for each micro-batch, the same as other metrics like `pg_loss`. > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
…#2259) ### What does this PR do? This PR fix an issue in dp_actor where `actor/kl_loss` and `actor/kl_coef` were being continuously overwritten during the micro-batch processing loop. Previously, the long-lived `metrics` dictionary was updated directly, causing the value for these metrics to reflect only the final micro-batch of any given step, rather than an aggregation of all micro-batches within that step. This change refactors the logic to align the collection of all metrics, now `kl_loss` is collected for each micro-batch, the same as other metrics like `pg_loss`. > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
…#2259) ### What does this PR do? This PR fix an issue in dp_actor where `actor/kl_loss` and `actor/kl_coef` were being continuously overwritten during the micro-batch processing loop. Previously, the long-lived `metrics` dictionary was updated directly, causing the value for these metrics to reflect only the final micro-batch of any given step, rather than an aggregation of all micro-batches within that step. This change refactors the logic to align the collection of all metrics, now `kl_loss` is collected for each micro-batch, the same as other metrics like `pg_loss`. > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
…#2259) ### What does this PR do? This PR fix an issue in dp_actor where `actor/kl_loss` and `actor/kl_coef` were being continuously overwritten during the micro-batch processing loop. Previously, the long-lived `metrics` dictionary was updated directly, causing the value for these metrics to reflect only the final micro-batch of any given step, rather than an aggregation of all micro-batches within that step. This change refactors the logic to align the collection of all metrics, now `kl_loss` is collected for each micro-batch, the same as other metrics like `pg_loss`. > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
…#2259) ### What does this PR do? This PR fix an issue in dp_actor where `actor/kl_loss` and `actor/kl_coef` were being continuously overwritten during the micro-batch processing loop. Previously, the long-lived `metrics` dictionary was updated directly, causing the value for these metrics to reflect only the final micro-batch of any given step, rather than an aggregation of all micro-batches within that step. This change refactors the logic to align the collection of all metrics, now `kl_loss` is collected for each micro-batch, the same as other metrics like `pg_loss`. > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
…#2259) ### What does this PR do? This PR fix an issue in dp_actor where `actor/kl_loss` and `actor/kl_coef` were being continuously overwritten during the micro-batch processing loop. Previously, the long-lived `metrics` dictionary was updated directly, causing the value for these metrics to reflect only the final micro-batch of any given step, rather than an aggregation of all micro-batches within that step. This change refactors the logic to align the collection of all metrics, now `kl_loss` is collected for each micro-batch, the same as other metrics like `pg_loss`. > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
…#2259) ### What does this PR do? This PR fix an issue in dp_actor where `actor/kl_loss` and `actor/kl_coef` were being continuously overwritten during the micro-batch processing loop. Previously, the long-lived `metrics` dictionary was updated directly, causing the value for these metrics to reflect only the final micro-batch of any given step, rather than an aggregation of all micro-batches within that step. This change refactors the logic to align the collection of all metrics, now `kl_loss` is collected for each micro-batch, the same as other metrics like `pg_loss`. > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
What does this PR do?
This PR fix an issue in dp_actor where
actor/kl_lossandactor/kl_coefwere being continuously overwritten during the micro-batch processing loop.Previously, the long-lived
metricsdictionary was updated directly, causing the value for these metrics to reflect only the final micro-batch of any given step, rather than an aggregation of all micro-batches within that step.This change refactors the logic to align the collection of all metrics, now
kl_lossis collected for each micro-batch, the same as other metrics likepg_loss.Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace.