fix: gradient accumulation in DP by tongyx361 · Pull Request #906 · verl-project/verl

tongyx361 · 2025-04-03T22:47:44Z

Motivation

Gradient accumulation should ensure the loss after applying it is the same as not applying it. But verl's original implementation is only compatible with sequence-mean loss:

mini_loss_to_acc = micro_agg_loss * (len(micro_batch) / self.config.ppo_mini_batch_size)

while verl used to use token-mean loss by default.

For more background, please refer to:

Related Issue/Comment(s)

#623 (comment)

Summary

This PR fixes the mismatch between w/ & w/o gradient accumulation by adapting to the loss aggregation mode.

Core Code to Review

Calculate the number of loss tokens in every mini-batch (mini_batch_loss_token_nums):

def calc_mini_batch_loss_token_nums(batch: DataProto, traj_mini_bsz: int, num_dp_ranks: int) -> list[int]:
    """
    NOTE: Be compatible with
    
    1. verl.workers.fsdp_workers.ActorRolloutRefWorker.update_actor
    2. verl.workers.fsdp_workers.CriticWorker.update_critic

    TODO: Calculate separate numbers if adopting different strategies for actor and critic
    """
    response_mask = compute_response_mask(response_ids=batch.batch['responses'],
                                          attention_mask=batch.batch['attention_mask'])

    traj_bsz = len(batch.batch)
    num_mini_batches = (traj_bsz + traj_mini_bsz - 1) // traj_mini_bsz
    traj_mini_bsz_per_rank = traj_mini_bsz // num_dp_ranks

    mini_batch_loss_token_nums = []
    for _ in range(num_mini_batches):
        mini_batch_traj_idxs = []
        for dp_rank in range(num_dp_ranks):
            start_traj_idx = int(traj_bsz / num_dp_ranks * dp_rank)
            next_start_traj_idx = int(traj_bsz / num_dp_ranks * (dp_rank + 1))
            end_traj_idx = int(min(start_traj_idx + traj_mini_bsz_per_rank, next_start_traj_idx))
            mini_batch_traj_idxs.extend(list(range(start_traj_idx, end_traj_idx)))
        mini_batch_resp_mask = response_mask[mini_batch_traj_idxs]
        mini_batch_loss_token_num = mini_batch_resp_mask.sum()
        mini_batch_loss_token_nums.append(mini_batch_loss_token_num)

    return mini_batch_loss_token_nums

Reweight the micro-batch-aggregated loss (micro_agg_loss) adaptive to the loss aggregation mode (loss_agg_mode) to get this micro-batch's contribution to accumulate for the mini-batch-aggregated loss (mini_loss_to_acc):

if self.config.loss_agg_mode == 'token-mean':
    mini_batch_loss_token_nums = data.meta_info['mini_batch_loss_token_nums']
    mini_batch_loss_token_num = mini_batch_loss_token_nums[mini_idx]
    num_valid_toks = response_mask.sum()
    mini_loss_to_acc = micro_agg_loss * num_valid_toks / mini_batch_loss_token_num
else:  # seq-mean
    mini_loss_to_acc = micro_agg_loss * (len(micro_data_chunk) / self.config.ppo_mini_batch_size)

Checklist

(No new config term) If this PR adds any new config term, modify the docs accordingly.
(No breaking) If this PR breaks the config/CLI, add [BREAKING] to the title.
(Leaving to a future PR for [FR] Tests for gradient accumulation #907) If possible, add corresponding tests to .github/workflows.

eric-haibin-lin

could u summarize what the issue was and what's the impact to existing users?

This reverts commit 823c029.

tongyx361 · 2025-04-30T20:22:41Z

Splitting into PRs:

Variable name: [refactor] feat: separate data, batch and metric with clear variable names #1339
loss_agg_mode: fix: add loss_agg_mode to critics #1340
Fixing gradient accumulation in DP
Test the fix

fix: gradient accumulation in DP

b8693c3

tongyx361 requested review from PeterSH6, eric-haibin-lin, hiyouga and vermouth1992 and removed request for vermouth1992 April 3, 2025 22:47

eric-haibin-lin reviewed Apr 3, 2025

View reviewed changes

fix: indent

8b2bdbc

tongyx361 requested a review from eric-haibin-lin April 3, 2025 23:13

tongyx361 marked this pull request as draft April 3, 2025 23:37

tongyx361 mentioned this pull request Apr 3, 2025

[FR] Tests for gradient accumulation #907

Open

tongyx361 added 13 commits April 4, 2025 00:55

fix: gradient accumulation for seq-mean loss

a75670d

fix: test_seqlen_balancing

b6bd94a

fix: get_uniform_data_chunks

36ec375

fix: get_uniform_data_chunks

4996441

fix: calc_mini_batch_loss_token_nums

ef5f57b

fix: ppo_mini_batch_size

2f75d45

fix: micro_data_chunks

01d4eb1

fix: dp_size

823c029

Revert "fix: dp_size"

9f3b961

This reverts commit 823c029.

fix: _forward_micro_batch in recipe/prime/prime_dp_rm.py

4e1c873

fix: CI

7f9c84f

feat: docstring for calc_mini_batch_loss_token_nums

8cea267

Merge branch 'main' into tyx/fix/grad-acc-dp

a68003b

tongyx361 closed this Apr 10, 2025

tongyx361 reopened this Apr 10, 2025

tongyx361 mentioned this pull request Apr 13, 2025

about grpo loss computation #593

Closed

tongyx361 added 2 commits May 1, 2025 03:07

Merge branch 'main' into tyx/fix/grad-acc-dp

cc62eca

fix: num_loss_toks

d58675b

feat: tests

e34582b

tongyx361 closed this Apr 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: gradient accumulation in DP#906

fix: gradient accumulation in DP#906
tongyx361 wants to merge 18 commits intoverl-project:mainfrom
tongyx361:tyx/fix/grad-acc-dp

tongyx361 commented Apr 3, 2025 •

edited

Loading

Uh oh!

eric-haibin-lin left a comment

Uh oh!

tongyx361 commented Apr 30, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tongyx361 commented Apr 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Related Issue/Comment(s)

Summary

Core Code to Review

Checklist

Uh oh!

eric-haibin-lin left a comment

Choose a reason for hiding this comment

Uh oh!

tongyx361 commented Apr 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tongyx361 commented Apr 3, 2025 •

edited

Loading

tongyx361 commented Apr 30, 2025 •

edited

Loading