[recipe] feat: integrate DAPO and provide reproduction script by tongyx361 · Pull Request #623 · verl-project/verl

tongyx361 · 2025-03-16T13:01:14Z

Warning

As mentioned in #623 (comment), the gradient accumulation implementation in verl is compatible only with the sequence-mean loss, but all the DAPO experiments with the token-mean loss were run with this incompatible implementation.
We keep it as is in this branch for reproducibility and will fix it in a separate PR against the main branch.
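
The warning above does not spell out the mechanism, so the following is only a toy sketch of one common way such an incompatibility can arise, not a description of verl's actual code. It assumes gradient accumulation averages per-micro-batch losses with equal weight; all names (`seqs`, `token_mean`, `seq_mean`, `accumulate`) are hypothetical. Under that assumption, the accumulated sequence-mean loss matches the full-batch value, while the accumulated token-mean loss does not when micro-batches hold different numbers of tokens.

```python
# Toy per-token losses for four sequences of unequal length (made-up numbers).
seqs = [
    [1.0, 1.0],                        # 2 tokens
    [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],    # 6 tokens
    [2.0],                             # 1 token
    [4.0, 4.0, 4.0],                   # 3 tokens
]


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def token_mean(batch):
    """Average over every token in the batch."""
    return mean(tok for seq in batch for tok in seq)


def seq_mean(batch):
    """Average each sequence first, then average over sequences."""
    return mean(mean(seq) for seq in batch)


def accumulate(batch, loss_fn, micro_bsz):
    """Assumed accumulation scheme: equal-weight average of micro-batch losses."""
    micro = [batch[i:i + micro_bsz] for i in range(0, len(batch), micro_bsz)]
    return sum(loss_fn(mb) / len(micro) for mb in micro)


full_seq, acc_seq = seq_mean(seqs), accumulate(seqs, seq_mean, 2)
full_tok, acc_tok = token_mean(seqs), accumulate(seqs, token_mean, 2)

print("sequence-mean: full batch vs accumulated:", full_seq, acc_seq)
print("token-mean:    full batch vs accumulated:", full_tok, acc_tok)
```

In this sketch the mismatch disappears if each micro-batch loss is weighted by its share of the batch's tokens instead of by `1 / len(micro)`; whether that corresponds to the eventual fix is not stated here.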

verl/protocol.py

verl/trainer/config/ppo_trainer.yaml

verl/trainer/ppo/metric_utils.py

verl/workers/reward_manager/naive.py

verl/trainer/ppo/ray_trainer.py

PeterSH6 · 2025-04-02T08:26:45Z

LGTM!

PeterSH6 · 2025-03-31T12:54:21Z

verl/trainer/ppo/ray_trainer.py

+        for sample_idx, data_source in enumerate(data_sources):
+            prompt = sample_inputs[sample_idx]
+
+            var2vals = data_src2prompt2var2vals[data_source][prompt]


What is the meaning of data_src2prompt2var2vals?

tongyx361 · 2025-04-03

"2" means "to"; the name describes a multi-level dict: data_src -> prompt -> var -> vals.

"var" means a kind of variable, such as acc or final_reward.

"vals" means the values of that variable across the trajectories generated from a given prompt in a given data_src.

I will add comments in a future PR to make it clearer.
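
To make the naming convention concrete, here is a minimal sketch of how such a multi-level dict could be built. Only the loop header and the names `data_sources`, `sample_inputs`, `var2vals`, and `data_src2prompt2var2vals` come from the diff above; the toy inputs and the `sample_vars` mapping are assumptions for illustration and may differ from what `ray_trainer.py` actually does.

```python
from collections import defaultdict

# Hypothetical toy inputs: one entry per generated trajectory.
data_sources = ["math", "math", "code"]
sample_inputs = ["prompt A", "prompt A", "prompt B"]
sample_vars = {  # assumed layout: variable name -> per-trajectory values
    "acc": [1.0, 0.0, 1.0],
    "final_reward": [1.0, -1.0, 1.0],
}

# data_src -> prompt -> var -> list of vals
data_src2prompt2var2vals = defaultdict(
    lambda: defaultdict(lambda: defaultdict(list))
)

for sample_idx, data_source in enumerate(data_sources):
    prompt = sample_inputs[sample_idx]
    var2vals = data_src2prompt2var2vals[data_source][prompt]
    for var_name, vals in sample_vars.items():
        # Collect this trajectory's value under its data source and prompt.
        var2vals[var_name].append(vals[sample_idx])

print(dict(data_src2prompt2var2vals["math"]["prompt A"]))
```

Grouping by data source and then by prompt keeps all trajectories sampled from the same prompt together, which is convenient for per-prompt statistics during validation.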

PeterSH6 · 2025-04-03T19:52:50Z

Really nice job!!!

…roject#623)

> [!WARNING]
> As mentioned in verl-project#623 (comment), the implementation of gradient accumulation in verl has been only compatible with the sequence-mean loss, but all the DAPO experiments with the token-mean loss were run with the incompatible implementation.
> **We keep it as is for reproducibility in this branch** and will fix it in another PR for the main branch.

---------

Co-authored-by: Guangming Sheng <shengguangming@bytedance.com>
Co-authored-by: Guangming Sheng <petershengwhu@gmail.com>

PeterSH6 and others added 8 commits March 15, 2025 21:23

[misc] feat: add weight decay option in config (#611)

f39fae6

- As titled

[misc] feat: verifier for Puffin (#612)

25b643a

Co-authored-by: Guangming Sheng <petershengwhu@gmail.com>

fix: n = 1 with duplicated data

6e594fe

fix: config actor weight decay

6aefcdc

[misc] feat: support token_level_loss and support different clip range for low and high (#618)

3a7a0e8

- As titled

[log] feat: more statistics in validation (#620)

198af7c

chore: format

3ca12f9

fix: use train_batch_size by default

e7e888e

tongyx361 requested a review from PeterSH6 March 16, 2025 13:07

tongyx361 added 3 commits March 16, 2025 13:22

fix: return_dict

2a6aa9e

feat: script for Puffin-Zero-Qwen2.5-32B

fc1d1fc

fix: extra_reward_info

54acfd0

tongyx361 marked this pull request as draft March 16, 2025 13:51

tongyx361 added 8 commits March 16, 2025 14:00

chore: select_idxs

c181511

chore: fill_to_train_bsz

f3c513e

fix: naive.py

7ff4243

chore: rename 32B script

c16e82f

fix: reward dict result

92fd507

chore: non_uniform_reward

6ddea1c

fix: train_prompt_bsz

4d077f5

chore: filter prefix

eff7f61

PeterSH6 reviewed Mar 16, 2025

tongyx361 and others added 8 commits March 16, 2025 14:58

chore: format

a0050c9

fix 32b no filter script

6075b73

feat: config overlong_buffer

38a9fbe

chore: comments

8e64972

fix: megatron config

50109f1

fix: 32B script

3f2f815

fix: scripts

a32aac0

feat: rename [skip ci]

d8e0a73

eric-haibin-lin mentioned this pull request Mar 30, 2025

DAPO #823

Closed

tongyx361 added 2 commits March 31, 2025 03:52

Merge branch 'main' into gm-tyx/puffin/main

89da9db

fix: config

17f7b4b

tongyx361 marked this pull request as ready for review March 31, 2025 09:53

tongyx361 requested a review from PeterSH6 March 31, 2025 09:54

PeterSH6 reviewed Mar 31, 2025

verl/trainer/ppo/ray_trainer.py (outdated)

tongyx361 added 3 commits March 31, 2025 14:15

feat: better metric sectioning

da7f9b9

refactor: extract process_validation_metrics

db456cc

fix: reward_tensor

4bbc0ab

tongyx361 requested a review from PeterSH6 March 31, 2025 22:13

tongyx361 added 5 commits April 2, 2025 09:06

Merge branch 'main' into gm-tyx/puffin/main

50b3fe0

fix: DAPO config

7bd84a7

fix: new features

404a38b

Merge branch 'main' into gm-tyx/puffin/main

6ba61e1

Merge branch 'main' into gm-tyx/puffin/main

a11ced5

PeterSH6 approved these changes Apr 3, 2025

tongyx361 merged commit 3a27a98 into main Apr 3, 2025
28 checks passed

tongyx361 deleted the gm-tyx/puffin/main branch April 3, 2025 21:46

tongyx361 restored the gm-tyx/puffin/main branch April 3, 2025 21:46

tongyx361 mentioned this pull request Apr 3, 2025

fix: gradient accumulation in DP #906

Closed

3 tasks

dreamyang-liu pushed a commit to dreamyang-liu/verl-sagemaker that referenced this pull request Feb 21, 2026

Doc: More model examples (verl-project#623)

3694c55

tongyx361 merged 79 commits into main from gm-tyx/puffin/main


6 participants
