[recipe] feat: integrate DAPO and provide reproduction script#623

Merged
tongyx361 merged 79 commits intomainfrom
gm-tyx/puffin/main
Apr 3, 2025
Conversation

Collaborator

@tongyx361 tongyx361 commented Mar 16, 2025

Warning

As mentioned in #623 (comment), the implementation of gradient accumulation in verl has only been compatible with the sequence-mean loss, but all DAPO experiments with the token-mean loss were run with this incompatible implementation.
We keep it as is in this branch for reproducibility and will fix it in a separate PR for the main branch.
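For context, here is a minimal sketch (hypothetical, not verl's actual code) of the two aggregation modes the warning contrasts: sequence-mean averages token losses within each sequence and then across sequences, while token-mean averages over all valid tokens in the batch. The two disagree whenever sequence lengths differ, which is why gradient-accumulation logic written for one can be wrong for the other.

```python
def seq_mean_loss(token_losses, mask):
    # Average token losses within each sequence, then across sequences.
    per_seq = [
        sum(l * m for l, m in zip(seq, msk)) / sum(msk)
        for seq, msk in zip(token_losses, mask)
    ]
    return sum(per_seq) / len(per_seq)

def token_mean_loss(token_losses, mask):
    # Average over all valid tokens in the whole batch.
    total = sum(l * m for seq, msk in zip(token_losses, mask)
                for l, m in zip(seq, msk))
    n_tokens = sum(m for msk in mask for m in msk)
    return total / n_tokens

# Two sequences of unequal valid length (mask marks valid tokens):
losses = [[1.0, 1.0, 0.0], [4.0, 0.0, 0.0]]
mask = [[1, 1, 1], [1, 0, 0]]
# seq-mean = ((2/3) + 4.0) / 2 ≈ 2.33, token-mean = 6.0 / 4 = 1.5
```

With equal-length sequences the two coincide; the discrepancy only appears with variable lengths, as in RL rollouts.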

@tongyx361 tongyx361 requested a review from PeterSH6 March 16, 2025 13:07
@tongyx361 tongyx361 marked this pull request as draft March 16, 2025 13:51
@eric-haibin-lin eric-haibin-lin mentioned this pull request Mar 30, 2025
@tongyx361 tongyx361 marked this pull request as ready for review March 31, 2025 09:53
@tongyx361 tongyx361 requested a review from PeterSH6 March 31, 2025 09:54
@tongyx361 tongyx361 requested a review from PeterSH6 March 31, 2025 22:13
Collaborator

PeterSH6 commented Apr 2, 2025

LGTM!

```python
for sample_idx, data_source in enumerate(data_sources):
    prompt = sample_inputs[sample_idx]

    var2vals = data_src2prompt2var2vals[data_source][prompt]
```
Collaborator

What's the meaning of `data_src2prompt2var2vals`?

Collaborator Author

  1. The `2` reads as "to", marking each level of the multi-level dict
  2. `var` means a kind of "variable", such as `acc` or `final_reward`
  3. `vals` means the "values" of those variables over the trajectories sampled for a given prompt from a given `data_src`

I will add comments in a future PR to make it clearer.
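For readers, a hypothetical sketch (prompt and values are illustrative, not verl's actual code) of the structure this naming convention describes:

```python
from collections import defaultdict

# "2" reads as "to": data_src2prompt2var2vals maps
# data_source -> prompt -> variable name -> list of values,
# one value per trajectory sampled for that prompt.
data_src2prompt2var2vals = defaultdict(
    lambda: defaultdict(lambda: defaultdict(list))
)

# Illustrative entries for two trajectories of one prompt:
data_src2prompt2var2vals["math"]["What is 2+2?"]["acc"].append(1.0)
data_src2prompt2var2vals["math"]["What is 2+2?"]["acc"].append(0.0)
data_src2prompt2var2vals["math"]["What is 2+2?"]["final_reward"].append(0.9)

# Looking up one (data_source, prompt) pair yields its variable-to-values dict:
var2vals = data_src2prompt2var2vals["math"]["What is 2+2?"]
```

Indexing by `data_source` and `prompt` first, as in the reviewed loop, leaves a flat `var2vals` dict for computing per-prompt statistics.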

Collaborator

PeterSH6 commented Apr 3, 2025

Really nice job!!!

@tongyx361 tongyx361 merged commit 3a27a98 into main Apr 3, 2025
28 checks passed
@tongyx361 tongyx361 deleted the gm-tyx/puffin/main branch April 3, 2025 21:46
@tongyx361 tongyx361 restored the gm-tyx/puffin/main branch April 3, 2025 21:46
@tongyx361 tongyx361 mentioned this pull request Apr 3, 2025
yushengsu-thu pushed a commit to yushengsu-thu/verl that referenced this pull request Apr 4, 2025
…roject#623)

> [!WARNING]
> As mentioned in
verl-project#623 (comment), the
implementation of gradient accumulation in verl has been only compatible
with the sequence-mean loss, but all the DAPO experiments with the
token-mean loss were run with the incompatible implementation.
> **We keep it as is for reproducibility in this branch** and will fix
it in another PR for the main branch.

---------

Co-authored-by: Guangming Sheng <shengguangming@bytedance.com>
Co-authored-by: Guangming Sheng <petershengwhu@gmail.com>
yuchenwang3 pushed a commit to yuchenwang3/verl that referenced this pull request Apr 25, 2025
histmeisah pushed a commit to SJTU-IAAR/verl that referenced this pull request Apr 27, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
paolo328 added a commit to paolo328/Verl that referenced this pull request Nov 27, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
dreamyang-liu pushed a commit to dreamyang-liu/verl-sagemaker that referenced this pull request Feb 21, 2026

6 participants