[recipe, fsdp] feat: support GPT-OSS-20B DAPO training script on ASCEND NPU #4716
mikequan0425 wants to merge 2 commits into verl-project:main
Code Review
This pull request adds a new training script for DAPO with GPT-OSS-20B on Ascend NPUs, along with a documentation update. The changes look good overall. I've found one potential high-severity issue in the new training script where an incorrect advantage estimator might be configured, which could affect the correctness of the training process. My detailed feedback is in the review comment.
#!/bin/bash
project_name='gptoss_verl_fsdp'
exp_name='32rank-gptoss-20B'
adv_estimator=grpo
The script is configured for a DAPO training run, as indicated by the script name, configuration files, and reward manager, yet the advantage estimator is set to grpo. This looks inconsistent with a DAPO recipe and may lead to incorrect training behavior. It should be changed to the appropriate estimator for DAPO, which is presumably dapo.
Suggested change:
- adv_estimator=grpo
+ adv_estimator=dapo
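While the estimator question is being settled, the value could be read from the environment with a default, so either setting can be tried without editing the script. This is only a sketch: the variable name ADV_ESTIMATOR and the helper resolve_adv_estimator are hypothetical, and which estimator names the trainer actually accepts is not established in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick the advantage estimator from the environment,
# falling back to the value currently hard-coded in the script (grpo).
resolve_adv_estimator() {
  printf '%s' "${ADV_ESTIMATOR:-grpo}"
}

adv_estimator="$(resolve_adv_estimator)"
echo "adv_estimator=${adv_estimator}"
```

Running the script with ADV_ESTIMATOR set overrides the default; leaving it unset keeps the current behavior.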
What does this PR do?
Provides a script for DAPO training of GPT-OSS-20B on Ascend NPUs.
Checklist Before Starting
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
{modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, like [megatron, fsdp, doc]
{type} is in feat, fix, refactor, chore, test
For breaking changes, add [BREAKING] to the beginning of the title, e.g. [BREAKING][fsdp, megatron] feat: dynamic batching
Test
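The title convention from the checklist above can be checked locally before pushing. The following is a minimal sketch, not the project's actual CI check: the function name check_pr_title is hypothetical, and the regular expression is built only from the module and type lists quoted in the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a PR title against
# "[{modules}] {type}: {description}", with an optional [BREAKING] prefix.
# Returns 0 if the title matches, 1 otherwise.
check_pr_title() {
  local title="$1"
  # Module and type names as listed in the PR template.
  local modules='fsdp|megatron|sglang|vllm|rollout|trainer|ci|training_utils|recipe|hardware|deployment|ray|worker|single_controller|misc|perf|model|algo|env|tool|ckpt|doc|data|cfg|reward'
  local types='feat|fix|refactor|chore|test'
  # One or more comma-separated modules in brackets, then "type: description".
  local re="^(\[BREAKING\])?\[(${modules})(, ?(${modules}))*\] (${types}): .+$"
  [[ "$title" =~ $re ]]
}

if check_pr_title '[recipe, fsdp] feat: support GPT-OSS-20B DAPO training script on ASCEND NPU'; then
  echo "title ok"
fi
```

The check is intentionally strict about the space after the closing bracket and after the colon, since the template shows both.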
API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
Run the pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Request CI in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)