[sglang] feat: adapt for sglang+verl #3506
Merged
9 commits
848680b adaptor verl sglang (lbk-sys)
93cd378 fix for commit (lbk-sys)
6868943 fix for scripts (lbk-sys)
b0f677e fix for commit (lbk-sys)
99f5227 Merge remote-tracking branch 'verl/main' into verl_sglang_0915 (lbk-sys)
de1717a fix for ci (lbk-sys)
616e305 fix for ci (lbk-sys)
b8e60cb fix for ci (lbk-sys)
5082a69 fix for ci (lbk-sys)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,113 @@ | ||
| verl x Ascend | ||
| =================================== | ||
|
|
||
| Last updated: 09/25/2025. | ||
|
|
||
| We have added support for Huawei Ascend devices to verl. | ||
|
|
||
| Hardware Support | ||
| ----------------------------------- | ||
|
|
||
| Atlas 200T A2 Box16 | ||
|
|
||
| Atlas 900 A2 PODc | ||
|
|
||
| Atlas 800T A3 | ||
|
|
||
|
|
||
| Installation | ||
| ----------------------------------- | ||
|
|
||
| Base Environment Setup | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| +-----------+-------------+ | ||
| | software | version | | ||
| +-----------+-------------+ | ||
| | Python | == 3.11 | | ||
| +-----------+-------------+ | ||
| | CANN | == 8.3.RC1 | | ||
| +-----------+-------------+ | ||
| | HDK | == 25.3.RC1 | | ||
| +-----------+-------------+ | ||
| | torch | == 2.6.0 | | ||
| +-----------+-------------+ | ||
| | torch_npu | == 2.6.0 | | ||
| +-----------+-------------+ | ||
|
|
||
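| **Currently, the sglang NPU backend in verl supports only the HDK, CANN, and PTA (torch_npu) versions listed above. A commercially released version is expected in October 2025.** | ||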
|
|
||
| To use sglang in verl, install sglang, torch_memory_saver, and verl with the commands below. | ||
|
|
||
| sglang | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
| .. code-block:: bash | ||
|
|
||
| # sglang | ||
| git clone https://github.com/sgl-project/sglang.git | ||
| cd sglang | ||
| mv python/pyproject.toml python/pyproject.toml.backup | ||
| mv python/pyproject_other.toml python/pyproject.toml | ||
| pip install -e "python[srt_npu]" | ||
|
|
||
| Install torch_memory_saver | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
| .. code-block:: bash | ||
|
|
||
| # torch_memory_saver | ||
| git clone https://github.com/sgl-project/sgl-kernel-npu.git | ||
| cd sgl-kernel-npu | ||
| bash build.sh -a memory-saver | ||
| pip install output/torch_memory_saver*.whl | ||
|
|
||
| Install verl | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| git clone https://github.com/volcengine/verl.git | ||
| cd verl | ||
| pip install --no-deps -e . | ||
| pip install -r requirements-npu.txt | ||
|
|
||
|
|
||
| Other Third-Party Libraries | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| +--------------+---------------+ | ||
| | software     | version       | | ||
| +--------------+---------------+ | ||
| | transformers | v4.56.1 | | ||
| +--------------+---------------+ | ||
| | triton_ascend| v3.2.0 | | ||
| +--------------+---------------+ | ||
|
|
||
| 1. sglang depends on transformers v4.56.1. | ||
| 2. sglang depends on triton_ascend v3.2.0. | ||
| 3. Multimodal models are not yet supported; uninstall the related packages torchvision and timm. | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| pip uninstall torchvision | ||
| pip uninstall timm | ||
| pip uninstall triton | ||
|
|
||
| pip install transformers==4.56.1 | ||
| pip install -i https://test.pypi.org/simple/ triton-ascend==3.2.0.dev20250925 | ||
|
|
||
|
|
||
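| Quick Start | ||
| ----------------------------------- | ||
| Before production use, we recommend running a trial Qwen3-8B GRPO training to verify that the environment setup and installation are correct. | ||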
|
|
||
| 1. Download the dataset and preprocess it into parquet format so that it contains the fields required to compute RL rewards. | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| python3 examples/data_preprocess/gsm8k.py --local_save_dir ~/data/gsm8k | ||
|
|
||
| 2. Run the training. | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| bash verl/examples/grpo_trainer/run_qwen3_8b_grpo_sglang_1k_spmd_npu.sh |
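The version pins in the tables above can be checked before launching a run. The following is a minimal sketch, not part of this PR: `check_pin` and `pip_version` are hypothetical helper names, and it assumes `python3` and `pip` are on the PATH. CANN and HDK versions are not covered because their query commands depend on the installation.

```shell
#!/usr/bin/env bash
# Sketch only: check_pin and pip_version are hypothetical helpers, not part of verl.

# Compare an installed version string against an expected pin.
# Prints "ok: ..." and returns 0 on an exact match;
# otherwise prints "mismatch: ..." and returns 1.
check_pin() {
    local name="$1" expected="$2" actual="$3"
    if [ "$actual" = "$expected" ]; then
        echo "ok: $name $actual"
    else
        echo "mismatch: $name expected $expected, found ${actual:-none}"
        return 1
    fi
}

# Read a package version from pip metadata; prints nothing if the package is absent.
pip_version() {
    python3 -m pip show "$1" 2>/dev/null | awk '/^Version:/ {print $2}'
}

# Pins taken from the tables in the document above.
check_pin Python 3.11 "$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)" || true
check_pin torch 2.6.0 "$(pip_version torch)" || true
check_pin torch_npu 2.6.0 "$(pip_version torch_npu)" || true
check_pin transformers 4.56.1 "$(pip_version transformers)" || true
```

The comparison is an exact string match, so a build that reports a suffix after the pinned version would show up as a mismatch and need a manual look.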
71 changes: 71 additions & 0 deletions
examples/grpo_trainer/run_qwen3_8b_grpo_sglang_1k_spmd_npu.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| set -x | ||
| export HCCL_CONNECT_TIMEOUT=1500 | ||
| export HCCL_HOST_SOCKET_PORT_RANGE=60000-60050 | ||
| export HCCL_NPU_SOCKET_PORT_RANGE=61000-61050 | ||
|
|
||
| # WORKSPACE_HOME and DATA_HOME support custom path configuration. | ||
| WORKSPACE_HOME=$PWD | ||
| DATA_HOME=$PWD | ||
|
|
||
| sp_size=4 | ||
| num_npu=4 | ||
| tp_size=4 | ||
| train_prompt_bsz=16 | ||
| train_prompt_mini_bsz=16 | ||
|
|
||
| max_prompt_length=512 | ||
| max_response_length=1024 | ||
|
|
||
| CKPTS_DIR=$WORKSPACE_HOME/logs/ckpt/qwen3_8b | ||
| model_path=$DATA_HOME/models/Qwen3-8B | ||
| train_data=$DATA_HOME/datasets/processed_gsm8k/train.parquet | ||
| valid_data=$DATA_HOME/datasets/processed_gsm8k/test.parquet | ||
|
|
||
| python3 -m verl.trainer.main_ppo \ | ||
| algorithm.adv_estimator=grpo \ | ||
| data.train_files=$train_data \ | ||
| data.val_files=$valid_data \ | ||
| data.train_batch_size=$train_prompt_bsz \ | ||
| data.max_prompt_length=$max_prompt_length \ | ||
| data.max_response_length=$max_response_length \ | ||
| data.filter_overlong_prompts=True \ | ||
| data.truncation='error' \ | ||
| actor_rollout_ref.model.path=$model_path \ | ||
| actor_rollout_ref.actor.optim.lr=1e-6 \ | ||
| actor_rollout_ref.model.use_remove_padding=True \ | ||
| actor_rollout_ref.actor.ppo_mini_batch_size=$train_prompt_mini_bsz \ | ||
| actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \ | ||
| actor_rollout_ref.actor.use_kl_loss=True \ | ||
| actor_rollout_ref.actor.entropy_coeff=0 \ | ||
| actor_rollout_ref.actor.kl_loss_coef=0.001 \ | ||
| actor_rollout_ref.actor.kl_loss_type=low_var_kl \ | ||
| actor_rollout_ref.actor.use_torch_compile=False \ | ||
| actor_rollout_ref.model.enable_gradient_checkpointing=True \ | ||
| actor_rollout_ref.actor.fsdp_config.param_offload=True \ | ||
| actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \ | ||
| actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2 \ | ||
| actor_rollout_ref.rollout.tensor_model_parallel_size=$tp_size \ | ||
| actor_rollout_ref.rollout.name=sglang \ | ||
| actor_rollout_ref.rollout.gpu_memory_utilization=0.3 \ | ||
| actor_rollout_ref.rollout.n=5 \ | ||
| +actor_rollout_ref.rollout.engine_kwargs.sglang.attention_backend="ascend" \ | ||
| actor_rollout_ref.ref.fsdp_config.param_offload=True \ | ||
| actor_rollout_ref.rollout.enable_chunked_prefill=False \ | ||
| actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=2 \ | ||
| actor_rollout_ref.nccl_timeout=1800 \ | ||
| algorithm.use_kl_in_reward=False \ | ||
| trainer.critic_warmup=0 \ | ||
| trainer.logger=console \ | ||
| trainer.val_before_train=False \ | ||
| trainer.project_name='verl_grpo_example_512_1024_gsm8k' \ | ||
| trainer.experiment_name='qwen3_8b_function_rm' \ | ||
| trainer.n_gpus_per_node=$num_npu \ | ||
| trainer.nnodes=1 \ | ||
| trainer.save_freq=1000 \ | ||
| trainer.test_freq=10000 \ | ||
| trainer.total_epochs=5 \ | ||
| trainer.default_local_dir="${CKPTS_DIR}" \ | ||
| actor_rollout_ref.actor.ulysses_sequence_parallel_size=${sp_size} \ | ||
| actor_rollout_ref.ref.ulysses_sequence_parallel_size=${sp_size} \ | ||
| trainer.device=npu "$@" | ||
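The parallel sizes and batch settings in this script imply a few derived quantities that are easy to check by hand. A minimal sketch follows, using the values from the 1k script. The `divides` helper is hypothetical, and the divisibility requirement (device count being a multiple of both the rollout tensor-parallel size and the Ulysses sequence-parallel size) is an assumption about how verl partitions devices, not something stated in this PR.

```shell
#!/usr/bin/env bash
# Values copied from the 1k script above.
num_npu=4
tp_size=4
sp_size=4
train_prompt_bsz=16
rollout_n=5                 # actor_rollout_ref.rollout.n
max_prompt_length=512
max_response_length=1024

# Hypothetical helper: returns 0 when $1 is a multiple of $2.
divides() { [ "$2" -gt 0 ] && [ $(( $1 % $2 )) -eq 0 ]; }

# Assumed constraint: the device count is a multiple of each parallel size.
divides "$num_npu" "$tp_size" || echo "num_npu should be a multiple of tp_size"
divides "$num_npu" "$sp_size" || echo "num_npu should be a multiple of sp_size"

# GRPO samples rollout_n responses for every prompt in the batch.
responses_per_step=$(( train_prompt_bsz * rollout_n ))
# Upper bound on tokens in one prompt plus response sequence.
max_seq_len=$(( max_prompt_length + max_response_length ))

echo "responses per step: $responses_per_step"
echo "max tokens per sequence: $max_seq_len"
```

With these values the script generates 80 responses per training step, each at most 1536 tokens long including the prompt.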
71 changes: 71 additions & 0 deletions
examples/grpo_trainer/run_qwen3_8b_grpo_sglang_32k_spmd_npu.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| set -x | ||
| export HCCL_CONNECT_TIMEOUT=1500 | ||
| export HCCL_HOST_SOCKET_PORT_RANGE=60000-60050 | ||
| export HCCL_NPU_SOCKET_PORT_RANGE=61000-61050 | ||
|
|
||
| # WORKSPACE_HOME and DATA_HOME support custom path configuration. | ||
| WORKSPACE_HOME=$PWD | ||
| DATA_HOME=$PWD | ||
|
|
||
| sp_size=4 | ||
| num_gpu=8 | ||
| tp_size=4 | ||
| train_prompt_bsz=16 | ||
| train_prompt_mini_bsz=16 | ||
|
|
||
| max_prompt_length=$((1024 * 2)) | ||
| max_response_length=$((1024 * 32)) | ||
|
|
||
| CKPTS_DIR=$WORKSPACE_HOME/logs/ckpt/qwen3_8b | ||
| model_path=$DATA_HOME/models/Qwen3-8B | ||
| train_data=$DATA_HOME/datasets/dapo/dapo-math-17k.parquet | ||
| valid_data=$DATA_HOME/datasets/dapo/aime-2024.parquet | ||
|
|
||
| python3 -m verl.trainer.main_ppo \ | ||
| algorithm.adv_estimator=grpo \ | ||
| data.train_files=$train_data \ | ||
| data.val_files=$valid_data \ | ||
| data.train_batch_size=$train_prompt_bsz \ | ||
| data.max_prompt_length=$max_prompt_length \ | ||
| data.max_response_length=$max_response_length \ | ||
| data.filter_overlong_prompts=False \ | ||
| data.truncation='error' \ | ||
| actor_rollout_ref.model.path=$model_path \ | ||
| actor_rollout_ref.actor.optim.lr=1e-6 \ | ||
| actor_rollout_ref.model.use_remove_padding=True \ | ||
| actor_rollout_ref.actor.ppo_mini_batch_size=$train_prompt_mini_bsz \ | ||
| actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \ | ||
| actor_rollout_ref.actor.use_kl_loss=True \ | ||
| actor_rollout_ref.actor.entropy_coeff=0 \ | ||
| actor_rollout_ref.actor.kl_loss_coef=0.001 \ | ||
| actor_rollout_ref.actor.kl_loss_type=low_var_kl \ | ||
| actor_rollout_ref.actor.use_torch_compile=False \ | ||
| actor_rollout_ref.model.enable_gradient_checkpointing=True \ | ||
| actor_rollout_ref.actor.fsdp_config.param_offload=True \ | ||
| actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \ | ||
| actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \ | ||
| actor_rollout_ref.rollout.tensor_model_parallel_size=$tp_size \ | ||
| actor_rollout_ref.rollout.name=sglang \ | ||
| actor_rollout_ref.rollout.gpu_memory_utilization=0.3 \ | ||
| actor_rollout_ref.rollout.n=5 \ | ||
| +actor_rollout_ref.rollout.engine_kwargs.sglang.attention_backend="ascend" \ | ||
| actor_rollout_ref.ref.fsdp_config.param_offload=True \ | ||
| actor_rollout_ref.rollout.enable_chunked_prefill=False \ | ||
| actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \ | ||
| actor_rollout_ref.nccl_timeout=3600 \ | ||
| algorithm.use_kl_in_reward=False \ | ||
| trainer.critic_warmup=0 \ | ||
| trainer.logger=console \ | ||
| trainer.val_before_train=False \ | ||
| trainer.project_name='verl_grpo_example_2k_32k' \ | ||
| trainer.experiment_name='qwen3_8b_function_rm' \ | ||
| trainer.n_gpus_per_node=$num_gpu \ | ||
| trainer.nnodes=1 \ | ||
| trainer.save_freq=1000 \ | ||
| trainer.test_freq=10000 \ | ||
| trainer.total_epochs=5 \ | ||
| trainer.default_local_dir="${CKPTS_DIR}" \ | ||
| actor_rollout_ref.actor.ulysses_sequence_parallel_size=${sp_size} \ | ||
| actor_rollout_ref.ref.ulysses_sequence_parallel_size=${sp_size} \ | ||
| trainer.device=npu "$@" |
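Both launch scripts forward any extra command-line arguments to the trainer through the positional parameters on their last line, so additional overrides can be appended when invoking them. A minimal sketch of how quoting affects that forwarding is shown below; `count_args`, `forward_unquoted`, and `forward_quoted` are hypothetical helpers that only count arguments and are not part of the scripts.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and exist just to count arguments.
count_args() { echo "$#"; }

forward_unquoted() { count_args $@; }    # arguments are re-split on whitespace
forward_quoted()   { count_args "$@"; }  # each argument is passed through intact

# The second override contains a space in its value.
forward_unquoted trainer.total_epochs=1 "trainer.experiment_name=my run"   # prints 3
forward_quoted   trainer.total_epochs=1 "trainer.experiment_name=my run"   # prints 2
```

With the unquoted form, an override whose value contains a space reaches the trainer as two separate arguments, which is why the quoted form is the safer choice for pass-through wrappers.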