Merged
42 commits
2c179da
Add explicit position_ids to model.generate in hf rollout (#1637)
Geaming2002 May 23, 2025
54a5e6e
[megatron] feat: save hf model config in megatron checkpoint manager …
0x404 May 23, 2025
aaaaaab
Activation Offloading (#1220)
imh966 May 23, 2025
9ddc725
fix: add `loss_agg_mode` to critics (#1340)
tongyx361 May 23, 2025
cdee00d
fix: only load reference policy when needed in DAPO (#1651)
tongyx361 May 23, 2025
c4faf5c
[CI] feat: add ignore for CI of SPIN & SPPO (#1653)
tongyx361 May 23, 2025
a7b2e29
fix: entropy in DAPO (#1652)
tongyx361 May 23, 2025
0528ba1
[NPU] feat: Support FSDP worker and vLLM Ascend (#332)
sunyi0505 May 23, 2025
96c181a
chore(ci): support FSDP2 for multi-turn SGLangRollout with tool calli…
zyzshishui May 23, 2025
7225544
[SGLang Async Rollout] Validate prompt_len + max_resp_len <= max_mode…
jybsuper May 24, 2025
0286210
[Megatron] Support optimizer offload for moe when ep > 1 (#1638)
zzong2006 May 24, 2025
4779f26
[Refactor] fused kernel in forward (#1624)
mingruimingrui May 24, 2025
5dc6439
[CI] fix: DAPO CI & response_mask (#1666)
tongyx361 May 24, 2025
3c048ac
modify the instructions for using verl on ASCEND NPU (#1670)
sunyi0505 May 24, 2025
69582dc
Add verl-agent and GiGPO to the awesome work list (#1660)
langfengQ May 24, 2025
cf731e8
[sglang] Fix megatron support in sglang and add sglang_async support …
SwordFaith May 24, 2025
7d26d73
modify the installation method of vllm on different architectures and…
sunyi0505 May 24, 2025
4532308
[misc] fix: fix megatron entropy (#1672)
vermouth1992 May 24, 2025
c60546d
[misc] fix: fix device (#1671)
vermouth1992 May 24, 2025
3d5f15f
[fix] use correct variable for saving hf model (#1681)
BaiqingL May 25, 2025
54c9b73
update ascend_quick_start doc (#1685)
zheliuyu May 26, 2025
8298f7d
[Bugfix] Fix for non_fused_kernels passing arguments (#1687)
ETOgaosion May 26, 2025
5fe1839
[CI] fix some tests scope (#1689)
ETOgaosion May 26, 2025
4583e4c
[Doc] Add a visual explanation of the configuration to the documentat…
hiyouga May 26, 2025
9846360
fix TimeoutError in aiohttp (#1702)
casper-hansen May 27, 2025
54b2677
Add dstack example (#2) (#1706)
Bihan May 27, 2025
4d3ca21
[CI] disable e2e_prime, always hang for 50 minutes (#1728)
ETOgaosion May 27, 2025
34e409b
[docs] refactor: Adding doc strings and doc pages for public methods …
hongpeng-guo May 27, 2025
16a13d8
[misc] feat: support logging rollout prob vs. actor probs for debuggi…
vermouth1992 May 28, 2025
d5570c4
[mics][fix] Deprecate legacy `_default_compute_score` API and fix ray…
hongpeng-guo May 28, 2025
9b186ed
Update README.md (#1731)
czx6858 May 28, 2025
99e749a
Fix Configuration for Micro Batch Size in Megatron's Ref Policy (#1700)
none0663 May 28, 2025
432f9e9
[feat][BREAKING] Megatron support dynamic batch size, to rebalance th…
ETOgaosion May 28, 2025
c751404
add linear_cross_entropy
Jianbing-D May 15, 2025
11f70a7
make patch feasible
ETOgaosion May 28, 2025
018ee14
integrate fsdp kernel
ETOgaosion May 28, 2025
3ba8e2c
fix tests
ETOgaosion May 28, 2025
86cce75
fix tests
ETOgaosion May 28, 2025
cd38553
fix shapes
ETOgaosion May 28, 2025
d8170d3
seems no problem with APIs, but precisions not match
ETOgaosion May 28, 2025
01437fa
pass tests
ETOgaosion May 29, 2025
1acd108
fix reward model config
ETOgaosion May 29, 2025
15 changes: 11 additions & 4 deletions .github/workflows/checkpoint_converter.yml
@@ -14,15 +14,22 @@ on:
- v0.*
paths:
- "**/*.py"
# Entrypoints
- ".github/workflows/checkpoint_converter.yml"
- "!examples"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe"
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
# Entrypoints
- ".github/workflows/checkpoint_converter.yml"
- ".github/workflows/e2e_ppo_trainer_megatron.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/e2e/run_ppo_trainer_megatron.sh"
- "verl/trainer/main_ppo.py"
- "verl/trainer/config/ppo_megatron_trainer.yaml"
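The hunk above relies on GitHub Actions evaluating `paths` patterns in order, with later patterns overriding earlier matches: a broad include first, `!`-negated excludes next, then specific entrypoints re-included at the end. A minimal sketch of that ordering rule (illustration only, not part of this PR):

```yaml
on:
  push:
    paths:
      - "**/*.py"                                 # 1. include every Python file
      - "!tests/**"                               # 2. then exclude the test tree
      - "tests/e2e/run_ppo_trainer_megatron.sh"   # 3. re-include one entrypoint
```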


# Cancel jobs on the same ref if a new one is triggered
@@ -5,12 +5,10 @@ on:
# but only for the main branch
push:
branches:
- main
- v0.*
- disabled_ci
pull_request:
branches:
- main
- v0.*
- disabled_ci
paths:
- "**/*.py"
# Other entrypoints
51 changes: 47 additions & 4 deletions .github/workflows/e2e_ascend.yml
@@ -26,9 +26,9 @@ jobs:
test:
name: verl Ascend test (self-host)
runs-on: [self-hosted, npu-0]
timeout-minutes: 5 # Increase this timeout value as needed
timeout-minutes: 30 # Increase this timeout value as needed
container:
image: quay.io/ascend/cann:8.0.0-910b-ubuntu22.04-py3.10
image: quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10
volumes:
- /usr/local/dcmi:/usr/local/dcmi
- /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
@@ -42,13 +42,56 @@ jobs:
--device /dev/hisi_hdc
--privileged
--network "host"
--shm-size 2g
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- name: Check npu and CANN info
run: |
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
npu-smi info
- name: Checkout volcengine/verl repo
uses: actions/checkout@v4
- name: Run test
- name: Install torch
run: |
lscpu
pip install torch==2.5.1+cpu --index-url https://download.pytorch.org/whl/cpu
pip install torch-npu==2.5.1
pip install /usr/local/Ascend/ascend-toolkit/latest/lib64/te-0.4.0-py3-none-any.whl
- name: Install vllm
run: |
apt-get update && apt-get install -y git
git clone -b v0.7.3 --depth 1 https://github.com/vllm-project/vllm.git vllm-npu
cd vllm-npu
pip install -r requirements-build.txt
VLLM_TARGET_DEVICE=empty pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu/
- name: Install vllm-ascend
run: |
pip list
pip show torch
git clone -b v0.7.3 --depth 1 https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
export COMPILE_CUSTOM_KERNELS=1
python setup.py install
- name: Install the current repository
run: |
pip3 install hf_transfer peft
pip3 install -r requirements-npu.txt
pip install -e .
- name: Prepare gsm8k dataset
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k.py
- name: Running gsm8k e2e training tests with LoRA on ASCEND NPU
run: |
ray stop --force
bash tests/e2e/sft/run_sft.sh
rm -rf $HOME/ckpts
- name: Running gsm8k e2e training tests with GRPO on ASCEND NPU
run: |
ray stop --force
bash tests/npu/run_qwen2_5_05b_grpo.sh
rm -rf $HOME/ckpts
3 changes: 1 addition & 2 deletions .github/workflows/e2e_dapo.yml
@@ -23,7 +23,7 @@ on:
# Megatron
- "!verl/workers/**/megatron_*.py"
# Home
- "recipe/dapo/src"
- "recipe/dapo"
# Entrypoints
- ".github/workflows/e2e_dapo.yml"
- "examples/data_preprocess/gsm8k.py"
@@ -34,7 +34,6 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}


# Declare permissions just read content.
permissions:
contents: read
58 changes: 56 additions & 2 deletions .github/workflows/e2e_ppo_trainer.yml
@@ -61,7 +61,7 @@ jobs:

e2e_ppo_trainer_vllm:
runs-on: [L20x8]
timeout-minutes: 40 # Increase this timeout value as needed
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
@@ -148,6 +148,14 @@ jobs:
run: |
ray stop --force
LIGER=True bash tests/e2e/ppo_trainer/run_model_reward.sh
- name: Running GSM8K E2E with rmpad using model rm with Fused Kernel enabled
run: |
ray stop --force
FUSED_KERNELS=True bash tests/e2e/ppo_trainer/run_model_reward.sh
- name: Running GSM8K E2E with rmpad using model rm with Fused Kernel enabled (Triton backend)
run: |
ray stop --force
FUSED_KERNELS=True FUSED_KERNEL_BACKEND=triton bash tests/e2e/ppo_trainer/run_model_reward.sh
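These steps only export environment toggles; the invoked test script is expected to turn them into trainer arguments. A hypothetical sketch of that pattern (the toggle names come from the workflow, but the argument names emitted below are assumed for illustration and are not taken from `run_model_reward.sh`):

```shell
# Hypothetical sketch: translate the CI toggles into trainer CLI arguments.
# FUSED_KERNELS / FUSED_KERNEL_BACKEND match the workflow; the argument
# names echoed below are assumed, not verified against the real script.
build_extra_args() {
  if [ "${FUSED_KERNELS:-False}" = "True" ]; then
    echo "use_fused_kernels=True fused_kernel_backend=${FUSED_KERNEL_BACKEND:-torch}"
  fi
}

# Example: mirrors the workflow step above.
FUSED_KERNELS=True FUSED_KERNEL_BACKEND=triton
build_extra_args   # prints: use_fused_kernels=True fused_kernel_backend=triton
```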

e2e_ppo_trainer_vllm_vlm:
runs-on: [L20x8]
@@ -182,6 +190,27 @@ jobs:
MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \
ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
bash tests/e2e/ppo_trainer/run_function_reward.sh
- name: Running Geo3k VLM E2E with rmpad using fused kernel (Qwen2.5-VL)
run: |
ray stop --force
FUSED_KERNELS=True TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
bash tests/e2e/ppo_trainer/run_function_reward.sh
- name: Running Geo3k VLM E2E with rmpad using fused kernel (Qwen2.5-VL, Triton backend)
run: |
ray stop --force
FUSED_KERNELS=True FUSED_KERNEL_BACKEND=triton \
TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
bash tests/e2e/ppo_trainer/run_function_reward.sh

e2e_ppo_trainer_sglang:
runs-on: [L20x8]
@@ -269,11 +298,15 @@ jobs:
run: |
ray stop --force
bash tests/e2e/run_gsm8k_fsdp_sgl_multiturn_w_tool.sh
- name: Running GSM8K with tool E2E training tests with FSDP2
run: |
ray stop --force
FSDP_STRATEGY=fsdp2 bash tests/e2e/run_gsm8k_fsdp_sgl_multiturn_w_tool.sh

e2e_ppo_trainer_sglang_vlm:
runs-on: [L20x8]
needs: pre_commit_for_ppo
timeout-minutes: 40 # Increase this timeout value as needed
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
@@ -305,3 +338,24 @@ jobs:
ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
bash tests/e2e/ppo_trainer/run_function_reward.sh
- name: Running Geo3k VLM E2E with rmpad using fused kernel (Qwen2.5-VL)
run: |
ray stop --force
FUSED_KERNELS=True TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
bash tests/e2e/ppo_trainer/run_function_reward.sh
- name: Running Geo3k VLM E2E with rmpad using fused kernel (Qwen2.5-VL, Triton backend)
run: |
ray stop --force
FUSED_KERNELS=True FUSED_KERNEL_BACKEND=triton \
TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
bash tests/e2e/ppo_trainer/run_function_reward.sh