Merged
Commits
114 commits
3979d2f
[recipe] feat: add Experimental VLA RL Support (#3918)
The-Hierophant Nov 25, 2025
09a923a
[recipe, data] feat: TransferQueue - Support managing multiple data p…
LLLLxmmm Nov 25, 2025
902cfc4
[ci] feat: Increase e2e_sft timeout from 25 to 30 minutes (#4279)
vermouth1992 Nov 25, 2025
313dfdb
[megatron] feat: Integrate Megatron-Bridge and support LoRA/PEFT (#4063)
HollowMan6 Nov 25, 2025
9886706
[single_controller] feat: support resource_pool split (#4273)
yyDing1 Nov 25, 2025
eb6ef40
[recipe] feat: move recipes to new repository verl-recipe (#4283)
wuxibin89 Nov 25, 2025
146075b
[worker] feat: restore colocate workers based on new splited resource…
yyDing1 Nov 25, 2025
bae6f35
[misc] feat: Add `actor_rollout_ref.actor.calculate_entropy` for entr…
EduardDurech Nov 25, 2025
2a36450
[trainer] feat: Self-Normalized Importance Sampling (#3980)
EduardDurech Nov 25, 2025
3399d39
[ci, megatron] fix: add `rotary_pos_cos_sin` to forward (#4291)
HollowMan6 Nov 26, 2025
d0997d2
[megatron] fix: pass trust_remote_code to get_generation_config (#4196)
jprellberg Nov 26, 2025
c651b7b
[misc] fix: support nested datastructure in dataproto to convert to t…
PeterSH6 Nov 26, 2025
e5243f5
[ci] fix: use local hf model path (#4299)
wuxibin89 Nov 26, 2025
a8a3290
[data] feat: TransferQueue - Support AgentLoop performance metrics & …
0oshowero0 Nov 26, 2025
7dd1245
[recipe] feat: support reward_loop for recipe/fully_async_policy (#4224)
sl-1314 Nov 26, 2025
77ef1db
[misc] fix: fix list conversion in get_tensordict (#4304)
PeterSH6 Nov 26, 2025
9cc9feb
[hardware] fix: Workaround for torch-npu's lack of support for creati…
ji-huazhong Nov 27, 2025
e36433c
[rollout] fix: some compatibility changes in agent loop and reward (#…
pengwu22 Nov 27, 2025
799c931
[worker] fix: do not pass router address and tokenizer is their value…
yyDing1 Nov 27, 2025
f623c14
[doc] chore: Update ascend quickstart doc (#4321)
FightingZhen Nov 27, 2025
3122631
[misc] feat: add more utils of tensordict (#4322)
vermouth1992 Nov 27, 2025
395d7f6
[recipe] fix: Fixed scripts for one_step_off_policy async not impleme…
baymax591 Nov 29, 2025
5a5fe9d
[model] feat: refactor engine folder structure (#4352)
vermouth1992 Nov 29, 2025
648ef53
[recipe] feat: move char count recipe to verl-recipe (#4351)
vermouth1992 Dec 1, 2025
0ce17df
[ci] chore: switch ascend ci calculation resource (#4347)
FightingZhen Dec 1, 2025
6e6ae96
feat(actor): add loss_scale_factor for seq-mean-token-sum-norm mode (…
szrlee Dec 1, 2025
62a9965
[misc] refactor: clean up unused sharding managers (#4361)
ji-huazhong Dec 1, 2025
01eeb49
[worker] feat: Add TrainingWorker that resembles Tinker-like API (#4371)
vermouth1992 Dec 2, 2025
37a01f0
[vllm] fix: Fix issues that occur during the ACLGraph initialization …
chengminhua Dec 2, 2025
9f3199f
[megatron] feat: support gpt-oss (#4323)
ISEEKYAN Dec 2, 2025
9d77200
[megatron] fix: megatron async save ckpt fix (#4253)
Leem-Li Dec 2, 2025
d12d6b3
[misc] feat: Update news section in README.md (#4385)
vermouth1992 Dec 2, 2025
a863f25
[misc] fix: handle empty TensorDict in DataProto serialization (#4379)
le-czs Dec 2, 2025
fb860f0
[trainer,fsdp] feat: enable reproducibility for training (#4378)
ji-huazhong Dec 2, 2025
3d77af6
[trainer] feat: support ray-based sft trainer (#4382)
vermouth1992 Dec 2, 2025
d9ef1e5
[megatron] feat: optimize the mbridge checkpoint saving speed (#4386)
ISEEKYAN Dec 2, 2025
5d00c08
[rollout] feat: add support for discriminative reward model in reward…
yyDing1 Dec 2, 2025
15a9b0f
[recipe] feat: refactor one step off to support server mode (#4307)
ArronHZG Dec 3, 2025
4386937
[misc] feat: support TensorDict in DataProtoFuture (#4395)
vermouth1992 Dec 3, 2025
80c860f
[fsdp] fix: Fixing the error caused by empty tensors in the multi_tur…
nuerxiati Dec 3, 2025
2f2996b
[doc] fix: add Geo-RS-Seq-TIS estimators and update documentation (#4…
szrlee Dec 3, 2025
27d1ada
[worker] feat: custom master addr port (#4389)
tongyx361 Dec 3, 2025
493a397
[doc] feat: update reward loop document (#4404)
yyDing1 Dec 3, 2025
cb23607
[algo] feat: support router replay (#4101)
litianjian Dec 4, 2025
f11af2d
[recipe] fix: FlowRL actor to pure implementation (#4397)
Xuekai-Zhu Dec 4, 2025
9dfed53
[doc] feat: add more user instructions to reward loop doc (#4409)
yyDing1 Dec 4, 2025
a47fa6d
[doc] feat: add OneThinker link in readme (#4410)
appletea233 Dec 4, 2025
57f303c
[ci] fix: NPU not support router replay (#4414)
wuxibin89 Dec 4, 2025
ab07052
[worker] feat: custom reward_manager (#4387)
tongyx361 Dec 4, 2025
fd893c7
[vllm] feat: retires vllm spmd mode in the codebase (#4411)
PeterSH6 Dec 4, 2025
d1d1f03
[sglang] fix: HTTP server startup issues for Prometheus and Grafana i…
jsfanfanfan Dec 5, 2025
51f3190
[doc] chore: Update ascend quickstart and docker build guidance doc (…
FightingZhen Dec 5, 2025
16039d6
[sglang] feat: retires sglang spmd mode in the codebase (#4422)
PeterSH6 Dec 5, 2025
7d44f22
[fsdp] feat: update NPU fused kernels for Qwen3 moe block (#4406)
icerain-alt Dec 5, 2025
5eedbae
[misc] refactor: clean up unused sharding manager (#4439)
ji-huazhong Dec 6, 2025
0b37696
[hardware] chore: clean npu_patch (#4436)
FightingZhen Dec 6, 2025
0d1c100
[misc] fix: fix memory leakage when initializing multiple tools (#4430)
PeterSH6 Dec 6, 2025
d8e97e1
[trainer, vllm, megatron, recipe] feat: one/two step off async on-pol…
moehanabi Dec 6, 2025
a3c417c
[misc] feat: optimize performance of index_select_tensor_dict (#4444)
vermouth1992 Dec 8, 2025
12b2851
[ci] test: Disable ReMax training test in vllm workflow (#4445)
PeterSH6 Dec 8, 2025
5808f4d
[rollout] fix: RolloutConfig should support repetition_penalty config…
Lokiscripter Dec 8, 2025
0fac641
[recipe] feat: add fully async comm between rollout and sim node in d…
HanlinDu Dec 8, 2025
615aa67
[misc] feat: optimize nested tensor index (#4447)
vermouth1992 Dec 8, 2025
9b50fb7
[model] feat: add qwen3-4b grpo script on ASCEND NPU A3 (#4432)
5082459 Dec 8, 2025
7ca9d8a
[megatron] fix: Remove Deprecated Megatron Optimizer Args (#4396)
DaizeDong Dec 8, 2025
6d4fd9a
[megatron] fix: respect `use_distributed_optimizer` in config (#4392)
HollowMan6 Dec 8, 2025
80af9db
[recipe, ci] fix: remove batch mode for remote generative reward mode…
yyDing1 Dec 8, 2025
b96b53a
[misc] feat: optimize rearrange_micro_batches (#4451)
vermouth1992 Dec 8, 2025
95a94e3
[rollout, sglang] feat: support blockwise fp8 rollout (#4415)
Agoniii Dec 8, 2025
932a462
fix conflict upstream
bachvudinh Dec 8, 2025
f332fc8
[trainer] feat: model engine sft trainer support vlm model (#4403)
wuxibin89 Dec 8, 2025
1365848
Merge branch 'main' of github.com:janhq/verl
nguyenhoangthuan99 Dec 8, 2025
0102d04
[trainer] feat: add reward loop config to default config (#4452)
yyDing1 Dec 9, 2025
e0c46e9
[vllm] feat: support abort generating requests in vllm server (#4453)
PeterSH6 Dec 9, 2025
2319bab
[ci] chore: cleanup some ci workflow (#4459)
wuxibin89 Dec 9, 2025
7417d88
[trainer] feat: allow override for reward_manager_worker in agent loo…
ryxli Dec 9, 2025
5617529
[model] feat: enhances TrainingWorker (#4461)
vermouth1992 Dec 9, 2025
aee5aa8
[recipe] feat: Modify the way of obtaining default_runtime_env (#4468)
xichengpro Dec 9, 2025
896db9b
[rollout] fix: mlflow consecutive slashes (#4446)
BaiqingL Dec 10, 2025
d66120d
[fsdp] fix: reward model also reads override config attn_implementati…
pengwu22 Dec 10, 2025
bb75788
[vllm] fix: compatible to vllm0.12 (#4473)
ISEEKYAN Dec 10, 2025
7ffd4fe
[model] feat: support manual control load/offload (#4472)
vermouth1992 Dec 10, 2025
cfcdddf
[ci] feat: Update e2e_ascend to improve CI execution efficiency (#4477)
FightingZhen Dec 10, 2025
5a2e0b1
[ci] fix: Fix e2e_ascend sft test case error (#4481)
FightingZhen Dec 10, 2025
d8195a8
[trainer] feat: support moving ppo actor logics to single controller …
vermouth1992 Dec 11, 2025
01ab536
[megatron] fix: correct typo in modeling_qwen2_megatron.py (#4486)
study8677 Dec 11, 2025
3824689
[fsdp] fix: qwen3vlmoe with Monkey patch to fix a bug in transformers…
pengyanai Dec 12, 2025
7deb67c
[ci] fix: fix format check error (#4506)
ji-huazhong Dec 13, 2025
392791b
[hardware] feat: Auto set device_name to npu for Ascend NPU (#4489)
FightingZhen Dec 15, 2025
7eb030e
[trainer] feat: make reward loop disrm default (#4466)
yyDing1 Dec 15, 2025
518bada
[algo,doc] refactor: rollout correction (#4511)
szrlee Dec 15, 2025
290b522
[trainer] feat: enable model engine based critic (#4507)
vermouth1992 Dec 15, 2025
da214ab
[vllm, rollout] feat: support reset prefix cache after abort (#4519)
PeterSH6 Dec 15, 2025
ec14a87
[ci] chore: remove proxy settings in e2e_ascend (#4527)
FightingZhen Dec 15, 2025
36c2d4a
[rollout] fix: correct heap-based load balancing in AsyncLLMServerMan…
hellcatCS Dec 15, 2025
5922445
[sglang, rollout] feat: delete remaining sglang spmd code (#4523)
PeterSH6 Dec 15, 2025
baf3a63
[data] feat: TransferQueue - Add zero-copy serialization support & us…
0oshowero0 Dec 15, 2025
ebec85d
[rollout] feat: pass agent_data to tool calling (#4469)
wuxibin89 Dec 16, 2025
2fd6591
[megatron,ci] chore: update instructions and scripts for LoRA (#4533)
HollowMan6 Dec 16, 2025
d7c82bd
[megatron] chore: clean legacy code path part 1, make engine use mbri…
ISEEKYAN Dec 16, 2025
a0e8e44
[megatron] chore: clean legacy code path part 2, clean legacy CI (#4529)
ISEEKYAN Dec 16, 2025
fdf0046
[trainer] fix: model engine vlm multi_modal_inputs to NonTensorStack …
wuxibin89 Dec 16, 2025
7cb647d
[ray] chore: Update Ray version dependency in requirements-npu.txt (#…
FightingZhen Dec 16, 2025
a07556d
[ci] chore: migrate all rm related ci to reward loop (#4520)
yyDing1 Dec 17, 2025
6a58521
[algo] fix: Add seq mean mask denominator option (#4510)
szrlee Dec 17, 2025
379f296
[trainer] fix: change name for reward loop worker override (#4549)
ryxli Dec 17, 2025
b5d6a30
[rollout,vllm] feat: disable sleep mode in fully-async mode (#4521)
chenjiaoAngel Dec 17, 2025
022c0ae
[rollout, trainer] feat: extend agent loop for custom implementations…
JoyboyBrian Dec 17, 2025
1ae510c
[rollout] chore: update reward loop file names (#4547)
yyDing1 Dec 17, 2025
4bf590e
[ci] fix: Add mbridge dependency into e2e_ascend (#4560)
FightingZhen Dec 17, 2025
3146164
[doc] feat: add JupyterLab plugin instructions (#4536)
yqsstudy Dec 17, 2025
646fb4a
[ci] feat: Increase e2e_sft timeout from 30 to 40 minutes (#4552)
vermouth1992 Dec 17, 2025
16a6c47
[misc] chore: add "reward" tag to PR template (#4573)
yyDing1 Dec 17, 2025
bdbb085
sync upstream
bachvudinh Dec 18, 2025
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -6,7 +6,7 @@

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
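The title convention in the template above can be sketched as a local pre-push check. This is a hypothetical helper, not part of the repo's CI (which does its own check); the module and type lists are the ones from the updated template, including the newly added `cfg` and `reward` modules:

```shell
#!/usr/bin/env bash
# Hypothetical local check for the verl PR title convention
# "[{modules}] {type}: {description}", with an optional "[BREAKING] " prefix.
check_pr_title() {
  local title="$1"
  local modules='fsdp|megatron|sglang|vllm|rollout|trainer|ci|training_utils|recipe|hardware|deployment|ray|worker|single_controller|misc|perf|model|algo|env|tool|ckpt|doc|data|cfg|reward'
  local types='feat|fix|refactor|chore|test'
  # One or more comma-separated modules, a known type, a colon, a description.
  local re="^(\[BREAKING\] )?\[($modules)(, ($modules))*\] ($types): .+"
  [[ "$title" =~ $re ]]
}

check_pr_title '[megatron, fsdp, doc] feat: add LoRA support' && echo ok
check_pr_title 'bad title' || echo rejected
```

Running the two sample calls prints `ok` and `rejected`, matching the "checked by the CI" rule the checklist describes.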
175 changes: 0 additions & 175 deletions .github/workflows/checkpoint_converter.yml

This file was deleted.

137 changes: 109 additions & 28 deletions .github/workflows/e2e_ascend.yml
@@ -65,22 +65,24 @@ permissions:
contents: read

jobs:
test:
non_rl_job:
if: github.repository_owner == 'volcengine'
name: verl Ascend test (self-host)
runs-on: linux-aarch64-a2-8
timeout-minutes: 60 # Increase this timeout value as needed
name: E2E Ascend testing for non-RL algorithm scenarios
runs-on: linux-aarch64-a2-2
timeout-minutes: 60
container:
image: swr.ap-southeast-1.myhuaweicloud.com/base_image/ascend-ci/verl/verl:verl-8.3.rc1-910b-ubuntu22.04-py3.11-latest
options: >-
--shm-size 16g
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- name: Config third-party dependency download cache
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
- name: Check npu and CANN info
run: |
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
@@ -103,47 +105,126 @@ jobs:
- name: Preprocess gsm8k dataset
run: |
python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
- name: Preprocess geo3k dataset
run: |
python examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/.cache/datasets/hiyouga/geometry3k
- name: Running gsm8k e2e qwen3 training tests with PPO on ASCEND NPU
run: |
ray stop --force
bash tests/special_npu/run_qwen3_06b_ppo.sh
rm -rf $HOME/ckpts
- name: Running gsm8k e2e training tests with peft sft on ASCEND NPU
run: |
ray stop --force
bash tests/special_npu/run_qwen2_5_05b_sft_peft_sp2.sh
rm -rf $HOME/ckpts
- name: Running gsm8k e2e training tests with GRPO on ASCEND NPU
- name: Running NPU profiling unit tests
run: |
ray stop --force
bash tests/special_npu/run_qwen2_5_05b_grpo.sh
rm -rf $HOME/ckpts
- name: Running geo3k e2e training tests with GRPO on ASCEND NPU
pytest -s -x tests/utils/test_special_mstx_profile.py

llm_rl_job:
if: github.repository_owner == 'volcengine'
name: E2E Ascend testing for RL training scenarios of LLM models
runs-on: linux-aarch64-a2-8
timeout-minutes: 60
container:
image: swr.ap-southeast-1.myhuaweicloud.com/base_image/ascend-ci/verl/verl:verl-8.3.rc1-910b-ubuntu22.04-py3.11-latest
options: >-
--shm-size 16g
env:
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- name: Config third-party dependency download cache
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
- name: Check npu and CANN info
run: |
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
npu-smi info
- name: Check initial pip list from image
run: |
pip list
- name: Checkout volcengine/verl repo
uses: actions/checkout@v4
with:
fetch-depth: 0
clean: true
- name: Install the current repository
run: |
pip install -r requirements-npu.txt
pip install -e .
- name: Check final pip list
run: |
pip list
- name: Preprocess gsm8k dataset
run: |
python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
- name: Running gsm8k e2e training tests with PPO on ASCEND NPU (FSDP backend)
run: |
ray stop --force
bash tests/special_npu/run_qwen2_5_vl_3b_npu.sh
bash tests/special_npu/run_qwen3_06b_ppo.sh
rm -rf $HOME/ckpts
- name: Running gsm8k e2e training tests with DAPO on ASCEND NPU
- name: Running gsm8k e2e training tests with GRPO on ASCEND NPU (FSDP backend)
run: |
ray stop --force
bash tests/special_npu/run_qwen2_5_05b_dapo.sh
bash tests/special_npu/run_qwen2_5_05b_grpo.sh
rm -rf $HOME/ckpts
- name: Running gsm8k e2e qwen3 MoE training tests with DAPO MindSpeed on ASCEND NPU
- name: Running gsm8k e2e training tests with DAPO on ASCEND NPU (FSDP backend)
run: |
ray stop --force
export PYTHONPATH=$PYTHONPATH:/Megatron-LM
USE_DIST_CKPT=True USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen3moe_minimal.json DUMMY_MODEL_PATH=$HOME/dist_ckpt/qwen3_30b_dapo_mindspeed bash tests/special_npu/run_qwen3_30b_dapo_mindspeed.sh
- name: Running gsm8k e2e training tests with GRPO MindSpeed on ASCEND NPU
bash tests/special_npu/run_qwen2_5_05b_dapo.sh
rm -rf $HOME/ckpts
- name: Running gsm8k e2e training tests with GRPO on ASCEND NPU (MindSpeed backend)
run: |
ray stop --force
export PYTHONPATH=$PYTHONPATH:/Megatron-LM
USE_DIST_CKPT=True bash tests/special_npu/run_qwen2_5_05b_grpo_mindspeed.sh
rm -rf $HOME/dist_ckpt/qwen2_5_05b_grpo_mindspeed
rm -rf $HOME/ckpts
- name: Running NPU profiling unit tests
- name: Running gsm8k e2e training tests with DAPO on ASCEND NPU (MindSpeed backend, MoE Model)
run: |
ray stop --force
pytest -s -x tests/utils/test_special_mstx_profile.py
export PYTHONPATH=$PYTHONPATH:/Megatron-LM
USE_DIST_CKPT=True USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen3moe_minimal.json DUMMY_MODEL_PATH=$HOME/dist_ckpt/qwen3_30b_dapo_mindspeed bash tests/special_npu/run_qwen3_30b_dapo_mindspeed.sh

vlm_rl_job:
if: github.repository_owner == 'volcengine'
name: E2E Ascend testing for RL training scenarios of VLM models
runs-on: linux-aarch64-a2-8
timeout-minutes: 60
container:
image: swr.ap-southeast-1.myhuaweicloud.com/base_image/ascend-ci/verl/verl:verl-8.3.rc1-910b-ubuntu22.04-py3.11-latest
options: >-
--shm-size 16g
env:
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- name: Config third-party dependency download cache
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
- name: Check npu and CANN info
run: |
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
npu-smi info
- name: Check initial pip list from image
run: |
pip list
- name: Checkout volcengine/verl repo
uses: actions/checkout@v4
with:
fetch-depth: 0
clean: true
- name: Install the current repository
run: |
pip install -r requirements-npu.txt
pip install -e .
- name: Check final pip list
run: |
pip list
- name: Preprocess geo3k dataset
run: |
python examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/.cache/datasets/hiyouga/geometry3k
- name: Running geo3k e2e training tests with GRPO on ASCEND NPU
run: |
ray stop --force
bash tests/special_npu/run_qwen2_5_vl_3b_npu.sh
rm -rf $HOME/ckpts
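Each new job above starts with the same "Config third-party dependency download cache" step, which replaces the public Ubuntu and PyPI mirrors with an in-cluster cache service so the self-hosted runners avoid external traffic. A minimal sketch of the rewrite it performs (the cache hostname is the one from the workflow; the helper function itself is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the workflow's cache-redirection step:
# rewrite Ubuntu mirror hostnames in a sources.list body to the in-cluster
# cache, reading from stdin instead of editing /etc/apt/sources.list in place.
redirect_to_cache() {
  sed -E 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g'
}

echo "deb http://ports.ubuntu.com/ubuntu-ports jammy main" | redirect_to_cache
# prints: deb http://cache-service.nginx-pypi-cache.svc.cluster.local:8081/ubuntu-ports jammy main
```

The workflow applies the same substitution with `sed -Ei` directly to `/etc/apt/sources.list`, then points pip at the cache's PyPI index via `pip config set global.index-url` and marks the host trusted (it is served over plain HTTP inside the cluster).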