Merged

Commits (120)
d7a70a5
Add new designed data model for tool and async rollout
SwordFaith Apr 2, 2025
dc578c1
add AsyncRolloutRequest and message-ids convert tools
zyzshishui Apr 4, 2025
0edc910
Add unittest to async sgl engine and related ci
SwordFaith Apr 4, 2025
59df7ab
Fix test case and add verl engine with async
SwordFaith Apr 7, 2025
63caefe
Fix AsyncVerlEngine naming
SwordFaith Apr 7, 2025
2eea830
Dump test async sgl rollout checkpoint
SwordFaith Apr 7, 2025
4199cf8
Fix async rollout with memory saver issue
SwordFaith Apr 8, 2025
ed46763
Tmp dump for pass messages by raw_prompt key in non_tensor_batch
SwordFaith Apr 8, 2025
bc4fe76
Add example gsm8k tool and fix schema issue
SwordFaith Apr 8, 2025
d2522f0
Tmp dump for add prompt to async rollout request and async rollout st…
SwordFaith Apr 9, 2025
e0b3bbb
Wait to add more test for async gen with tools and verify training se…
SwordFaith Apr 10, 2025
4532d33
Add support for tool_calls formating and parse error
SwordFaith Apr 10, 2025
c415ad4
support loss_mask and loading tool from config
zyzshishui Apr 12, 2025
6a91862
fix review
zyzshishui Apr 12, 2025
c3831f2
fix review
zyzshishui Apr 12, 2025
7c063a9
Fix get_tool_call_parser_type bot/eot check logic
SwordFaith Apr 13, 2025
3cbf99b
Convert tool method and fn call to all async
SwordFaith Apr 13, 2025
c730d48
merge and add rst file for multiturn
zyzshishui Apr 13, 2025
983c28e
fixed tool initialization
zyzshishui Apr 13, 2025
2cb290c
Add unittest with sharding manager and rollout, and fix async generat…
SwordFaith Apr 13, 2025
8d18e43
wait to support batch rollout
zyzshishui Apr 13, 2025
a88ddda
add max_turn
zyzshishui Apr 14, 2025
bdfd998
add e2e test for tool calling
zyzshishui Apr 14, 2025
26ade88
fix batch rollout
zyzshishui Apr 14, 2025
c437093
Merge branch 'feat/add_async_sglang_multi_turn_support' into feat/los…
SwordFaith Apr 14, 2025
8e7b00b
Merge pull request #2 from zyzshishui/feat/loss_mask_and_tool_config
SwordFaith Apr 14, 2025
fd3b44c
Dump first runable version
SwordFaith Apr 14, 2025
1ff236c
Fix config missing in dump
SwordFaith Apr 14, 2025
77f0429
Try use mb cluster training
SwordFaith Apr 15, 2025
c0d710d
Fix script pip install bug
SwordFaith Apr 15, 2025
0e627d7
Disable debug data batch keys log
SwordFaith Apr 15, 2025
1a803ea
Refined to correctly run grpo in a stable way
SwordFaith Apr 15, 2025
b215f02
Fix ckpt path and wandb dir
SwordFaith Apr 15, 2025
9c11a1e
Add update torch-memory-saver and wandb
SwordFaith Apr 16, 2025
6062a8c
Add new scripts for verify
SwordFaith Apr 16, 2025
bc6043b
Fix config bugs
SwordFaith Apr 16, 2025
9abf201
Try fix import sgl when using vllm only error
SwordFaith Apr 16, 2025
28d6268
Try increase n=16
SwordFaith Apr 16, 2025
a556981
Add n=64 setting to increasing rate of convergence
SwordFaith Apr 16, 2025
6a479cc
Add temperature 1.0 support
SwordFaith Apr 16, 2025
a29fac3
Update config
SwordFaith Apr 23, 2025
df51df2
Dump debug code
SwordFaith Apr 23, 2025
9b50d00
Add verification for single turn generation
SwordFaith Apr 23, 2025
f20aed7
Fix bug
SwordFaith Apr 23, 2025
67b15e6
Add 3k max resp script
SwordFaith Apr 24, 2025
15ff9c5
Use patch with aligned cli args
SwordFaith Apr 24, 2025
5a372b0
Fix arg
SwordFaith Apr 24, 2025
439d816
Add new version data script
SwordFaith Apr 24, 2025
8ce6689
Try if it's padding issue
SwordFaith Apr 25, 2025
f60ac69
Add pad test script
SwordFaith Apr 25, 2025
679c324
Fix dtype in pad_sequence_to_length issue
SwordFaith Apr 25, 2025
07e25e6
Add new training script
SwordFaith Apr 25, 2025
ddb2337
Add pad print
SwordFaith Apr 25, 2025
25e7814
Add debug log
SwordFaith Apr 25, 2025
6f6f94e
Try refine debug logs
SwordFaith Apr 25, 2025
185cf74
Add n16 train script
SwordFaith Apr 25, 2025
c097cb1
Add loss mask test script
SwordFaith Apr 25, 2025
e839465
Add debug info to pad and unpad
SwordFaith Apr 26, 2025
560e2ca
Add schema update and format, need_tool_kwargs fix
SwordFaith Apr 26, 2025
b939760
try to change to relative path
WANG-GH Apr 26, 2025
0719d1f
fix abs path
WANG-GH Apr 26, 2025
9f17ace
Remove verl engine with async and add modelbest inc as co-org
SwordFaith Apr 26, 2025
6ecb8d4
fix conlficts in RLHF dataset
zhaochenyang20 Apr 27, 2025
80e3c2c
Merge branch 'main' of github.com:SwordFaith/verl
zhaochenyang20 Apr 27, 2025
0729b07
Merge branch 'main' into feat/renewed_multi_turn_pr_branch
zhaochenyang20 Apr 27, 2025
caf1a08
resolve not our code
WANG-GH Apr 27, 2025
a7ffe12
fix format v2
WANG-GH Apr 27, 2025
20f5696
Fix example run issue
SwordFaith Apr 27, 2025
6cfce61
Add environ SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK=True
SwordFaith Apr 27, 2025
6e7017d
fix lint v1
WANG-GH Apr 27, 2025
4ea91e3
fix lint v2
WANG-GH Apr 27, 2025
80be127
Fix dapo ci
SwordFaith Apr 27, 2025
67f218f
Fix sanity check ci
SwordFaith Apr 27, 2025
092b7b9
Add back reinforce plus plus base line
SwordFaith Apr 27, 2025
2066bae
Fix file naming and dir structure issue
SwordFaith Apr 27, 2025
f2e8a6b
Fix license str
SwordFaith Apr 27, 2025
9efb2be
Fix req_list append missing
SwordFaith Apr 27, 2025
a195328
Sync generate sequences with sglang_rollout
SwordFaith Apr 27, 2025
897edf4
stop before enter fsdp_async_manager
WANG-GH Apr 28, 2025
10dff84
Add config validation assertion
SwordFaith Apr 28, 2025
e19fcaa
Refactor async sglang sharding manager
SwordFaith Apr 28, 2025
377bf06
repeat tool_kwargs & upgrade ppo_megatron_trainer
ocss884 Apr 28, 2025
b6fa35b
Fix e2e test too long
SwordFaith Apr 28, 2025
9805be8
refact test v1
WANG-GH Apr 28, 2025
3aebb8c
unfinish
WANG-GH Apr 28, 2025
712c2ef
refact test v3
WANG-GH Apr 28, 2025
a7236b8
refact test v4
WANG-GH Apr 28, 2025
5569eaf
Fix import path error and update ci config
SwordFaith Apr 28, 2025
48ad489
Merge branch 'main' into feat/add_async_sglang_multi_turn_support
tongyx361 Apr 28, 2025
9e66aec
fix: pre-commit
tongyx361 Apr 28, 2025
b078914
fix: license
tongyx361 Apr 28, 2025
508d382
fix: tools_kwargs
tongyx361 Apr 28, 2025
eb4f1cb
fix: .mypy_cache
tongyx361 Apr 28, 2025
5886337
Merge pull request #10 from tongyx361/tyx/fix/ci
tongyx361 Apr 28, 2025
f79d706
update async test
ocss884 Apr 28, 2025
a8eb256
pre-commit
ocss884 Apr 28, 2025
7020582
fix
ocss884 Apr 28, 2025
3468b4c
Fix lint and add mutiturn and sglang_async to e2e ppo trainer
SwordFaith Apr 28, 2025
70b7169
Fix broadcast_pyobj args
SwordFaith Apr 28, 2025
3501c46
fix sgl.yml
ocss884 Apr 28, 2025
828c05b
feat: sgl CI manual trigger
tongyx361 Apr 28, 2025
a8d6142
Fix nproc issue in sgl.yml
SwordFaith Apr 28, 2025
6b8552f
fix sgl.yml
ocss884 Apr 28, 2025
029d397
more
ocss884 Apr 28, 2025
db781c0
fix wandb
eric-haibin-lin Apr 28, 2025
2153056
Fix cpu backend missing bug
SwordFaith Apr 29, 2025
83a735d
Add sharding manager to unit test
SwordFaith Apr 29, 2025
270b7b0
Seperate e2e ppo trainer sglang task
SwordFaith Apr 29, 2025
040237e
Avoid validation and lower test time
SwordFaith Apr 29, 2025
3e21b08
feat: secrets.HF_ENDPOINT
tongyx361 Apr 29, 2025
1105c3e
fix: secrets.HF_ENDPOINT
tongyx361 Apr 29, 2025
766a082
feat: https://hf-mirror.com
tongyx361 Apr 29, 2025
9912290
temp: only test ppo_trainer for sglang
tongyx361 Apr 29, 2025
6ee96e1
Revert "temp: only test ppo_trainer for sglang"
tongyx361 Apr 29, 2025
ef7d527
temp: SGLang label
tongyx361 Apr 29, 2025
52a7ee6
Revert "temp: SGLang label"
tongyx361 Apr 29, 2025
1d4d648
temp: SGLang label
tongyx361 Apr 29, 2025
05413e3
feat: no_proxy for hf-mirror.com
tongyx361 Apr 29, 2025
6f0be26
fix: ppo_micro_batch_size_per_gpu=16
tongyx361 Apr 29, 2025
095ab9b
revert: SGLang label
tongyx361 Apr 29, 2025
58 changes: 58 additions & 0 deletions .github/workflows/sgl.yml
@@ -0,0 +1,58 @@
name: sgl

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
      - v0.2.x
    paths:
      - "**/*.py"
      - .github/workflows/sgl.yml
  pull_request:
    branches:
      - main
      - v0.2.x
    paths:
      - "**/*.py"
      - "verl/trainer/config/*.yaml"
      - .github/workflows/sgl.yml

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare permissions just read content.
permissions:
  contents: read

jobs:
  sgl:
    runs-on: [self-hosted, l20-0]
    timeout-minutes: 20 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1"
      HF_HUB_ENABLE_HF_TRANSFER: 1
    container:
      image: ocss884/verl-sglang:ngc-th2.5.1-cu126-sglang0.4.4.post3
      options: --gpus all --shm-size=10g
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install hf_transfer
          pip3 install -e .[test,gpu,sglang] --no-deps
      - name: Test the latest SGLang
        run: |
          cd tests/rollout
          torchrun --standalone --nnodes=1 --nproc_per_node=4 $(which pytest) -s test_sglang_spmd.py
      - name: Test the latest SGLang async
        run: |
          cd tests/rollout
          torchrun --standalone --nnodes=1 --nproc_per_node=4 $(which pytest) -s test_async_sglang_spmd.py
40 changes: 40 additions & 0 deletions docs/sglang_multiturn/multiturn.rst
@@ -0,0 +1,40 @@
Multi-turn Rollout Support
==========================

Basic Configuration
~~~~~~~~~~~~~~~~~~~

To enable multi-turn rollout, make sure to configure the following fields in your rollout configuration:

.. code-block:: yaml

   actor_rollout_ref:
     rollout:
       name: sglang_async
       multi_turn:
         enable: True

This configuration activates the sglang_async engine for multi-turn interaction during rollout.

Custom Tool Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~

For custom environment interaction tools, you can specify your tool configurations in a YAML file.
To do so, set the tool config path in your rollout config, matching the example script in this PR:

.. code-block:: yaml

   actor_rollout_ref:
     rollout:
       multi_turn:
         tool_config_path: <path_to_tool_yaml_file>

This allows integration of customized tool behaviors during actor rollout steps. You may refer to the GSM8KTool_example_configuration_ for guidance.

GSM8K Multi-turn Training Performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See the training performance of multi-turn rollout on the GSM8K task HERE_.

.. _HERE: https://wandb.ai/zhaochenyang20/gsm8k_async_rl/runs/1ro1r7om?nw=nwuserzhaochenyang20

.. _GSM8KTool_example_configuration: ../../examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml
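
For orientation, here is a minimal sketch of what a custom tool behind `tool_config_path` might look like. This is an illustration under stated assumptions, not the PR's actual interface: the class shape and method names are inferred from the `tools_kwargs` keys used in this PR (`create_kwargs`, `execute_kwargs`, `calc_reward_kwargs`, `release_kwargs`) and from the commit that converts tool methods to async.

```python
# Hypothetical custom tool sketch; the method set mirrors the
# create/execute/calc_reward/release kwargs used by this PR's GSM8K example.
from typing import Any, Dict, Tuple


class MyCustomTool:
    def __init__(self, config: Dict[str, Any], tool_schema: Dict[str, Any]):
        self.config = config
        self.tool_schema = tool_schema  # OpenAI-style function schema from the YAML
        self._state: Dict[str, Dict[str, str]] = {}  # per-request state

    async def create(self, instance_id: str, ground_truth: str = "", **kwargs) -> str:
        # Seeded by create_kwargs carried in the dataset's tools_kwargs.
        self._state[instance_id] = {"ground_truth": ground_truth, "answer": ""}
        return instance_id

    async def execute(self, instance_id: str, parameters: Dict[str, Any], **kwargs) -> Tuple[str, float, Dict[str, Any]]:
        # Called when the model emits a tool_call; returns (observation, step reward, metrics).
        state = self._state[instance_id]
        state["answer"] = str(parameters.get("answer", ""))
        reward = await self.calc_reward(instance_id)
        return f"reward: {reward}", reward, {}

    async def calc_reward(self, instance_id: str, **kwargs) -> float:
        # Final reward for the trajectory: exact match against the ground truth.
        state = self._state[instance_id]
        return 1.0 if state["answer"] == state["ground_truth"] else 0.0

    async def release(self, instance_id: str, **kwargs) -> None:
        # Free per-request state once the rollout finishes.
        self._state.pop(instance_id, None)
```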
22 changes: 22 additions & 0 deletions examples/sglang_multiturn/config/gsm8k_multiturn_grpo.yaml
@@ -0,0 +1,22 @@
hydra:
  searchpath:
    - file://verl/trainer/config

defaults:
  - ppo_trainer
  - _self_

data:
[Review comment] Collaborator: please follow https://github.com/volcengine/verl/blob/main/recipe/prime/config/prime_trainer.yaml and only add critical config diff based on the default trainer config from verl

[Review comment] Collaborator: where do we have grpo yaml?

[Review comment] Collaborator: there isn't one, but you can inherit config from ppo_trainer and only include values different from the base config

  max_prompt_length: 1024
  max_response_length: 1024
  train_batch_size: 256
  return_raw_chat: True

actor_rollout_ref:
  hybrid_engine: True
  rollout:
    name: sglang_async
    multi_turn:
      enable: True
      max_turns: 5
      # tool_config_path: "./config/tool_config/gsm8k_tool_config.yaml"
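
As the review thread above notes, this file only carries the diff on top of verl's default `ppo_trainer` config; hydra composes the rest at launch time. For a quick sanity check of the merged result, something along these lines should work (a sketch, assuming you run from the repo root so the `file://verl/trainer/config` searchpath resolves):

```python
# Sketch: compose and inspect the merged config outside the trainer.
import os

from hydra import compose, initialize_config_dir
from omegaconf import OmegaConf

config_dir = os.path.abspath("examples/sglang_multiturn/config")
with initialize_config_dir(config_dir=config_dir, version_base=None):
    # Defaults pull in ppo_trainer first; _self_ then applies the overrides above.
    cfg = compose(config_name="gsm8k_multiturn_grpo")

print(OmegaConf.to_yaml(cfg.actor_rollout_ref.rollout))  # should show name: sglang_async
```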
15 changes: 15 additions & 0 deletions examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml
@@ -0,0 +1,15 @@
tools:
  - class_name: "verl.workers.tool.gsm8k_tool.Gsm8kTool"
    config: {}
    tool_schema:
      type: "function"
      function:
        name: "calc_gsm8k_reward"
        description: "A tool for calculating the reward of gsm8k. (1.0 if your answer is correct, 0.0 if your answer is incorrect)"
        parameters:
          type: "object"
          properties:
            answer:
              type: "string"
              description: "The model's answer to the GSM8K math problem; must be a digit string"
          required: ["answer"]
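
The `tool_schema` block follows the OpenAI function-calling format, so a conforming assistant turn would carry a `tool_calls` entry shaped like the following (values illustrative):

```python
# Illustrative assistant message whose tool_call conforms to the schema above.
tool_call_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "calc_gsm8k_reward",
                # In the OpenAI format, arguments arrive as a JSON-encoded string.
                "arguments": '{"answer": "72"}',
            },
        }
    ],
}
```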
106 changes: 106 additions & 0 deletions examples/sglang_multiturn/gsm8k.py
[Review comment] Collaborator: Move it to data_preprocess. Rename gsm8k_tools.py
@@ -0,0 +1,106 @@
# Copyright 2024 Bytedance Ltd. and/or its affiliates
# Copyright 2023-2024 SGLang Team and ModelBest Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Preprocess the GSM8k dataset to parquet format
"""

import argparse
import os
import re

import datasets

from verl.utils.hdfs_io import copy, makedirs


def extract_solution(solution_str):
    solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
    assert solution is not None
    final_solution = solution.group(0)
    final_solution = final_solution.split("#### ")[1].replace(",", "")
    return final_solution


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_dir", default="~/data/gsm8k")
    parser.add_argument("--hdfs_dir", default=None)

    args = parser.parse_args()

    data_source = "openai/gsm8k"
    dataset = datasets.load_dataset(data_source, "main")

    train_dataset = dataset["train"]
    test_dataset = dataset["test"]

    instruction_following = "You must use the `calc_gsm8k_reward` tool to calculate the reward of your answer (1.0 if your answer is correct, 0.0 if your answer is incorrect) at least once before submitting it, and refine your answer if necessary. Put your final answer in the format of `#### <answer>`."

    # add a row to each data item that represents a unique id
    def make_map_fn(split):
        def process_fn(example, idx):
            question_raw = example.pop("question")

            question = question_raw + " " + instruction_following

            answer_raw = example.pop("answer")
            solution = extract_solution(answer_raw)
            data = {
                "data_source": data_source,
                "prompt": [
                    {
                        "role": "system",
                        "content": "You are a math expert. You are given a question and you need to solve it step by step. `calc_gsm8k_reward` is a tool for calculating the reward of gsm8k. You should use this tool to calculate the reward of your answer (1.0 if your answer is correct, 0.0 if your answer is incorrect) before submitting it and refine your answer if necessary. Put your final answer in the format of `#### <answer>`.",
                    },
                    {
                        "role": "user",
                        "content": question,
                    },
                ],
                "ability": "math",
                "reward_model": {"style": "rule", "ground_truth": solution},
                "extra_info": {
                    "split": split,
                    "index": idx,
                    "answer": answer_raw,
                    "question": question_raw,
                    "need_tools_kwargs": True,
                    "tools_kwargs": {
                        "calc_gsm8k_reward": {
                            "create_kwargs": {"ground_truth": solution},
                            # "execute_kwargs": {},
                            # "calc_reward_kwargs": {},
                            # "release_kwargs": {},
                        },
                    },
                },
            }
            return data

        return process_fn

    train_dataset = train_dataset.map(function=make_map_fn("train"), with_indices=True)
    test_dataset = test_dataset.map(function=make_map_fn("test"), with_indices=True)

    local_dir = os.path.expanduser(args.local_dir)  # expand "~" so parquet writes to a real path
    hdfs_dir = args.hdfs_dir

    train_dataset.to_parquet(os.path.join(local_dir, "train.parquet"))
    test_dataset.to_parquet(os.path.join(local_dir, "test.parquet"))

    if hdfs_dir is not None:
        makedirs(hdfs_dir)

        copy(src=local_dir, dst=hdfs_dir)
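
To make the data layout concrete, here is what the preprocessing yields for one GSM8K-style record (the raw text is illustrative, not taken from the dataset):

```python
# Illustrative walk-through of extract_solution and the per-sample tool kwargs.
raw_answer = "Natalia sold 48 clips in April and 24 in May.\n#### 72"
solution = extract_solution(raw_answer)
assert solution == "72"

# Each sample carries tools_kwargs, so the rollout can seed the
# calc_gsm8k_reward tool with this ground truth at tool-creation time.
tools_kwargs = {
    "calc_gsm8k_reward": {
        "create_kwargs": {"ground_truth": solution},
    },
}
```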
40 changes: 40 additions & 0 deletions examples/sglang_multiturn/readme.md
@@ -0,0 +1,40 @@
# Multi-Turn Rollout Example (GSM8K)

This example demonstrates how to perform **multi-turn rollout** using SGLang with a tool-calling capable model (e.g., Qwen2.5-3B) on the GSM8K dataset.

## Usage

### Step 1: Download GSM8K Dataset

```bash
cd examples/sglang_multiturn
python3 gsm8k.py
```

This will download and preprocess the GSM8K dataset into `~/data/gsm8k/`.

### Step 2: Run Multi-Turn Rollout

If you have 8 GPUs, use the standard 8-GPU script:

```bash
cd your_verl_root_dir
bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh
```

If you have only 4 GPUs, use the fallback 4-GPU script:

```bash
cd your_verl_root_dir
bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn_4xgpu.sh
```

## Notes

- The rollout supports multi-turn conversations with tool-calling capabilities.

- Current tools are used for GSM8K answer evaluation.

- Future versions may extend to search and code interpreter tools.
53 changes: 53 additions & 0 deletions examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh
@@ -0,0 +1,53 @@
# run on 8xH100
# make sure your current working directory is the root of the project

set -x

ulimit -n 65535

PROJECT_DIR="$(pwd)"
CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config"

python3 -m verl.trainer.main_ppo \
    --config-path="$CONFIG_PATH" \
    --config-name='gsm8k_multiturn_grpo' \
    algorithm.adv_estimator=grpo \
    data.train_batch_size=256 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.return_raw_chat=True \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=sglang_async \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.rollout.n=16 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='gsm8k_async_rl' \
    trainer.experiment_name='qwen2.5-3b_function_rm-gsm8k-async-sgl-multi-w-tool-verify-n16' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=20 \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
    trainer.total_epochs=15 "$@"
