Commit 279b0a9

xingyaoww authored and junjzhang committed
fix instruct train
1 parent c9087c6 commit 279b0a9

3 files changed: +7 -4 lines changed

README.md

+4 -4
@@ -36,12 +36,12 @@ Works for model <= 1.5B
 For Qwen2.5-0.5B base, we know it fails to learn reasoning.
 
 ```
-export CUDA_VISIBLE_DEVICES=7
+export CUDA_VISIBLE_DEVICES=3
 export N_GPUS=1
-export BASE_MODEL=Qwen/Qwen2.5-1.5B
+export BASE_MODEL=Qwen/Qwen2.5-0.5B
 export DATA_DIR=$HOME/data/countdown
 export WANDB_API_KEY=0929e692448f1bc929d71d7e3ece80073c3041e6
-export EXPERIMENT_NAME=countdown-qwen2.5-1.5b
+export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
 export VLLM_ATTENTION_BACKEND=XFORMERS
 
 PYTHONUNBUFFERE=1 python3 -m verl.trainer.main_ppo \
@@ -171,7 +171,7 @@ python examples/data_preprocess/countdown.py --template_type=qwen-instruct --loc
 Then use this data to train the instruct model.
 
 ```
-export CUDA_VISIBLE_DEVICES=4,5
+export CUDA_VISIBLE_DEVICES=0,1
 export N_GPUS=2
 export BASE_MODEL=Qwen/Qwen2.5-3B-Instruct
 export DATA_DIR=$HOME/data/countdown-qwen-instruct

examples/data_preprocess/countdown.py

+1
@@ -53,6 +53,7 @@ def gen_dataset(
 def make_prefix(dp, template_type):
     target = dp['target']
     numbers = dp['nums']
+    # NOTE: also need to change reward_score/countdown.py
     if template_type == 'base':
         """This works for any base model"""
         prefix = f"""A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer.

verl/utils/reward_score/countdown.py

+2
@@ -9,6 +9,8 @@ def extract_solution(solution_str):
     # Remove everything before the first "Assistant:"
     if "Assistant:" in solution_str:
         solution_str = solution_str.split("Assistant:", 1)[1]
+    elif "<|im_start|>assistant" in solution_str:
+        solution_str = solution_str.split("<|im_start|>assistant", 1)[1]
     else:
         return None
     solution_str = solution_str.split('\n')[-1]
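
To illustrate why the extra `elif` branch matters, here is a minimal standalone sketch of only the marker-splitting logic visible in this hunk. The function name `strip_prompt` and both sample strings are hypothetical and chosen for illustration; the rest of `extract_solution` is not shown in the diff and is not reproduced here.

```python
def strip_prompt(solution_str):
    """Hypothetical helper mirroring the branch logic shown in the hunk above."""
    if "Assistant:" in solution_str:
        # Base-model template: the response follows the literal "Assistant:".
        solution_str = solution_str.split("Assistant:", 1)[1]
    elif "<|im_start|>assistant" in solution_str:
        # Instruct (chat) template: the response follows the assistant turn marker.
        solution_str = solution_str.split("<|im_start|>assistant", 1)[1]
    else:
        # Neither marker is present, so there is nothing to extract.
        return None
    # Keep only the last line of the response, as the original code does.
    return solution_str.split("\n")[-1]


# Made-up sample rollouts, one per template style.
base_sample = "User: make 24 from 4 and 6.\nAssistant: Let me think.\n<answer>4 * 6</answer>"
instruct_sample = (
    "<|im_start|>user\nmake 24 from 4 and 6.<|im_end|>\n"
    "<|im_start|>assistant\nLet me think.\n<answer>4 * 6</answer>"
)

print(strip_prompt(base_sample))
print(strip_prompt(instruct_sample))
print(strip_prompt("no marker here"))
```

Without the second branch, the instruct-style sample would fall through to `return None`, since it contains no literal `Assistant:`. Note that because the `Assistant:` check comes first, an instruct-style response that happens to contain that literal text would be split on it instead.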
