update trl version (#3117)
Jintao-Huang authored Feb 14, 2025
1 parent e9503bb commit b84854b
Showing 11 changed files with 12 additions and 11 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -114,7 +114,7 @@ Running Environment:
| transformers | >=4.33 | 4.48.3 | |
| modelscope | >=1.19 | | |
| peft | >=0.11.0,<0.15.0 | | |
-| trl | >=0.13,<0.16 | 0.14.0 | RLHF |
+| trl | >=0.13,<0.16 | 0.15 | RLHF |
| deepspeed | >=0.14 | | Training |
| vllm | >=0.5.1 | 0.6.5 | Inference/Deployment/Evaluation |
| lmdeploy | lmdeploy>=0.5,<0.6.5 | 0.6.4 | Inference/Deployment/Evaluation |
2 changes: 1 addition & 1 deletion README_CN.md
@@ -109,7 +109,7 @@ pip install -e .
| transformers | >=4.33 | 4.48.3 ||
| modelscope | >=1.19 | ||
| peft | >=0.11.0,<0.15.0 | ||
-| trl | >=0.13,<0.16 | 0.14.0 |RLHF|
+| trl | >=0.13,<0.16 | 0.15 |RLHF|
| deepspeed | >=0.14 | |Training|
| vllm | >=0.5.1 | 0.6.5 |Inference/Deployment/Evaluation|
| lmdeploy | lmdeploy>=0.5,<0.6.5 | 0.6.4 |Inference/Deployment/Evaluation|
2 changes: 1 addition & 1 deletion docs/source/GetStarted/SWIFT安装.md
@@ -60,7 +60,7 @@ pip install ms-swift==2.*
| transformers | >=4.33 | 4.48.3 ||
| modelscope | >=1.19 | ||
| peft | >=0.11.0,<0.15.0 | ||
-| trl | >=0.13,<0.16 | 0.14.0 |RLHF|
+| trl | >=0.13,<0.16 | 0.15 |RLHF|
| deepspeed | >=0.14 | |Training|
| vllm | >=0.5.1 | 0.6.5 |Inference/Deployment/Evaluation|
| lmdeploy | lmdeploy>=0.5,<0.6.5 | 0.6.4 |Inference/Deployment/Evaluation|
2 changes: 1 addition & 1 deletion docs/source/Instruction/GRPO.md
@@ -7,7 +7,7 @@
Environment setup
```bash
pip install math_verify # reward function
-pip install git+https://github.com/huggingface/trl.git # trl>=0.15.0.dev0
+pip install "trl>=0.15"
```

**Note**: It is normal for the loss to approach 0 during training. Refer to this [issue](https://github.com/huggingface/open-r1/issues/239#issuecomment-2646297851) for details.
2 changes: 1 addition & 1 deletion docs/source_en/GetStarted/SWIFT-installation.md
@@ -61,7 +61,7 @@ You can view the image [here](https://modelscope.cn/docs/intro/environment-setup
| transformers | >=4.33 | 4.48.3 | |
| modelscope | >=1.19 | | |
| peft | >=0.11.0,<0.15.0 | | |
-| trl | >=0.13,<0.16 | 0.14.0 | RLHF |
+| trl | >=0.13,<0.16 | 0.15 | RLHF |
| deepspeed | >=0.14 | | Training |
| vllm | >=0.5.1 | 0.6.5 | Inference/Deployment/Evaluation |
| lmdeploy | lmdeploy>=0.5,<0.6.5 | 0.6.4 | Inference/Deployment/Evaluation |
2 changes: 1 addition & 1 deletion docs/source_en/Instruction/GRPO.md
@@ -8,7 +8,7 @@ environments

```bash
pip install math_verify # reward function
-pip install git+https://github.com/huggingface/trl.git # trl>=0.15.0.dev0
+pip install "trl>=0.15"
```

**Note**: It is normal for the loss to approach zero during training. Refer to this [issue](https://github.com/huggingface/open-r1/issues/239#issuecomment-2646297851) for more details.
2 changes: 1 addition & 1 deletion examples/train/grpo/full_vllm.sh
@@ -1,6 +1,6 @@
# One GPU is left for vLLM inference acceleration.
# pip install math_verify # reward function
-# pip install git+https://github.com/huggingface/trl.git # trl>=0.15.0.dev0
+# pip install "trl>=0.15"
# GPU memory: 8 * 80GiB

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
2 changes: 1 addition & 1 deletion examples/train/grpo/grpo.sh
@@ -1,5 +1,5 @@
# pip install math_verify # reward function
-# pip install git+https://github.com/huggingface/trl.git # trl>=0.15.0.dev0
+# pip install "trl>=0.15"
# GPU memory: 80GiB
# You can set `--reward_model` to use a reward model to provide rewards.
CUDA_VISIBLE_DEVICES=0 \
2 changes: 1 addition & 1 deletion examples/train/grpo/multi_node/multi_node1.sh
@@ -1,5 +1,5 @@
# pip install math_verify # reward function
-# pip install git+https://github.com/huggingface/trl.git # trl>=0.15.0.dev0
+# pip install "trl>=0.15"
export CUDA_VISIBLE_DEVICES=0,1,2,3
export NNODES=2
export NODE_RANK=0
2 changes: 1 addition & 1 deletion examples/train/grpo/plugin/run_external_rm.sh
@@ -1,5 +1,5 @@
# pip install math_verify # reward function
-# pip install git+https://github.com/huggingface/trl.git # trl>=0.15.0.dev0
+# pip install "trl>=0.15"
# GPU memory: 80GiB

CUDA_VISIBLE_DEVICES=0 \
3 changes: 2 additions & 1 deletion swift/trainers/rlhf_trainer/grpo_trainer.py
@@ -11,6 +11,7 @@
import torch.nn as nn
from accelerate.utils import broadcast_object_list, gather, gather_object
from transformers import PreTrainedModel
+from transformers.utils.versions import require_version
from trl import GRPOTrainer as HFGRPOTrainer
from trl.models import unwrap_model_for_generation

@@ -38,7 +39,7 @@ def __init__(self,
reward_funcs: Optional[List[Union[str, Callable]]] = None,
*_args,
**kwargs):
-
+require_version('trl>=0.15')
args = kwargs['args']

self.processing_class = kwargs.get('template').tokenizer
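The `require_version('trl>=0.15')` call in `__init__` means an incompatible trl is reported as soon as the GRPO trainer is constructed. As a rough, stdlib-only illustration of what such a pip-style gate checks, the sketch below evaluates a specifier such as `>=0.15`, or the `>=0.13,<0.16` range from the tables, against a version string. The helpers `parse_release`, `satisfies` and `require` are hypothetical and are not the transformers implementation; the sketch only understands plain dotted release numbers, so tags such as `.dev0` are rejected rather than ordered as PEP 440 would order them.

```python
import operator
import re
from typing import Tuple

# Hypothetical operator table for this sketch (a subset of PEP 440 operators).
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators are listed first so ">=" is not matched as ">".
_CLAUSE = re.compile(r"^(>=|<=|==|!=|>|<)\s*(\d+(?:\.\d+)*)$")


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '0.15' or '0.15.0' into a comparable tuple such as (0, 15)."""
    parts = [int(p) for p in version.split(".")]  # raises ValueError on 'dev0'
    # Drop trailing zeros so that 0.15 and 0.15.0 compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `spec`."""
    current = parse_release(version)
    for clause in spec.split(","):
        match = _CLAUSE.match(clause.strip())
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](current, parse_release(bound)):
            return False
    return True


def require(package: str, version: str, spec: str) -> None:
    """Fail fast when `version` does not satisfy `spec` (sketch only)."""
    if not satisfies(version, spec):
        raise ImportError(
            f"{package}{spec} is required, but found {package}=={version}"
        )


if __name__ == "__main__":
    print(satisfies("0.15", ">=0.13,<0.16"))    # True
    print(satisfies("0.16.0", ">=0.13,<0.16"))  # False
    print(satisfies("0.14.0", ">=0.15"))        # False
```

In a real check the installed version would be looked up (for example with `importlib.metadata.version("trl")`) instead of being passed in by hand; it is a parameter here only to keep the sketch independent of which packages are installed.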