Problem with accelerate>=1.0.0 when running official PPO/RLOO examples #2555

Open
dawidm opened this issue Jan 10, 2025 · 6 comments
Labels
⚡accelerate Related to accelerate 🏋 PPO Related to PPO 🏋 RLOO Related to RLOO

Comments

@dawidm (Contributor) commented Jan 10, 2025

System Info

  • Platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.35
  • Python version: 3.11.10
  • PyTorch version: 2.5.1+cu124
  • CUDA device(s): NVIDIA GeForce RTX 3090, NVIDIA GeForce RTX 3090, NVIDIA GeForce RTX 3090, NVIDIA GeForce RTX 3090
  • Transformers version: 4.47.1
  • Accelerate version: 1.2.1
  • Accelerate config: not found
  • Datasets version: 3.2.0
  • HF Hub version: 0.27.1
  • TRL version: 0.14.0.dev0+edabe0a
  • bitsandbytes version: not installed
  • DeepSpeed version: 0.15.4
  • Diffusers version: not installed
  • Liger-Kernel version: not installed
  • LLM-Blender version: not installed
  • OpenAI version: not installed
  • PEFT version: not installed

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I ran the official examples for PPO and RLOO and got an error when the Accelerator is created.
The only change I made was in examples/accelerate_configs/deepspeed_zero3.yaml: setting num_processes: 4 to run with 4 GPUs.
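(For reference, the same GPU-count override can also be passed on the accelerate launch command line instead of editing the YAML — an equivalent alternative, not what was actually run here; <script> and <script args> are placeholders:)

accelerate launch --num_processes 4 --config_file examples/accelerate_configs/deepspeed_zero3.yaml <script> <script args>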

PPO

accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml \
    examples/scripts/ppo/ppo.py \
    --dataset_name trl-internal-testing/descriptiveness-sentiment-trl-style \
    --dataset_train_split descriptiveness \
    --output_dir models/minimal/ppo \
    --num_ppo_epochs 1 \
    --num_mini_batches 1 \
    --learning_rate 3e-6 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --total_episodes 10000 \
    --model_name_or_path EleutherAI/pythia-1b-deduped \
    --sft_model_path EleutherAI/pythia-1b-deduped \
    --reward_model_path EleutherAI/pythia-1b-deduped \
    --local_rollout_forward_batch_size 1 \
    --missing_eos_penalty 1.0
"""

Traceback:

[rank1]:   File "/root/trl-orig/examples/scripts/ppo/ppo.py", line 152, in <module>                                                                                                           
[rank1]:     trainer = PPOTrainer(                                                                                                                                                            
[rank1]:               ^^^^^^^^^^^                                                                                                                                                            
[rank1]:   File "/opt/conda/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func                                                                        
[rank1]:     return func(*args, **kwargs)                                                                                                                                                     
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                     
[rank1]:   File "/opt/conda/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func                                                                        
[rank1]:     return func(*args, **kwargs)                                                                                                                                                     
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                     
[rank1]:   File "/opt/conda/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func                                                                        
[rank1]:     return func(*args, **kwargs)                                                                                                                                                     
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                     
[rank1]:   [Previous line repeated 1 more time]                                                                                                                                               
[rank1]:   File "/root/trl-orig/trl/trainer/ppo_trainer.py", line 186, in __init__                                                                                                            
[rank1]:     accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps)                                                                                          
[rank1]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                          
[rank1]:   File "/opt/conda/lib/python3.11/site-packages/accelerate/accelerator.py", line 302, in __init__                                                                                    
[rank1]:     deepspeed_plugins = AcceleratorState().deepspeed_plugins                                                                                                                         
[rank1]:                         ^^^^^^^^^^^^^^^^^^                                                                                                                                           
[rank1]:   File "/opt/conda/lib/python3.11/site-packages/accelerate/state.py", line 887, in __init__                                                                                          
[rank1]:     raise ValueError(                                                                 
[rank1]: ValueError: Please make sure to properly initialize your accelerator via `accelerator = Accelerator()` before using any functionality from the `accelerate` library.

RLOO

accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml \
    examples/scripts/rloo/rloo.py \
    --dataset_name trl-internal-testing/descriptiveness-sentiment-trl-style \
    --dataset_train_split descriptiveness \
    --output_dir models/minimal/rloo \
    --rloo_k 2 \
    --num_ppo_epochs 1 \
    --num_mini_batches 1 \
    --learning_rate 3e-6 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --total_episodes 10000 \
    --model_name_or_path EleutherAI/pythia-1b-deduped \
    --sft_model_path EleutherAI/pythia-1b-deduped \
    --reward_model_path EleutherAI/pythia-1b-deduped \
    --local_rollout_forward_batch_size 1 \
    --missing_eos_penalty 1.0

Traceback:

[rank0]: Traceback (most recent call last):                                                                                                                                                   
[rank0]:   File "/root/trl-orig/examples/scripts/rloo/rloo.py", line 125, in <module>                                                                                                         
[rank0]:     trainer = RLOOTrainer(                                                                                                                                                           
[rank0]:               ^^^^^^^^^^^^                                                                                                                                                           
[rank0]:   File "/root/trl-orig/trl/trainer/rloo_trainer.py", line 124, in __init__                                                                                                           
[rank0]:     accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps)                                                                                          
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                          
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/accelerate/accelerator.py", line 302, in __init__                                                                                    
[rank0]:     deepspeed_plugins = AcceleratorState().deepspeed_plugins                                                                                                                         
[rank0]:                         ^^^^^^^^^^^^^^^^^^                                                                                                                                           
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/accelerate/state.py", line 887, in __init__                                                                                          
[rank0]:     raise ValueError(                                                                                                                                                                
[rank0]: ValueError: Please make sure to properly initialize your accelerator via `accelerator = Accelerator()` before using any functionality from the `accelerate` library.
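Both tracebacks show the same pattern: the trainer constructs a bare Accelerator after accelerate's process state has already been initialized under the DeepSpeed launcher, and accelerate>=1.0.0 then refuses to build AcceleratorState without a DeepSpeed plugin. A guess at the minimal failing pattern, as a hypothetical standalone script (not code from the repo; run under the same accelerate launch command as above):

# repro.py -- hypothetical sketch of the failing pattern, not from TRL.
from accelerate import Accelerator, PartialState

# Step 1: something touches PartialState first. In the real scripts this is
# PPOConfig/RLOOConfig (TrainingArguments subclasses), whose __post_init__
# creates a PartialState under the DeepSpeed launcher.
PartialState()

# Step 2: the trainer then creates a bare Accelerator. With accelerate>=1.0.0,
# __init__ looks up AcceleratorState().deepspeed_plugins (accelerator.py:302
# in the tracebacks), and since the state was never initialized through an
# Accelerator carrying a DeepSpeed plugin, state.py raises the ValueError above.
accelerator = Accelerator(gradient_accumulation_steps=16)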

Expected behavior

This doesn't happen with accelerate==0.34.2. I also checked accelerate==1.0.0 and got the same errors. Is TRL supposed to work with accelerate>=1.0.0?
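As a stopgap, pinning the version that reportedly works avoids the error:

pip install accelerate==0.34.2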

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots; more on code blocks)
  • Any traceback provided is complete
@August-murr August-murr added 🏋 PPO Related to PPO 🏋 RLOO Related to RLOO ⚡accelerate Related to accelerate labels Jan 11, 2025
@Yukino256

Same issue; I tried accelerate==0.34.2 and PPO runs well.

@reihig-ut

I encountered the same error with BCO.
This might be related: huggingface/accelerate#3337 (comment)

@Superskyyy (Contributor)

Same issue.

@daehuikim

I encountered the same error in #2696.

@hlnchen commented Feb 4, 2025

I would suggest using

self.create_accelerator_and_postprocess()

like in the base class instead of directly initializing Accelerator() in these trainers.
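A minimal sketch of what that change could look like inside the trainer __init__ (assuming the trainer subclasses transformers.Trainer, which defines this helper and stores the result on self.accelerator; an illustration of the suggestion, not the actual patch):

# Inside PPOTrainer/RLOOTrainer.__init__ -- sketch only.
# Before (fails under accelerate>=1.0.0 with a DeepSpeed config):
#     accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps)
# After: let the transformers base class build and post-process the
# Accelerator from self.args, DeepSpeed plugin included:
self.create_accelerator_and_postprocess()
accelerator = self.accelerator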

@Superskyyy (Contributor)

Replying to the suggestion above: there's also an RLOOv2 PR that should fix this.
