Problem with accelerate>=1.0.0 when running official PPO/RLOO examples #2555

Open
dawidm opened this issue Jan 10, 2025 · 6 comments
Labels
⚡accelerate Related to accelerate 🏋 PPO Related to PPO 🏋 RLOO Related to RLOO

Comments

@dawidm (Contributor) commented Jan 10, 2025

System Info

  • Platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.35
  • Python version: 3.11.10
  • PyTorch version: 2.5.1+cu124
  • CUDA device(s): NVIDIA GeForce RTX 3090, NVIDIA GeForce RTX 3090, NVIDIA GeForce RTX 3090, NVIDIA GeForce RTX 3090
  • Transformers version: 4.47.1
  • Accelerate version: 1.2.1
  • Accelerate config: not found
  • Datasets version: 3.2.0
  • HF Hub version: 0.27.1
  • TRL version: 0.14.0.dev0+edabe0a
  • bitsandbytes version: not installed
  • DeepSpeed version: 0.15.4
  • Diffusers version: not installed
  • Liger-Kernel version: not installed
  • LLM-Blender version: not installed
  • OpenAI version: not installed
  • PEFT version: not installed

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I ran the official examples for PPO and RLOO and got an error when the Accelerator is created.
The only change I made was in examples/accelerate_configs/deepspeed_zero3.yaml: setting num_processes: 4 to run with 4 GPUs.
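(For reference, the same GPU-count override can also be passed on the accelerate launch command line instead of editing the YAML — an equivalent alternative, not what was actually run here; <script> and <script args> are placeholders:)

accelerate launch --num_processes 4 --config_file examples/accelerate_configs/deepspeed_zero3.yaml <script> <script args>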

PPO

accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml \
    examples/scripts/ppo/ppo.py \
    --dataset_name trl-internal-testing/descriptiveness-sentiment-trl-style \
    --dataset_train_split descriptiveness \
    --output_dir models/minimal/ppo \
    --num_ppo_epochs 1 \
    --num_mini_batches 1 \
    --learning_rate 3e-6 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --total_episodes 10000 \
    --model_name_or_path EleutherAI/pythia-1b-deduped \
    --sft_model_path EleutherAI/pythia-1b-deduped \
    --reward_model_path EleutherAI/pythia-1b-deduped \
    --local_rollout_forward_batch_size 1 \
    --missing_eos_penalty 1.0
"""

Traceback:

[rank1]:   File "/root/trl-orig/examples/scripts/ppo/ppo.py", line 152, in <module>                                                                                                           
[rank1]:     trainer = PPOTrainer(                                                                                                                                                            
[rank1]:               ^^^^^^^^^^^                                                                                                                                                            
[rank1]:   File "/opt/conda/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func                                                                        
[rank1]:     return func(*args, **kwargs)                                                                                                                                                     
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                     
[rank1]:   File "/opt/conda/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func                                                                        
[rank1]:     return func(*args, **kwargs)                                                                                                                                                     
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                     
[rank1]:   File "/opt/conda/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func                                                                        
[rank1]:     return func(*args, **kwargs)                                                                                                                                                     
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                     
[rank1]:   [Previous line repeated 1 more time]                                                                                                                                               
[rank1]:   File "/root/trl-orig/trl/trainer/ppo_trainer.py", line 186, in __init__                                                                                                            
[rank1]:     accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps)                                                                                          
[rank1]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                          
[rank1]:   File "/opt/conda/lib/python3.11/site-packages/accelerate/accelerator.py", line 302, in __init__                                                                                    
[rank1]:     deepspeed_plugins = AcceleratorState().deepspeed_plugins                                                                                                                         
[rank1]:                         ^^^^^^^^^^^^^^^^^^                                                                                                                                           
[rank1]:   File "/opt/conda/lib/python3.11/site-packages/accelerate/state.py", line 887, in __init__                                                                                          
[rank1]:     raise ValueError(                                                                 
[rank1]: ValueError: Please make sure to properly initialize your accelerator via `accelerator = Accelerator()` before using any functionality from the `accelerate` library.

RLOO

accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml \
    examples/scripts/rloo/rloo.py \
    --dataset_name trl-internal-testing/descriptiveness-sentiment-trl-style \
    --dataset_train_split descriptiveness \
    --output_dir models/minimal/rloo \
    --rloo_k 2 \
    --num_ppo_epochs 1 \
    --num_mini_batches 1 \
    --learning_rate 3e-6 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --total_episodes 10000 \
    --model_name_or_path EleutherAI/pythia-1b-deduped \
    --sft_model_path EleutherAI/pythia-1b-deduped \
    --reward_model_path EleutherAI/pythia-1b-deduped \
    --local_rollout_forward_batch_size 1 \
    --missing_eos_penalty 1.0

Traceback:

[rank0]: Traceback (most recent call last):                                                                                                                                                   
[rank0]:   File "/root/trl-orig/examples/scripts/rloo/rloo.py", line 125, in <module>                                                                                                         
[rank0]:     trainer = RLOOTrainer(                                                                                                                                                           
[rank0]:               ^^^^^^^^^^^^                                                                                                                                                           
[rank0]:   File "/root/trl-orig/trl/trainer/rloo_trainer.py", line 124, in __init__                                                                                                           
[rank0]:     accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps)                                                                                          
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                          
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/accelerate/accelerator.py", line 302, in __init__                                                                                    
[rank0]:     deepspeed_plugins = AcceleratorState().deepspeed_plugins                                                                                                                         
[rank0]:                         ^^^^^^^^^^^^^^^^^^                                                                                                                                           
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/accelerate/state.py", line 887, in __init__                                                                                          
[rank0]:     raise ValueError(                                                                                                                                                                
[rank0]: ValueError: Please make sure to properly initialize your accelerator via `accelerator = Accelerator()` before using any functionality from the `accelerate` library.
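Both tracebacks show the same pattern: the trainer constructs a bare Accelerator after accelerate's process state has already been initialized under the DeepSpeed launcher, and accelerate>=1.0.0 then refuses to build AcceleratorState without a DeepSpeed plugin. A guess at the minimal failing pattern, as a hypothetical standalone script (not code from the repo; run under the same accelerate launch command as above):

# repro.py -- hypothetical sketch of the failing pattern, not from TRL.
from accelerate import Accelerator, PartialState

# Step 1: something touches PartialState first. In the real scripts this is
# PPOConfig/RLOOConfig (TrainingArguments subclasses), whose __post_init__
# creates a PartialState under the DeepSpeed launcher.
PartialState()

# Step 2: the trainer then creates a bare Accelerator. With accelerate>=1.0.0,
# __init__ looks up AcceleratorState().deepspeed_plugins (accelerator.py:302
# in the tracebacks), and since the state was never initialized through an
# Accelerator carrying a DeepSpeed plugin, state.py raises the ValueError above.
accelerator = Accelerator(gradient_accumulation_steps=16)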

Expected behavior

This doesn't happen with accelerate==0.34.2. I also checked accelerate==1.0.0 and got the same errors. Is TRL supposed to work with accelerate>=1.0.0?
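As a stopgap, pinning the version that reportedly works avoids the error:

pip install accelerate==0.34.2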

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots; more on code blocks)
  • Any traceback provided is complete
@August-murr August-murr added 🏋 PPO Related to PPO 🏋 RLOO Related to RLOO ⚡accelerate Related to accelerate labels Jan 11, 2025
@Yukino256

Same issue; I tried accelerate==0.34.2 and PPO runs well.

@reihig-ut

I encountered the same error with BCO.
This might be related: huggingface/accelerate#3337 (comment)

@Superskyyy (Contributor)

Same issue.

@daehuikim

I encountered the same error in #2696.

@hlnchen commented Feb 4, 2025

I would suggest using

self.create_accelerator_and_postprocess()

like in the base class instead of directly initializing Accelerator() in these trainers.
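A minimal sketch of what that change could look like inside the trainer __init__ (assuming the trainer subclasses transformers.Trainer, which defines this helper and stores the result on self.accelerator; an illustration of the suggestion, not the actual patch):

# Inside PPOTrainer/RLOOTrainer.__init__ -- sketch only.
# Before (fails under accelerate>=1.0.0 with a DeepSpeed config):
#     accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps)
# After: let the transformers base class build and post-process the
# Accelerator from self.args, DeepSpeed plugin included:
self.create_accelerator_and_postprocess()
accelerator = self.accelerator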

@Superskyyy (Contributor)

Replying to the suggestion above: there's also an RLOOv2 PR that should fix this.
