`accelerate` integration #58

younesbelkada · 2022-12-27T12:46:49Z

What does this PR do?

This PR integrates trl with accelerate to make it compatible with the tools provided by the library to be able to train models using PPOTrainer. This would enable users to train their models in mixed precision, using Data Parallelism etc in a very simple manner.
Users should design their own training script and run them using accelerate launch xxx.py based on the example scripts provided in examples/scripits.

This PR also integrates Data Parallelism paradigm, enabling users to benefit from multi-GPU training if they want to speedup training.

TODOs

DeepSpeed tests (check where it works)

zero-0
zero-1
zero-2
zero-3

younesbelkada · 2022-12-27T12:47:14Z

examples/scripts/04-ppo-sentiment-accelerate.py

+    #### Run PPO step 
+    t = time.time()
+    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
+    ppo_trainer.log_stats(stats, timing, batch, rewards, t0, t, logs)


To improve, we probably want a better way to log the stats

trl/trainer/accelerate_ppo.py

younesbelkada · 2022-12-27T12:49:20Z

trl/trainer/accelerate_ppo.py

+            if isinstance(v, torch.Tensor) and k != 'objective/kl':
+                # tensor_list = [torch.zeros_like(v) for _ in range(self.accelerator.num_processes)]
+                dist.all_reduce(v, dist.ReduceOp.SUM)
+                v /= self.accelerator.num_processes


For me in a DP setup, each GPU will need to have its own replica of objective/kl since this is used to update the kl_ctl object above. That is why I prefered to not include it in the all_reduce operation but I just wanted to confirm

- add docstring on most functions - correct logging

lvwerra

Thanks for this @younesbelkada. My main comments are about DP. I think if we don't wrap the step inputs (queries/responses) in a dataloader we don't achieve proper DP. But maybe I am wrong?

wandb/run-20221213_140454-xsw0kjtv/files/requirements.txt

trl/trainer/accelerate_ppo.py

lvwerra · 2022-12-27T14:57:30Z

trl/trainer/accelerate_ppo.py

+            model (torch.model): Hugging Face transformer GPT2 model with value head
+            ref_model (torch.model): Hugging Face transformer GPT2 refrence model used for KL penalty
+            tokenizer (tokenizer): Hugging Face tokenizer
+            ppo_params (dict or None): PPO parameters for training. Can include following keys:


We should replace **config (=ppo_params) with explicit kwargs or setup TrainingArguments like in transformers.

Can be a follow up PR btw

trl/trainer/accelerate_ppo.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

- remove `_build_dataset` method - change name to `PPOTrainer`

- random init seems to converge much faster

younesbelkada · 2022-12-29T11:55:32Z

wandb run (multi-GPU) after the latest commit: https://wandb.ai/distill-bloom/trl/runs/1mps4h09?workspace=user-younesbelkada

lvwerra

I think we are pretty close - a few open questions and minor changs :)

trl/trainer/ppo_config.py

trl/trainer/ppo_trainer.py

lvwerra · 2022-12-29T13:51:11Z

trl/trainer/ppo_trainer.py

+            stats (dict[str, Any]): 
+                a dictionary of stats with the tensors gathered.
+        """
+        import torch.distributed as dist


what do you think?

trl/trainer/utils.py

lvwerra · 2022-12-29T13:57:03Z

trl/trainer/ppo_trainer.py

+        # In a distributed setup, only logging needs to be performed on the main process
+        # check: https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html
+        # or: https://discuss.pytorch.org/t/use-distributed-data-parallel-correctly/82500/11
+        self.is_distributed = self.accelerator.distributed_type == "MULTI_GPU"


If we can use accelerates gather method we can probably get rid of this?

examples/scripts/04-ppo-sentiment-accelerate.py

lvwerra · 2022-12-29T14:00:39Z

examples/scripts/04-ppo-sentiment-accelerate.py

+
+    #### Compute sentiment score
+    t = time.time()
+    texts = [q + r for q,r in zip(batch['query'], batch['response'])]


with the remove columns method inside the trainer the query shouldn't be there anymore? since we don't pass the data through the model internally, we don't need to remove the columns?

The query are kept here, https://github.com/younesbelkada/trl/blob/d2c363fe4018c74df829ed6c067fad50ecaaf479/trl/trainer/ppo_trainer.py#L152 but maybe we can change that, wdyt?

examples/scripts/04-ppo-sentiment-accelerate.py

younesbelkada · 2022-12-29T17:28:09Z

Wandb log of the final run: https://wandb.ai/distill-bloom/trl/runs/dcd2gqn1?workspace=user-younesbelkada

HuggingFaceDocBuilderDev · 2022-12-29T17:28:46Z

The documentation is not available anymore as the PR was closed or merged.

* working v1 * add `accelerate` on requirements * add `accelerate` on `setup.py` * add `datasets` on `setup.py` * small updates - add docstring on most functions - correct logging * rm unneeded file * replace with `generate` * Update trl/trainer/accelerate_ppo.py Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * correct return * add dataloader support * add `wandb` to `setup.py` * refactor - remove `_build_dataset` method - change name to `PPOTrainer` * test * fix test * rename file * refactor * remove unneeded device assignment * fix correct device assignment * standardize docstrings * add `wandb` on `dev` * fix slow convergence - random init seems to converge much faster * oops * revert fix * revert patch * remove unneeded reshape * add input safety checker * refactor - added comments on example - fixes CI test - rewards should be a list of tensors - clearer error messages - remove build model method - refactor log stats method Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * refactor - added `PPOConfig` class - docstring on `LengthSampler` - fix test - gather rewards when logging - unwrap model when calling generate * some refactor * remove unneeded hack * adapt dataset * fix test * remove rollout * remove timing * remove `shuffle=True` * remove `LengthSampler` from trainer * refactor * remove text length sampler args from config * change collate_fn * fix silent bug * rename * move file * refactor base trainer * fix collate * final bug Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

working v1

9c977d0

younesbelkada commented Dec 27, 2022

View reviewed changes

add accelerate on requirements

1971cea

younesbelkada mentioned this pull request Dec 27, 2022

[PoC] accelerate support for trl #50

Closed

5 tasks

younesbelkada added 3 commits December 27, 2022 12:53

add accelerate on setup.py

45cad09

add datasets on setup.py

a0ebdaa

small updates

dec21f3

- add docstring on most functions - correct logging

lvwerra reviewed Dec 27, 2022

View reviewed changes

younesbelkada and others added 22 commits December 27, 2022 15:18

rm unneeded file

4254292

replace with generate

19f4d92

Update trl/trainer/accelerate_ppo.py

35330a9

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

correct return

34773de

add dataloader support

b810d8a

add wandb to setup.py

e4c57b2

refactor

7516b37

- remove `_build_dataset` method - change name to `PPOTrainer`

test

40f81e0

fix test

b1638e5

rename file

e2e7a90

refactor

96b4115

remove unneeded device assignment

5eb46ad

fix correct device assignment

609f718

standardize docstrings

4d57b47

add wandb on dev

fac85b5

fix slow convergence

c1b166b

- random init seems to converge much faster

oops

9495f2a

revert fix

c813857

revert patch

157eca6

Merge remote-tracking branch 'origin/master' into accelerate-ppo

2efb961

remove unneeded reshape

0a1c9a2

add input safety checker

b6004f0

some refactor

7615994

younesbelkada added 3 commits December 29, 2022 12:07

remove unneeded hack

65be5bd

adapt dataset

edd5ea3

fix test

76c2afd

lvwerra reviewed Dec 29, 2022

View reviewed changes

younesbelkada added 14 commits December 29, 2022 14:10

remove rollout

5d41170

remove timing

7843a34

remove shuffle=True

6cd89d5

remove LengthSampler from trainer

4e802e8

refactor

6012a9b

remove text length sampler args from config

d2c363f

change collate_fn

d048bbe

fix silent bug

66f23b1

rename

e318307

move file

31d12d6

refactor base trainer

48c1070

fix collate

e9cec71

Merge remote-tracking branch 'origin/master' into accelerate-ppo

9a987d4

final bug

244f001

younesbelkada marked this pull request as ready for review December 29, 2022 17:27

younesbelkada mentioned this pull request Dec 29, 2022

Roadmap - trl 0.2 #64

Closed

26 tasks

lvwerra approved these changes Dec 30, 2022

View reviewed changes

younesbelkada merged commit b127900 into huggingface:master Dec 30, 2022

PRASANTH-1427 mentioned this pull request Sep 13, 2023

getting error while running the sft_llama2.py #762

Closed

August-murr mentioned this pull request Jan 6, 2025

onlinedpo error when use deepspeed zero3 August-murr/trl#7

Open

9 tasks

qgallouedec mentioned this pull request Dec 10, 2025

Move AutoModelForCausalLMWithValueHead and AutoModelForSeq2SeqLMWithValueHead to experimental #4654

Merged

accelerate integration #58

accelerate integration #58

Uh oh!

Conversation

younesbelkada commented Dec 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

TODOs

DeepSpeed tests (check where it works)

Uh oh!

younesbelkada Dec 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

younesbelkada Dec 27, 2022

Choose a reason for hiding this comment

Uh oh!

lvwerra left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lvwerra Dec 27, 2022

Choose a reason for hiding this comment

Uh oh!

lvwerra Dec 27, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

younesbelkada commented Dec 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lvwerra left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lvwerra Dec 29, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lvwerra Dec 29, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lvwerra Dec 29, 2022

Choose a reason for hiding this comment

Uh oh!

younesbelkada Dec 29, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

younesbelkada commented Dec 29, 2022

Uh oh!

HuggingFaceDocBuilderDev commented Dec 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

`accelerate` integration #58

`accelerate` integration #58

younesbelkada commented Dec 27, 2022 •

edited

Loading

younesbelkada Dec 27, 2022 •

edited

Loading

younesbelkada commented Dec 29, 2022 •

edited

Loading

HuggingFaceDocBuilderDev commented Dec 29, 2022 •

edited

Loading