Add T5 model #145
Conversation
- I see you're adding a bunch of new tasks in your PR (which is great!), but they should probably be separated out into other PRs if possible
- Do you have a wandb run you can share?
- I would suggest not freezing anything at first (on a very small model with a single GPU) to make sure the algo is right; see the sketch below
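A minimal sketch of that sanity check, assuming trlx's example PPO config and its `num_layers_unfrozen` field (the model name, config path, and the "-1 = nothing frozen" convention are assumptions for illustration, not part of this PR):

```python
from trlx.data.configs import TRLConfig

# Start from the stock PPO config and override it for a quick single-GPU debug run.
config = TRLConfig.load_yaml("configs/ppo_config.yml")  # path assumed from the repo layout
config.model.model_path = "lvwerra/gpt2-imdb"           # a very small model for debugging
config.model.num_layers_unfrozen = -1                   # assumed convention: train all layers, freeze nothing
config.train.batch_size = 8
```

If PPO behaves on this setup, layer freezing can be reintroduced afterwards.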
examples/ds_config_trlx_neoj.json
@@ -0,0 +1,22 @@
{
You'll probably want to put this under configs when finished
examples/ds_config_trlx_t5.json
@@ -0,0 +1,22 @@
{
Same here (move under configs)
examples/summarize_dataset.py
@@ -0,0 +1,110 @@
import torch
Eventually we'll want to put this dataset onto huggingface
(Possibly) Relevant: https://huggingface.co/datasets/openai/summarize_from_feedback
Actually, I used this dataset before it was public on HF, but that was for the RLHF blog post, not for this PR.
@Dahoas @LouisCastricato, this is an example FlanT5 run on the CNN/DailyMail dataset, but some of the other charts look quite weird. Please check when you have time. https://wandb.ai/pvduy/trlx/runs/8q3skf8p
Looks awesome! I've left some feedback to be addressed. Very excited about this 👍
Thank you for your great comments, I will follow up and fix them.
Fixed PPO for T5 (https://wandb.ai/pvduy/trlx/runs/1n31fb6a). The fix for GPT-J is still running on the OpenAI summarization dataset to verify. Please review this @reciprocated @LouisCastricato @Dahoas
examples/trlx_t5_summ_daily_cnn.py
meteor = evaluate.load("meteor")
...
if __name__ == "__main__":
Since this is an example, we should probably have lots of comments; something like the sketch below.
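For instance, a heavily commented reward function built around the `meteor` metric loaded above could look like this (a sketch only; the function name, signature, and reference handling are illustrative, not the PR's actual code):

```python
import evaluate

# Load the METEOR metric once at module import so it isn't re-created on every
# reward call; `evaluate.load` fetches the metric script the first time it runs.
meteor = evaluate.load("meteor")

def reward_fn(samples, references):
    # `samples` are generated summaries and `references` the gold summaries,
    # both plain strings. Compute METEOR per pair so the trainer receives one
    # scalar reward per generation.
    return [
        meteor.compute(predictions=[pred], references=[ref])["meteor"]
        for pred, ref in zip(samples, references)
    ]
```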
@@ -40,6 +40,7 @@ def __init__(
    # If the policy has no frozen head (e.g. the seq2seq/T5 path added here),
    # build a separate reference model and keep it on the accelerator device.
    if not hasattr(self.trainer.model, "frozen_head"):
        self.ref_model = self.trainer.get_arch(self.trainer.config)
        self.ref_model.to(self.trainer.accelerator.device)
Have we verified this works? I recall accelerate freezing up when I started putting multiple models on GPU (though this could've just been the sentiment pipeline we were using for the sentiments task).
@Dahoas yes that works.
e.g. https://wandb.ai/pvduy/trlx/runs/2wgpt4im
I do agree that with larger models we should distribute multiple models across multiple GPUs, but for this I think we should keep the reference model on GPU rather than CPU; CPU forward passes are super slow.
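For reference, a rough sketch of the trade-off being discussed, reusing the names from the snippet above (the CPU-offload line and the freezing calls are illustrative, not part of this PR):

```python
# Keeping the separate reference model on the accelerator avoids a host<->device
# copy on every rollout, at the cost of holding a second model in GPU memory.
ref_model = trainer.get_arch(trainer.config)   # same architecture as the policy
ref_model.to(trainer.accelerator.device)       # fast: stays on GPU
# ref_model.to("cpu")                          # saves GPU memory, but every forward pass is much slower
ref_model.eval()                               # the reference model is never trained
for p in ref_model.parameters():
    p.requires_grad_(False)
```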
You are right @jon-tow, I am removing it from this PR. We can consider merging this PR today; some bugs in the current main branch should be fixed by it. cc @LouisCastricato
Leaving some tiny final comments and change requests 🙏
rs = rewards[ix]
if len(rs) == 0:
    rs = torch.tensor([0.0])
rs[-1] = scores[ix]
Should we penalize empty responses? Also, do you know how it's possible to get those, except for when max_new_tokens == 0? 🤔
Yes, this is a rare edge case, but I got it a few times when I ran PPO sentiments. @jon-tow you also faced this, right?
Yeah, Jon has said that he also experienced it. I wonder, if the cause of it is unknown, whether it may be a symptom of some other bug elsewhere.
@reciprocated the only case I can think of that'd lead to an empty response is when `len(query)` is larger than the `generate` method's `min_length` arg, which defaults to 10, and the model so happens to output the `eos_token` on its first sample. (Note that with causal models the `min_length` constraint includes the length of the context (query), meaning it won't actually have an effect on the generations if the min condition is already met by the context size.)
In such cases, I'm okay with penalizing empty responses as they're uninformative - so long as this is not a bug lol
I do not think it is a serious issue. We can just throw an error if this `min_length` thing comes up. I've never seen this in practice when I set `min_length` correctly. (Perhaps we should add an extra parameter called `min_new_length`...? We should upstream it to HF transformers, though.)
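As a side note, more recent versions of HF transformers expose a `min_new_tokens` generation argument that counts only newly generated tokens; a rough sketch of how it sidesteps the empty-response case described above (model and prompt are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# This prompt is already longer than the default min_length=10, so min_length
# alone would let the model emit eos immediately and return an empty response.
inputs = tokenizer("A fairly long prompt that already exceeds ten tokens on its own.", return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=50,
    min_new_tokens=1,                     # counts generated tokens only, unlike min_length
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning for GPT-2
)
response = output[0, inputs["input_ids"].shape[1]:]  # at least one non-eos token is guaranteed
```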
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving some notes for myself to address in the future.
all_tokens, attention_mask, position_ids = self.trainer.get_model_inputs(
    query_tensors.to(response_tensors.device), response_tensors
)
This removed the need for:
def get_model_inputs(
We should remove it, if unused, before it becomes stale.
logprobs = logprobs.cpu()
ref_logprobs = ref_logprobs.cpu()
query_tensors = query_tensors.cpu()
response_tensors = response_tensors.cpu()
Remove these lines; these vars are already put on CPU on the lines right before the if-statement.
CNN/DailyMail: https://wandb.ai/pvduy/trlx/runs/lx4iq23e
PPO fixed sentiments (GPT2): https://wandb.ai/pvduy/trlx/runs/2cyo46k4