[feat] Support tying metadata to each prompt #421

maxreciprocate · 2023-04-05T14:18:28Z

This PR lets users pass an optional dictionary alongside each prompt into PromptPipeline, which would get passed later, together with generated completions, into a reward function in a form of additional keywords per each key in the original dictionary.

Fix of #297, #301, #340

https://wandb.ai/sorry/trlx-references/reports/add-prompts-metadata-v-main--Vmlldzo0MDAyMDIy

jon-tow

Looks great, Max! I'm definitely going to steal your gather_dict util for personal code 😄
I've left two comments for feedback whenever you get the chance 🙏

jon-tow · 2023-04-07T22:55:25Z

trlx/trainer/accelerate_ppo_trainer.py

@@ -282,7 +282,7 @@ def make_experience(self, num_rollouts: int = 1024, iter_count: int = 0):  # noq
            exp_generate_time = time()

            # Generate samples from the language model (similar to using HuggingFace `generate` method)
-            samples = self.generate(**batch)
+            samples = self.generate(batch["input_ids"], batch["attention_mask"])


Aside: This change makes it clear that next(self.prompt_iterator) doesn't materialize a PromptBatch instance; in fact, PromptBatch objects are never created anywhere. Sigh.

Why don't we depreciate prompt batch then.

jon-tow · 2023-04-07T22:58:17Z

examples/hh/ppo_hh.py

+            original_samples = [p + o + reward_tokenizer.eos_token for p, o in zip(prompts, original_output)]
+            original_rewards = get_reward(original_samples)


Aside: Nice

jon-tow · 2023-04-07T23:00:11Z

trlx/pipeline/offline_pipeline.py

@@ -125,7 +134,15 @@ def __len__(self) -> int:
        return len(self.prompts)

    def create_loader(self, batch_size: int, shuffle=False) -> DataLoader:
-        collate_fn = DataCollatorWithPadding(self.tokenizer) if self.tokenizer else torch.vstack
+        def collate_fn(xs):
+            out = self.tokenizer.pad([{"input_ids": x["input_ids"]} for x in xs], return_tensors="pt")


Is it safe to assume self.tokenizer will always be available? Not sure if the previous check if self.tokenizer else torch.vstack is just stale code?

It was a stale code and the single example which didn't use tokenizer at the time (ilql_randomwalks) was also converted to use tokenizer to drop these checks, out of which it is the last, for consistency; right now tokenizer is required in almost every other piece of code, so I'm not sure if that feature is missed.

jon-tow · 2023-04-07T23:20:16Z

trlx/pipeline/offline_pipeline.py

        super().__init__()

+        if isinstance(prompts[0], dict):
+            metadata = prompts
+            prompts = [x.pop("prompt") for x in metadata]


In PromptPipeline's constructor docstring, we should document that "prompt" is an expected field of metadata when prompts consists of dictionaries.

jon-tow

Great work, Max! 👏

maxreciprocate added 9 commits April 5, 2023 17:06

feat(offline_pipeline): support tying metadata to each prompt

1056f2a

style: satisfy flake

05642b8

fix(offline_pipeline): downgrade to python3.8

2ae9a3a

feat(examples/hh): use delta reward against dataset's chosen outputs

2c0f7a3

style: satisfy black

196e5c0

Merge branch 'main' into add-prompts-metadata

5818d05

fix(offline_pipeline): adapt metadata for minibatching

e05b056

refactor(utils/modeling): rename to gather_dict

912de83

style: satisfy flake

8f11cca

maxreciprocate marked this pull request as ready for review April 6, 2023 21:15

maxreciprocate requested a review from jon-tow April 6, 2023 21:17

jon-tow requested changes Apr 7, 2023

View reviewed changes

maxreciprocate added 4 commits April 10, 2023 17:28

docs(offline_pipeline): update to the latest changes

6b2b59f

docs(trlx/trlx): update prompts' signature

ba8f63f

style

4de3808

style: satisfy black

80f7942

jon-tow approved these changes Apr 10, 2023

View reviewed changes

jon-tow merged commit a66a7da into main Apr 10, 2023

maxreciprocate mentioned this pull request Sep 1, 2023

Pass extra information for the reward function with every sample. #301

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feat] Support tying metadata to each prompt #421

[feat] Support tying metadata to each prompt #421

maxreciprocate commented Apr 5, 2023 •

edited

Loading

jon-tow left a comment

jon-tow Apr 7, 2023

LouisCastricato Apr 8, 2023

jon-tow Apr 7, 2023

jon-tow Apr 7, 2023

maxreciprocate Apr 10, 2023

jon-tow Apr 7, 2023

jon-tow left a comment

		original_samples = [p + o + reward_tokenizer.eos_token for p, o in zip(prompts, original_output)]
		original_rewards = get_reward(original_samples)

[feat] Support tying metadata to each prompt #421

[feat] Support tying metadata to each prompt #421

Conversation

maxreciprocate commented Apr 5, 2023 • edited Loading

jon-tow left a comment

Choose a reason for hiding this comment

jon-tow Apr 7, 2023

Choose a reason for hiding this comment

LouisCastricato Apr 8, 2023

Choose a reason for hiding this comment

jon-tow Apr 7, 2023

Choose a reason for hiding this comment

jon-tow Apr 7, 2023

Choose a reason for hiding this comment

maxreciprocate Apr 10, 2023

Choose a reason for hiding this comment

jon-tow Apr 7, 2023

Choose a reason for hiding this comment

jon-tow left a comment

Choose a reason for hiding this comment

maxreciprocate commented Apr 5, 2023 •

edited

Loading