refactor: remove orchestrator abstraction from API #289
Conversation
Thanks jon! I think you might have to also change something for the NeMo trainer?
I think this mostly looks good, except for the NeMo part; in the future we can try to refactor and break up the big `make_experience` functions.
```diff
@@ -89,8 +89,12 @@ def train( # noqa: C901
     eval_prompts = prompts[:batch_size]

     pipeline = get_pipeline(config.train.pipeline)(prompts, max_prompt_length, trainer.tokenizer)
-    orch = get_orchestrator(config.train.orchestrator)(trainer, pipeline, chunk_size=config.method.chunk_size)
-    orch.make_experience(config.method.num_rollouts)
+    trainer.add_prompt_pipeline(pipeline)
```
Re `add_prompt_pipeline` - yeah, that is a bit awkward; it should be possible to pass it via args if you move the `get_trainer` call into the PPO part of the branch? But if it's too messy then np.

You could also probably replace the `get_pipeline` with `PromptPipeline`, since all the models use the same pipeline.
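For instance (a minimal sketch reusing the names from the hunk above, so `prompts`, `max_prompt_length`, and `trainer` come from the surrounding `train()` context; the `PromptPipeline` import path is an assumption):

```python
from trlx.pipeline.offline_pipeline import PromptPipeline  # import path assumed

# Instead of routing through the registry:
#   pipeline = get_pipeline(config.train.pipeline)(prompts, max_prompt_length, trainer.tokenizer)
# construct the one pipeline every model uses directly:
pipeline = PromptPipeline(prompts, max_prompt_length, trainer.tokenizer)
trainer.add_prompt_pipeline(pipeline)
```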
- Yeah, I wasn't sure if it'd be too messy - let me give it a go :)
- I agree on replacing the `get_pipeline` stuff with just `PromptPipeline`; I originally did that but reverted before creating the PR to limit the scope to the issue being addressed. I can't think of any other prompt pipelines, so I'm not sure what the `_DATAPIPELINE` registry is even intended for?
> so not sure what the `_DATAPIPELINE` registry is even intended for?

It's from the time when every pipeline was supposed to be a specific dataset, each registered deliberately.
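For context, the registry pattern being discussed looks roughly like this (a hand-drawn reconstruction for illustration, not the actual trlx code):

```python
_DATAPIPELINE = {}

def register_datapipeline(cls):
    # Each dataset-specific pipeline was meant to register itself by name.
    _DATAPIPELINE[cls.__name__] = cls
    return cls

def get_pipeline(name):
    # Look a pipeline class up by its registered name.
    return _DATAPIPELINE[name]

@register_datapipeline
class PromptPipeline:
    def __init__(self, prompts, max_prompt_length, tokenizer):
        self.prompts = prompts
        self.max_prompt_length = max_prompt_length
        self.tokenizer = tokenizer
```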
```diff
@@ -95,3 +142,73 @@ def save_pretrained(self, directory: Optional[str] = None):
             "`AccelerateILQLTrainer` does not currently support automatic saving "
             "with `transformers.PreTrainedModel.save_pretrained`."
         )
+
+    def make_experience(self, samples, rewards, max_length=2048):
```
Maybe pull this into somewhere it can be shared with the NeMo impl? I guess this could mean it's worth also passing in the rollout store as an arg, like the `PromptPipeline` for PPO.
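To sketch what that sharing might look like (purely illustrative; `RolloutStore` and this free-function signature are invented for the example and assume a Hugging Face-style tokenizer):

```python
from dataclasses import dataclass

@dataclass
class RolloutStore:
    # Hypothetical container pairing tokenized samples with their rewards.
    input_ids: list
    rewards: list

def make_experience(samples, rewards, tokenizer, max_length=2048):
    # Trainer-agnostic: both the Accelerate and NeMo trainers could call
    # this and feed the resulting store into their own dataloaders.
    input_ids = [
        tokenizer(s, truncation=True, max_length=max_length)["input_ids"]
        for s in samples
    ]
    return RolloutStore(input_ids=input_ids, rewards=list(rewards))
```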
I've duplicated the `make_experience` into NeMo for now. There was a subtle difference in logging: NeMo couldn't recognize the global rank, so the `RANK == 0` checks were forcing each rank to write tables to stdout (the fix is to just use their global-rank check util). I think it might be best to push this off to another PR, because we'll need to revisit this abstraction again for PPO. What do you think?
Yeah, I think that makes sense for now. Maybe `torch.distributed.get_rank()` will work for both, but we can revisit.
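Something like this, for reference (a sketch of the guard being discussed; `is_global_rank_zero` is a hypothetical helper, not the NeMo util mentioned above):

```python
import torch.distributed as dist

def is_global_rank_zero() -> bool:
    # True on global rank 0 across all workers; falls back to True
    # when torch.distributed isn't initialized (single-process runs).
    return not dist.is_initialized() or dist.get_rank() == 0

# Only the global rank-0 process writes rollout tables to stdout.
if is_global_rank_zero():
    print("samples/rewards table goes here")
```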
Thanks jon! I think this looks good to me; we can excise and clean up `get_pipeline` in future work and revisit the sharing between Accelerate and NeMo.
This PR removes all of the orchestrator components, for reasons outlined in #278.

Highlights for reviewer(s):

- Adds a new method to `AcceleratePPOTrainer` called `add_prompt_pipeline` that mimics the prompt pipeline loading and device placement of the removed PPO orchestrator. This is sort of awkward because it requires users to manually call the method before running `make_experience` (the same thing you have to do with `add_eval_pipeline`). Open to suggestions; I'd prefer to pass the pipeline to the trainer constructor, but it breaks the config-first approach currently implemented (should discuss for future refactoring?). A rough usage sketch follows this list.
- Removes dead code related to MagiCARP.
- Removes unused `utils.topk_mask`.
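Roughly, the calling pattern this introduces (a sketch mirroring the `train()` hunk quoted in the review above; `config`, `prompts`, `max_prompt_length`, and `trainer` come from the usual trlx setup):

```python
pipeline = get_pipeline(config.train.pipeline)(prompts, max_prompt_length, trainer.tokenizer)

# The caller must wire the pipeline in before generating rollouts,
# the same way add_eval_pipeline already works:
trainer.add_prompt_pipeline(pipeline)
trainer.make_experience(config.method.num_rollouts)
```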
Reproduction reports: