add PPO and stack_llama support #615

Merged: regisss merged 6 commits into main from PPO_stack_llama on Feb 11, 2024

@sywangyi
Collaborator

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@sywangyi sywangyi requested a review from regisss as a code owner December 28, 2023 06:32
@sywangyi
Collaborator Author

Depends on #612 and #507.

@sywangyi
Collaborator Author

@libinta

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sywangyi sywangyi force-pushed the PPO_stack_llama branch 2 times, most recently from 5be4c86 to e3840e0 Compare December 28, 2023 08:11
Collaborator

@libinta libinta left a comment

Could you also consider adding a test case in the test folder for stack_llama/stack_llama_2, for better regression detection? A short run would be enough for a perf/accuracy check.

Comment thread examples/trl/stack_llama/README.md Outdated
Comment thread examples/trl/stack_llama/reward_modeling.py Outdated
Comment thread examples/trl/stack_llama/reward_modeling.py Outdated
Comment thread examples/trl/stack_llama/reward_modeling.py Outdated
Comment thread examples/trl/stack_llama/reward_modeling.py
Comment thread examples/trl/stack_llama/rl_training.py
Comment thread examples/trl/stack_llama/rl_training.py
@sywangyi
Collaborator Author

@mandy-li, fine-tuning the reward model (step 2) and then applying RLHF to the actor model and reward model (step 3) are both covered here.

@regisss
Collaborator

regisss commented Jan 16, 2024

@sywangyi Could you remove the stack_llama folder and move everything into examples/trl, similarly to #635, please?

@sywangyi
Collaborator Author

Yes, I will refactor the PR.

@sywangyi sywangyi force-pushed the PPO_stack_llama branch 2 times, most recently from 5b5e886 to b7ce2a5 Compare January 18, 2024 04:37
@sywangyi
Collaborator Author

https://habana.atlassian.net/servicedesk/customer/portal/1/HS-1425 has been filed for the PPO training graph. @libinta

Comment thread examples/trl/README.md
Comment thread examples/trl/README.md Outdated
Comment thread optimum/habana/trl/trainer/ppo_trainer.py Outdated
Comment thread examples/trl/README.md
Comment thread examples/trl/README.md
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Comment thread optimum/habana/trl/trainer/ppo_config.py Outdated
Comment thread optimum/habana/trl/trainer/ppo_config.py Outdated
Comment thread optimum/habana/trl/trainer/ppo_config.py
Comment thread optimum/habana/trl/trainer/ppo_config.py Outdated
Comment thread optimum/habana/trl/trainer/ppo_trainer.py Outdated
Comment thread examples/trl/ppo.py
Comment thread examples/trl/ppo.py Outdated
Comment thread examples/trl/reward_modeling.py Outdated
Comment thread examples/trl/reward_modeling.py Outdated
Comment thread examples/trl/reward_modeling.py Outdated
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@sywangyi
Collaborator Author

Thanks for the detailed review, @regisss. I have updated the PR based on the review comments.

Comment thread optimum/habana/trl/models/modeling_base.py
Comment thread optimum/habana/trl/trainer/ppo_config.py
Comment thread optimum/habana/trl/trainer/ppo_trainer.py Outdated
Comment thread optimum/habana/trl/trainer/reward_trainer.py
Comment thread optimum/habana/trl/trainer/reward_trainer.py
Comment thread optimum/habana/trl/trainer/ppo_trainer.py Outdated
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@regisss
Collaborator

regisss commented Feb 7, 2024

@sywangyi Can you add evaluate and scikit-learn to requirements.txt? They are needed to run reward_modeling.py.
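
A quick way to confirm that both packages are installed before launching the script, as a minimal sketch: the helper name `missing_modules` is hypothetical and not part of this PR. It uses only the standard library and assumes the usual import names (`evaluate`, and `sklearn` for scikit-learn).

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # scikit-learn is installed under the import name `sklearn`.
    missing = missing_modules(["evaluate", "sklearn"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("evaluate and scikit-learn are available")
```

If anything is reported missing, installing from the example's requirements.txt should resolve it once the two entries have been added.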

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Comment thread examples/trl/README.md Outdated
Collaborator

@regisss regisss left a comment

LGTM!

@regisss regisss merged commit 2a07c5d into main Feb 11, 2024
@regisss regisss deleted the PPO_stack_llama branch February 11, 2024 05:05
jychen21 pushed a commit to jychen21/optimum-habana that referenced this pull request Feb 27, 2024
HolyFalafel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 11, 2024
4 participants