add PPO and stack_llama support by sywangyi · Pull Request #615 · huggingface/optimum-habana

sywangyi · 2023-12-28T06:32:09Z

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

sywangyi · 2023-12-28T06:33:31Z

Depends on #612 and #507.

sywangyi · 2023-12-28T06:33:53Z

@libinta

HuggingFaceDocBuilderDev · 2023-12-28T06:37:32Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

libinta

Could you also consider adding a test case in the test folder for stack_llama/stack_llama_2, for better regression detection? A short run that checks performance and accuracy would be enough.

sywangyi · 2024-01-11T00:02:24Z

@mandy-li, fine-tuning the reward model (step 2) and then applying RLHF to the actor model and reward model (step 3) are both included here.

regisss · 2024-01-16T14:23:39Z

@sywangyi Could you remove the stack_llama folder and move everything into examples/trl, similarly to #635, please?

sywangyi · 2024-01-17T09:49:55Z

Yes, I will refactor the PR.

sywangyi · 2024-01-18T04:38:32Z

@libinta https://habana.atlassian.net/servicedesk/customer/portal/1/HS-1425 has been filed for the PPO training graph.

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

sywangyi · 2024-01-31T03:07:51Z

Thanks for the detailed review, @regisss. I have updated the PR based on the review comments.

regisss · 2024-02-07T14:18:52Z

@sywangyi Can you add evaluate and scikit-learn to requirements.txt? They are needed to run reward_modeling.py.

regisss

LGTM!

sywangyi requested a review from regisss as a code owner December 28, 2023 06:32

sywangyi force-pushed the PPO_stack_llama branch 2 times, most recently from 5be4c86 to e3840e0 Compare December 28, 2023 08:11

libinta reviewed Jan 5, 2024

sywangyi force-pushed the PPO_stack_llama branch 2 times, most recently from 5b5e886 to b7ce2a5 Compare January 18, 2024 04:37