add PPO and stack_llama support #615
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
5be4c86 to e3840e0
libinta left a comment
Can you also consider adding a test case in the test folder for stack_llama/stack_llama_2 for better regression detection? A short run would be enough for a perf/accuracy check.
@mandy-li, fine-tuning the reward model (step 2) and then applying RLHF to the actor model and reward model (step 3) are all here.
Yes, I will refactor the PR.
5b5e886 to b7ce2a5
https://habana.atlassian.net/servicedesk/customer/portal/1/HS-1425 has been filed for the PPO training graph. @libinta
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
b7ce2a5 to e7f83d9
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Thanks for the detailed review, @regisss. I have updated the PR based on the review comments.
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@sywangyi Can you add |
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
What does this PR do?
Fixes # (issue)
Before submitting