Replies: 2 comments
-
Yep! We have an example we'll be merging soon where we got OpenAI's "learning to summarize" reward model working with TRLX on a 20B language model. We also have a very minimal version of CodeRL working; it's included as an example here. We've also been discussing TRLX with plenty of RLHF industry folks and have gotten a few seals of approval at this point.
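For context, wiring a reward model into TRLX mostly comes down to supplying a reward function that scores generated samples. Below is a minimal sketch of that shape, assuming the common `reward_fn(samples, **kwargs) -> list[float]` interface; the scoring heuristic is a placeholder, not OpenAI's actual learned reward model.

```python
# Sketch of a reward function in the shape TRLX's PPO training loop consumes:
# a callable mapping a batch of generated text samples to scalar rewards.
# The length heuristic below is a toy stand-in for a learned reward model.

def reward_fn(samples, **kwargs):
    """Score each generated summary; higher is better."""
    rewards = []
    for text in samples:
        n_words = len(text.split())
        # Toy heuristic: prefer summaries near 30 words
        # (a real setup would call the reward model here).
        rewards.append(-abs(n_words - 30) / 30.0)
    return rewards

# With the library itself, this would be passed along the lines of:
#   trlx.train("gpt2", reward_fn=reward_fn, prompts=prompts)
```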
-
What's the largest PPO model size that has been trained and tested with TRLX? Can you share some performance metrics, e.g. GPU count and training time?
-
Has this been tested on anything?