Can this library be used for encoder only models? #464
-
I would like to tune an encoder-only model (like T5) via RLHF and use the embeddings it produces to calculate the reward.
Replies: 2 comments
-
Hi! Unfortunately no: in its current state, this library operates solely on discrete token-level rewards. That said, I recall that @shahbuland was doing something interesting with reinforcement learning on continuous action spaces, so he might be able to give some pointers here.
-
Thank you! I'll contact him.