-
Notifications
You must be signed in to change notification settings - Fork 472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix context overflow #131
Fix context overflow #131
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally and looks great! I left a few comments and questions 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for clearing up my questions.
Note: Awaiting logs before merging.
When are we merging |
@@ -162,10 +163,12 @@ def loss( | |||
old_values - self.cliprange_value, | |||
old_values + self.cliprange_value, | |||
) | |||
n = mask.sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure but shouldn't n be vector valued where each component is the size of the ith generation? (Before endoftext)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is used only in reductions to scalars so no
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one tiny nit if you could take a look!
Fix of #130, this PR
max_new_tokens
seq_length - max_new_tokens
PPO
ILQL