Add maj@k metric #158
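For readers unfamiliar with the metric named in the title: maj@k is commonly understood as majority voting over k sampled generations, where the most frequent answer is compared to the reference. The sketch below illustrates that general idea only; it is not the code from this PR, and the function name `maj_at_k`, its parameters, and the whitespace-only normalization are all assumptions made for illustration.

```python
from collections import Counter


def maj_at_k(samples, gold, k, normalize=str.strip):
    """Hypothetical helper: 1 if the majority answer among the first k samples matches gold, else 0."""
    if k <= 0 or not samples:
        raise ValueError("need k >= 1 and at least one sample")
    # Normalize before voting so trivially different strings count as the same answer.
    votes = Counter(normalize(s) for s in samples[:k])
    # most_common breaks ties by first occurrence, so ties favor the earliest sample.
    majority_answer, _ = votes.most_common(1)[0]
    return int(majority_answer == normalize(gold))


if __name__ == "__main__":
    generations = ["42", " 42", "7", "42 ", "13"]
    print(maj_at_k(generations, "42", k=5))
```

A real evaluation on a task such as gsm8k would likely need task-specific answer extraction and normalization before voting, which is left out here.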
LGTM! Do we have a metric for gsm8k to compare with other implementations?
I'll check it - however, please don't merge yet, I need to propagate the changes to the other launchers + simplify greedy :)
…ls in one single step
Oh mb, I thought it was ready for review!
I hope I'll finish it today :)
Tests are failing because of the hub problems ^^"
e45f395 to 4ef4c99
Co-authored-by: Nathan Habib <[email protected]>
Is this good to be merged? I saw on Slack that you did not get the same results for mistral models. (Do we even know how they ran their tests?)
I'll do more tests today - we could merge it now and adjust later if needed though.
Alright, I'm merging it so that we can merge the other PRs.
This PR needs #158 to be merged first. The main problem is that splits are no longer by size; instead, they now make sure that all batched generative evals are launched with similar evals (same sampling, same eos tokens). Co-authored-by: Nathan Habib <[email protected]>
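The splitting behaviour described in that commit message (group generative requests so each batch shares the same sampling settings and eos tokens, rather than splitting by size) can be sketched roughly as follows. This is an illustrative assumption, not the project's code: `GenRequest`, its fields, and `split_by_generation_settings` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GenRequest:
    """Hypothetical request type; the field names are assumptions for illustration."""
    prompt: str
    do_sample: bool
    num_samples: int
    stop_sequences: tuple[str, ...]


def split_by_generation_settings(requests):
    """Group requests so every split shares sampling settings and stop (eos) sequences."""
    groups = defaultdict(list)
    for request in requests:
        # The key covers only generation settings, so split sizes may be uneven.
        key = (request.do_sample, request.num_samples, request.stop_sequences)
        groups[key].append(request)
    return list(groups.values())


if __name__ == "__main__":
    reqs = [
        GenRequest("q1", False, 1, ("\n",)),
        GenRequest("q2", True, 8, ("Question:",)),
        GenRequest("q3", False, 1, ("\n",)),
    ]
    for split in split_by_generation_settings(reqs):
        print([r.prompt for r in split])
```

The trade-off matches the one the commit message points out: splits built this way are homogeneous in their generation settings but no longer balanced by size.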
Co-authored-by: Nathan Habib <[email protected]> * added review change --------- Co-authored-by: Nathan Habib <[email protected]>
Still missing: