Adding support for Arabic benchmarks : AceGPT benchmarking suite #44
LGTM, we'll just merge #47 first, merge main into your branch, then I'll test this PR one last time, and we'll be good to go!
First attempt at creating the OALL_tasks.txt file; it needs to be populated later with all the benchmarks
Add AceGPT benchmarking suite (arabic_mmlu, arabic_exams & acva)
add "xstory_cloze:ar" to the OALL tasks as well
Add the AceGPT benchmarking suite (arabic_mmlu, arabic_exams & acva)
update prompt function for arabic_mmlu and arabic_exams
Add `mmlu_harness_arabic` and `exams_harness_arabic` to support the AceGPT benchmarking suite added to the tasks_table.jsonl file (main/src/lighteval/tasks/tasks_table.jsonl)
Add the acva() prompting function for the ACVA benchmark from the AceGPT benchmarking suite
Add a test file to check that the script works as expected
forgot ".txt" in the last commit :)
Fix a typo in the metric name, from `loglikelihood_acc_single_token_single_token` to `loglikelihood_acc_single_token`, in the following lines: "acva:Algeria", "acva:Ancient_Egypt", "acva:Arab_Empire", "acva:Arabic_Architecture", "acva:Arabic_Art"
Add `LETTER_INDICES_AR` List and update `mmlu_harness_arabic` and `exams_harness_arabic` to match the new changes
Fix a typo in `arabic_exams` on line 1203, from "hf_avail_splits":["test","dev"],"evaluation_splits":["test"],"few_shots_split":"dev" to "hf_avail_splits":["test","validation"],"evaluation_splits":["test"],"few_shots_split":"validation"
Temporarily delete "lighteval|acva:entertainment|5|1"
Add back "lighteval|acva:entertainment|5|1"
Update the metric for the acva benchmark from `loglikelihood_acc_single_token` to `loglikelihood_acc`
…king suite + Apply fixes from pre-commit hooks
Change `lighteval` suite to `community` for arabic benchmarks
revert previous commits
revert previous commits
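Several of the commits above fix small slips in task entries: a doubled metric suffix and a `dev` split that should have been `validation`. A minimal sketch of a check that could catch both before review is shown here. It is not part of the PR. The `name` and `metric` keys, the `KNOWN_METRICS` set, and the caller-supplied `actual_splits` argument are assumptions made for illustration; only the three split keys are taken from the commit messages.

```python
import json

# Metric names quoted in the commit messages above. A real metric registry
# would be larger; this set is an assumption for illustration only.
KNOWN_METRICS = {"loglikelihood_acc", "loglikelihood_acc_single_token"}


def check_task_entry(entry: dict, actual_splits: set) -> list:
    """Return human-readable problems found in one task-table entry.

    `actual_splits` is supplied by the caller (for example, looked up by
    hand on the dataset page); nothing is fetched here.
    """
    problems = []
    declared = entry.get("hf_avail_splits", [])

    # Declared splits must exist in the dataset ("dev" vs "validation").
    for split in declared:
        if split not in actual_splits:
            problems.append(f"declared split {split!r} does not exist in the dataset")

    # Evaluation and few-shot splits must be among the declared ones.
    for split in entry.get("evaluation_splits", []):
        if split not in declared:
            problems.append(f"evaluation split {split!r} not in hf_avail_splits")
    few_shot = entry.get("few_shots_split")
    if few_shot is not None and few_shot not in declared:
        problems.append(f"few_shots_split {few_shot!r} not in hf_avail_splits")

    # Catches doubled suffixes such as "..._single_token_single_token".
    for metric in entry.get("metric", []):
        if metric not in KNOWN_METRICS:
            problems.append(f"unknown metric {metric!r}")

    return problems


def check_jsonl(text: str, actual_splits: set) -> dict:
    """Check every non-empty JSONL line; map entry name to its problems."""
    report = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        problems = check_task_entry(entry, actual_splits)
        if problems:
            report[entry.get("name", "<unnamed>")] = problems
    return report
```

Passing the dataset's real splits in from outside keeps the check offline; an entry that is internally consistent but names a split the dataset does not have is still flagged.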
@clefourrier is everything fine? How did the final test go? Are there any issues I can resolve on my side?
Hi @alielfilali01, everything is fine!
Looks good! Great work implementing this. Can you run the evals on your side and make sure you get the expected results?
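When preparing a task list for such a run, entries like `lighteval|acva:entertainment|5|1` (quoted in the commits) appear to use four pipe-separated fields. The sketch below parses that layout and rewrites the suite prefix, as one commit did when moving the Arabic tasks from `lighteval` to `community`. The field names are assumptions: the first three are read as suite, task name, and few-shot count, and the fourth is carried through unchanged as an opaque flag. `TaskSpec` and the helper functions are hypothetical, not lighteval's own API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TaskSpec:
    suite: str      # e.g. "lighteval" or "community"
    task: str       # e.g. "acva:entertainment"
    few_shots: int  # assumed: number of few-shot examples
    flag: int       # fourth field, kept verbatim; meaning not assumed here


def parse_task_spec(spec: str) -> TaskSpec:
    """Parse one 'suite|task|few_shots|flag' line."""
    parts = spec.strip().split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}: {spec!r}")
    suite, task, few_shots, flag = parts
    return TaskSpec(suite, task, int(few_shots), int(flag))


def format_task_spec(ts: TaskSpec) -> str:
    """Inverse of parse_task_spec."""
    return f"{ts.suite}|{ts.task}|{ts.few_shots}|{ts.flag}"


def retarget_suite(lines, old: str, new: str) -> list:
    """Rewrite the suite of every entry whose suite equals `old`."""
    out = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in a tasks file
        ts = parse_task_spec(line)
        if ts.suite == old:
            ts = replace(ts, suite=new)
        out.append(format_task_spec(ts))
    return out
```

Parsing into a small record rather than doing a string replace avoids accidentally rewriting a task whose name happens to contain the old suite name.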
Fix typo `mmlu` to `arabic_mmlu` (line 55). Co-authored-by: Nathan Habib
@alielfilali01 Do you need help with the code style? And do you have reference scores for some OSS models that we could test?
Thanks for your attention. I didn't notice before, but now I've made the necessary changes and formatting.
LGTM, let's merge if tests pass
Adds
- `mmlu_harness_arabic`
- `exams_harness_arabic`
- `acva`
as custom tasks.

Co-authored-by: Nathan Habib
This PR adds Arabic benchmarks to the core library so they can be used out of the box. It mainly adds the AceGPT benchmarking suite, which consists of 3 main datasets: `arabic_mmlu` (57 subsets), `arabic_exams`, and `acva` (58 subsets).
Both `arabic_mmlu` and `arabic_exams` were translated by the AceGPT team and, according to them, manually checked, whereas `acva` is a native Arabic benchmark contributed by the AceGPT team.
cc: @clefourrier
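For readers unfamiliar with how a multiple-choice prompt function for datasets like `arabic_mmlu` or `arabic_exams` might look, here is a minimal, self-contained sketch. It is not the code from this PR: `PromptDoc` is a hypothetical stand-in for the library's document type, the contents of `LETTER_INDICES_AR` are assumed (the commits name the list but do not show it), and the Arabic prompt wording is illustrative only.

```python
from dataclasses import dataclass, field

# Assumed contents; the PR introduces a list with this name, but its
# actual letters are not shown in the conversation.
LETTER_INDICES_AR = ["أ", "ب", "ج", "د", "هـ"]


@dataclass
class PromptDoc:
    """Hypothetical stand-in for a prompt document (not lighteval's type)."""
    query: str
    choices: list = field(default_factory=list)
    gold_index: int = 0


def mmlu_style_prompt_ar(question: str, options: list, answer_index: int) -> PromptDoc:
    """Build a lettered multiple-choice prompt using Arabic option letters."""
    if len(options) > len(LETTER_INDICES_AR):
        raise ValueError("more options than available Arabic letters")
    if not 0 <= answer_index < len(options):
        raise ValueError("answer_index out of range")

    letters = LETTER_INDICES_AR[: len(options)]
    lines = [f"السؤال: {question}"]  # "Question: ..."
    lines += [f"{letter}. {option}" for letter, option in zip(letters, options)]
    lines.append("الإجابة:")  # "Answer:"

    # The model is scored on which letter it prefers after "Answer:".
    return PromptDoc(query="\n".join(lines), choices=letters, gold_index=answer_index)
```

Using Arabic letters for the options, rather than A/B/C/D, keeps the whole prompt in one script, which is presumably the motivation for a separate `LETTER_INDICES_AR` list.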