@alielfilali01
Contributor

This PR adds Arabic benchmarks to the core library so they can be used out of the box. It mainly adds the AceGPT benchmarking suite, which consists of three main datasets: arabic_mmlu (57 subsets), arabic_exams, and acva (58 subsets).

Both arabic_mmlu and arabic_exams were translated by the AceGPT team and, according to them, manually checked. acva is a native Arabic benchmark contributed by the AceGPT team.

cc : @clefourrier

@alielfilali01 changed the title from "Adding support to Arabic benchmarks" to "Adding support for Arabic benchmarks" on Feb 20, 2024
@clefourrier clefourrier self-assigned this Feb 22, 2024
@NathanHB NathanHB self-assigned this Feb 22, 2024
@clefourrier
Member

clefourrier commented Feb 22, 2024

LGTM, we'll just merge #47 first, merge main into your branch, then I'll test this PR one last time, and we'll be good to go!
You'll be our first external code contribution! 🔥

@alielfilali01 changed the title from "Adding support for Arabic benchmarks" to "Adding support for Arabic benchmarks : AceGPT benchmarking suite" on Feb 22, 2024
alielfilali01 and others added 20 commits February 22, 2024 18:40
First attempt to create the OALL_tasks.txt file; it needs to be populated later with all the benchmarks
Add AceGPT benchmarking suite (arabic_mmlu, arabic_exams & acva)
add "xstory_cloze:ar" to the OALL tasks as well
Add the AceGPT benchmarking suite (arabic_mmlu, arabic_exams & acva)
update prompt function for arabic_mmlu and arabic_exams
Add `mmlu_harness_arabic` and `exams_harness_arabic` to support the AceGPT benchmarking suite added to the `tasks_table.jsonl` file (main/src/lighteval/tasks/tasks_table.jsonl)
Add the acva() prompting function for the ACVA benchmark from the AceGPT benchmarking suite
Add a test file to check that the script works as expected
forgot ".txt" in the last commit :)
Update a typo in the metric from `loglikelihood_acc_single_token_single_token` to `loglikelihood_acc_single_token` in the following lines :
"acva:Algeria"
"acva:Ancient_Egypt"
"acva:Arab_Empire"
"acva:Arabic_Architecture"
"acva:Arabic_Art"
Add `LETTER_INDICES_AR` List and update `mmlu_harness_arabic` and `exams_harness_arabic` to match the new changes
Update a typo in `arabic_exams` in line 1203

from :
"hf_avail_splits":["test","dev"],"evaluation_splits":["test"],"few_shots_split":"dev"

to :
"hf_avail_splits":["test","validation"],"evaluation_splits":["test"],"few_shots_split":"validation"
Temporarily delete "lighteval|acva:entertainment|5|1"
Add back "lighteval|acva:entertainment|5|1"
Update metric for acva benchmark 

From :
loglikelihood_acc_single_token

To : 
loglikelihood_acc
…king suite + Apply fixes from pre-commit hooks
Change `lighteval` suite to `community` for arabic benchmarks
revert previous commits
revert previous commits
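
The `arabic_exams` split fix in the commits above (`dev` replaced by `validation`) is the kind of mismatch a small check can catch before an evaluation run. The following is a minimal sketch, not lighteval code: `check_splits` and `ACTUAL_SPLITS` are hypothetical names, and the set of splits the dataset really exposes is an assumption inferred from the fix itself.

```python
import json


def check_splits(entry, dataset_splits):
    """Return human-readable problems with the split fields of one task entry."""
    problems = []
    # Every split the entry declares as available must exist in the dataset.
    for split in entry["hf_avail_splits"]:
        if split not in dataset_splits:
            problems.append(f"hf_avail_splits lists {split!r}, which the dataset lacks")
    # Evaluation splits must exist too.
    for split in entry["evaluation_splits"]:
        if split not in dataset_splits:
            problems.append(f"evaluation split {split!r} is not in the dataset")
    # The few-shot split is optional, but must exist when set.
    few_shot = entry.get("few_shots_split")
    if few_shot is not None and few_shot not in dataset_splits:
        problems.append(f"few-shot split {few_shot!r} is not in the dataset")
    return problems


# Assumption: the dataset exposes exactly these splits, as the fix implies.
ACTUAL_SPLITS = {"test", "validation"}

# The two fragments quoted in the commit message, wrapped in braces to form JSON objects.
before = json.loads(
    '{"hf_avail_splits":["test","dev"],"evaluation_splits":["test"],"few_shots_split":"dev"}'
)
after = json.loads(
    '{"hf_avail_splits":["test","validation"],"evaluation_splits":["test"],"few_shots_split":"validation"}'
)

for label, entry in (("before", before), ("after", after)):
    print(label, check_splits(entry, ACTUAL_SPLITS))
```

Under that assumption, the `before` fragment is flagged twice (once for the declared `dev` split and once for the few-shot split), while the `after` fragment passes cleanly.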
@alielfilali01
Contributor Author

> LGTM, we'll just merge #47 first, merge main into your branch, then I'll test this PR one last time, and we'll be good to go! You'll be our first external code contribution! 🔥

@clefourrier is everything fine? How did the final test go? Are there any issues I can resolve on my side?

@clefourrier
Member

Hi @alielfilali01, everything is fine!
Just waiting for my co-maintainer's approval on the other PR ^^

@NathanHB
Member

Looks good! Great work implementing this. Can you run the evals on your side and make sure you get the expected results?

Fix typo `mmlu` to `arabic_mmlu` : line 55

Co-authored-by: Nathan Habib <[email protected]>
@clefourrier
Member

clefourrier commented Feb 26, 2024

@alielfilali01 Do you need help with the code style? And do you have reference scores for some OSS models that we could test?

@alielfilali01
Contributor Author

> @alielfilali01 Do you need help with the code style? And do you have reference scores for some OSS models that we could test?

Thanks for your attention. I didn't notice before, but I've now made the necessary changes and fixed the formatting.

Member

@clefourrier clefourrier left a comment

LGTM, let's merge if tests pass

@clefourrier clefourrier merged commit 090101f into huggingface:main Feb 26, 2024
hynky1999 pushed a commit that referenced this pull request May 22, 2025
Adds 
- `mmlu_harness_arabic`
- `exams_harness_arabic`
- `acva`

as custom tasks.

---------

Co-authored-by: Nathan Habib <[email protected]>
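
The commit list in this PR refers to tasks through pipe-separated strings such as `lighteval|acva:entertainment|5|1`, and one commit switches the suite prefix from `lighteval` to `community`. The sketch below shows one way to split such a string and swap its suite. `TaskSpec`, `parse_task_spec`, and `with_suite` are hypothetical helpers, not part of lighteval; reading the third field as a few-shot count is an assumption, and the fourth field is kept as raw text because its meaning is not stated in this conversation.

```python
from typing import NamedTuple, Optional


class TaskSpec(NamedTuple):
    suite: str
    task: str
    subset: Optional[str]
    few_shot: int  # assumption: third field is the number of few-shot examples
    flag: str      # fourth field, kept as raw text (meaning not stated here)


def parse_task_spec(spec: str) -> TaskSpec:
    """Split a 'suite|task[:subset]|n|flag' string into its parts."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}: {spec!r}")
    suite, task, few_shot, flag = parts
    # 'acva:entertainment' -> ('acva', 'entertainment'); 'arabic_exams' -> ('arabic_exams', None)
    name, _, subset = task.partition(":")
    return TaskSpec(suite, name, subset or None, int(few_shot), flag)


def with_suite(spec: str, suite: str) -> str:
    """Return the same task spec under a different suite prefix."""
    parse_task_spec(spec)  # validate the shape first
    _, rest = spec.split("|", 1)
    return f"{suite}|{rest}"


spec = "lighteval|acva:entertainment|5|1"
print(parse_task_spec(spec))
print(with_suite(spec, "community"))
```

Keeping the original string intact in `with_suite` and replacing only the prefix avoids re-serializing fields whose semantics this sketch does not claim to know.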
NathanHB added a commit that referenced this pull request Sep 19, 2025
Adds 
- `mmlu_harness_arabic`
- `exams_harness_arabic`
- `acva`

as custom tasks.

---------

Co-authored-by: Nathan Habib <[email protected]>