Adding support for Arabic benchmarks : AceGPT benchmarking suite #44

alielfilali01 · 2024-02-20T12:58:01Z

This PR adds Arabic benchmarks to the core library so they can be used out of the box. Here I am mainly adding the AceGPT benchmarking suite, which consists of 3 main datasets: arabic_mmlu (57 subsets), arabic_exams, and acva (58 subsets).

Both arabic_mmlu and arabic_exams were translated by the AceGPT team and, according to them, manually checked, whereas acva is a native Arabic benchmark contributed by the AceGPT team.

cc : @clefourrier

clefourrier · 2024-02-22T14:40:21Z

LGTM, we'll just merge #47 first, merge main into your branch, then I'll test this PR one last time, and we'll be good to go!
You'll be our first external code contribution! 🔥

alielfilali01 · 2024-02-23T13:10:02Z

> LGTM, we'll just merge #47 first, merge main into your branch, then I'll test this PR one last time, and we'll be good to go! You'll be our first external code contribution! 🔥

@clefourrier is everything fine? How did the final test go? Are there any issues I can resolve on my side?

clefourrier · 2024-02-23T13:20:00Z

Hi @alielfilali01 , everything is fine!
Just waiting for my co-maintainer's approval on the other PR ^^

NathanHB · 2024-02-24T10:59:47Z

Looks good! Great work implementing this. Can you run the evals on your side and make sure you get the expected results?

clefourrier · 2024-02-26T08:27:26Z

@alielfilali01 Do you need help with the code style? And do you have reference scores for some OSS models that we could test?

alielfilali01 · 2024-02-26T12:45:16Z

> @alielfilali01 Do you need help with the code style? And do you have reference scores for some OSS models that we could test?

Thanks for your attention. I didn't notice before, but now I've made the necessary changes and formatting.

clefourrier

LGTM, let's merge if tests pass

alielfilali01 changed the title ~~Adding support to Arabic benchmarks~~ Adding support for Arabic benchmarks Feb 20, 2024

clefourrier self-assigned this Feb 22, 2024

NathanHB self-assigned this Feb 22, 2024

clefourrier reviewed Feb 22, 2024

tasks_examples/OALL_tasks.txt (outdated, resolved)

src/lighteval/tasks/tasks_prompt_formatting.py (resolved)

tasks_examples/OALL_tasks_test.txt (outdated, resolved)

alielfilali01 changed the title ~~Adding support for Arabic benchmarks~~ Adding support for Arabic benchmarks : AceGPT benchmarking suite Feb 22, 2024

alielfilali01 and others added 20 commits February 22, 2024 18:40

Create OALL_tasks.txt

9e1f585

first attempt to create the OALL_tasks.txt file, need to be populated later with all the benchmarks

Update OALL_tasks.txt

3f36afd

Add AceGPT benchmarking suite (arabic_mmlu, arabic_exams & acva)

Update OALL_tasks.txt

2486df8

add "xstory_cloze:ar" to the OALL tasks as well

Update tasks_table.jsonl

cc420f2

Add the AceGPT benchmarking suite (arabic_mmlu, arabic_exams & acva)

Update tasks_table.jsonl

9078d5f

update prompt function for arabic_mmlu and arabic_exams

Update tasks_prompt_formatting.py

b573b21

Add `mmlu_harness_arabic` and `exams_harness_arabic` to support the AceGPT benchmarking suite added to the tasks_tables.jsonl file (main/src/lighteval/tasks/tasks_table.jsonl)

Update tasks_prompt_formatting.py

d929b68

Add the acva() prompting function for the ACVA benchmark from the AceGPT benchmarking suite
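
A minimal sketch of what a prompting function of this kind might look like, assuming each ACVA row carries a `question` field and a true/false `answer` field. The `Doc` dataclass is a simplified stand-in rather than lighteval's own class, and the field names, Arabic prompt wording, and sample row are illustrative assumptions, not the code from this commit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Simplified, hypothetical stand-in for an evaluation document."""
    task_name: str
    query: str
    choices: List[str]
    gold_index: int


# Assumed true/false labels ("صح" = true, "خطأ" = false).
ACVA_CHOICES = ["صح", "خطأ"]


def acva_prompt(line: dict, task_name: str = "acva") -> Doc:
    """Turn one dataset row into a query plus its candidate answers."""
    query = f"السؤال: {line['question']}\nالإجابة:"
    return Doc(
        task_name=task_name,
        query=query,
        choices=ACVA_CHOICES,
        # Position of the gold label inside the choices list.
        gold_index=ACVA_CHOICES.index(line["answer"]),
    )


# Made-up row, for illustration only.
doc = acva_prompt({"question": "القاهرة هي عاصمة مصر.", "answer": "صح"})
print(doc.query)
```

The model is then scored on which of the two choices it finds more likely as a continuation of the query.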

Create OALL_tasks_test

73dd24c

a test file in order to test if the script works as expected !

Rename OALL_tasks_test to OALL_tasks_test.txt

ed835a6

forgot ".txt" in the last commit :)

Update tasks_table.jsonl

29903a2

Update a typo in the metric from `loglikelihood_acc_single_token_single_token` to `loglikelihood_acc_single_token` in the following lines : "acva:Algeria" "acva:Ancient_Egypt" "acva:Arab_Empire" "acva:Arabic_Architecture" "acva:Arabic_Art"

Update tasks_prompt_formatting.py

1cd4ddc

Add `LETTER_INDICES_AR` List and update `mmlu_harness_arabic` and `exams_harness_arabic` to match the new changes
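
As a rough illustration of how a list of Arabic choice labels can drive a multiple-choice prompt, here is a sketch under stated assumptions: the four letters, the prompt wording, and the helper name `format_mcq_arabic` are hypothetical and may differ from what the commit actually contains.

```python
from typing import List, Tuple

# Assumed Arabic choice labels; the list added in the PR may be longer or differ.
LETTER_INDICES_AR = ["أ", "ب", "ج", "د"]


def format_mcq_arabic(
    question: str, options: List[str], answer_index: int
) -> Tuple[str, List[str], int]:
    """Build an MMLU-style query labelled with Arabic letters (hypothetical helper)."""
    if len(options) > len(LETTER_INDICES_AR):
        raise ValueError("more options than available Arabic letters")
    lines = [f"السؤال: {question}"]
    for letter, option in zip(LETTER_INDICES_AR, options):
        lines.append(f"{letter}. {option}")
    lines.append("الإجابة:")
    # The candidate continuations are the letters themselves.
    choices = LETTER_INDICES_AR[: len(options)]
    return "\n".join(lines), choices, answer_index


# Made-up question, for illustration only.
query, choices, gold = format_mcq_arabic("ما هو ناتج 2 + 2؟", ["3", "4", "5", "6"], 1)
print(query)
```

Using Arabic letters for the labels keeps the expected continuation in the same script as the question, instead of mixing Latin `A`/`B`/`C`/`D` into an Arabic prompt.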

Update tasks_table.jsonl

016e0ec

Update a typo in `arabic_exams` in line 1203 from : "hf_avail_splits":["test","dev"],"evaluation_splits":["test"],"few_shots_split":"dev" to : "hf_avail_splits":["test","validation"],"evaluation_splits":["test"],"few_shots_split":"validation"
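
A mismatch like this can be caught before running an evaluation by comparing the splits a task entry declares with the splits the dataset actually exposes. The sketch below assumes the dataset offers `test` and `validation` (which is what the fix suggests, though the commit does not say so explicitly); `find_split_problems` is a hypothetical helper, and in practice the real split names could be fetched with something like `datasets.get_dataset_split_names`.

```python
from typing import Iterable, List


def find_split_problems(entry: dict, actual_splits: Iterable[str]) -> List[str]:
    """Report splits that are declared but missing, or used but undeclared."""
    declared = set(entry.get("hf_avail_splits", []))
    problems = []
    for split in sorted(declared - set(actual_splits)):
        problems.append(f"declared split {split!r} not found in dataset")
    used = list(entry.get("evaluation_splits", []))
    if entry.get("few_shots_split") is not None:
        used.append(entry["few_shots_split"])
    for split in used:
        if split not in declared:
            problems.append(f"split {split!r} is used but not declared")
    return problems


before = {"hf_avail_splits": ["test", "dev"], "evaluation_splits": ["test"], "few_shots_split": "dev"}
after = {"hf_avail_splits": ["test", "validation"], "evaluation_splits": ["test"], "few_shots_split": "validation"}
actual = ["test", "validation"]  # assumed split names of the dataset

print(find_split_problems(before, actual))
print(find_split_problems(after, actual))
```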

Update OALL_tasks_test.txt

dbae73b

temporarily deleting "lighteval|acva:entertainment|5|1"

Update OALL_tasks_test.txt

2234eb9

Add back "lighteval|acva:entertainment|5|1"

Update tasks_table.jsonl

2938b66

Update metric for acva benchmark From : loglikelihood_acc_single_token To : loglikelihood_acc
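
The commit does not state why the metric was changed. One plausible reading is that a single-token variant assumes every answer choice is exactly one token, which may not hold for Arabic answers under a given tokenizer. The sketch below shows the general idea of a log-likelihood accuracy over multi-token choices; it is an illustration with made-up numbers, not lighteval's implementation.

```python
from typing import Sequence


def loglikelihood_acc(
    choice_token_logprobs: Sequence[Sequence[float]], gold_index: int
) -> int:
    """Return 1 if the choice with the highest summed log-probability is the gold one."""
    # Sum per-token log-probabilities so multi-token choices are scored as a whole.
    totals = [sum(tokens) for tokens in choice_token_logprobs]
    predicted = max(range(len(totals)), key=totals.__getitem__)
    return int(predicted == gold_index)


# Made-up log-probabilities: the first choice has two tokens, the second has one.
scores = [[-0.4, -0.3], [-1.2]]
result = loglikelihood_acc(scores, gold_index=0)
print(result)
```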

Create folder in the root and adding the file to load AceGPT benchmarking suite + Apply fixes from pre-commit hooks

4d2154e

Update OALL_tasks.txt

6a347e4

Change `lighteval` suite to `community` for arabic benchmarks
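
A change of this kind amounts to rewriting the first field of each task line. The sketch below assumes the four pipe-separated fields seen in entries such as "lighteval|acva:entertainment|5|1" are suite, task name, few-shot count, and a trailing flag; `change_suite` is a hypothetical helper, not part of lighteval.

```python
def change_suite(spec_line: str, old: str = "lighteval", new: str = "community") -> str:
    """Swap the suite field of one task line, leaving the other fields untouched."""
    suite, task, few_shot, flag = spec_line.strip().split("|")
    if suite == old:
        suite = new
    return "|".join([suite, task, few_shot, flag])


updated = change_suite("lighteval|acva:entertainment|5|1")
print(updated)
```

Splitting into exactly four fields makes a malformed line fail loudly with a `ValueError` instead of being rewritten silently.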

Delete tasks_examples/OALL_tasks_test.txt

8bc7d0e

Update tasks_prompt_formatting.py

5f341bf

revert previous commits

Update tasks_table.jsonl

784cbe5

revert previous commits

Merge branch 'main' into main

fe4d499

NathanHB reviewed Feb 24, 2024

community_tasks/arabic_evals.py (outdated, resolved)

Update community_tasks/arabic_evals.py

2e9ede4

Fix typo `mmlu` to `arabic_mmlu` : line 55 Co-authored-by: Nathan Habib <[email protected]>

alielfilali01 requested a review from NathanHB February 24, 2024 12:04

Fix Checks

8279bef

clefourrier approved these changes Feb 26, 2024

clefourrier merged commit 090101f into huggingface:main Feb 26, 2024

hynky1999 pushed a commit that referenced this pull request May 22, 2025

Adding support for Arabic benchmarks : AceGPT benchmarking suite (#44)

461b89e

Adds - `mmlu_harness_arabic` - `exams_harness_arabic` - `acva` as custom tasks. --------- Co-authored-by: Nathan Habib <[email protected]>

NathanHB added a commit that referenced this pull request Sep 19, 2025

Adding support for Arabic benchmarks : AceGPT benchmarking suite (#44)

470ace8

Adds - `mmlu_harness_arabic` - `exams_harness_arabic` - `acva` as custom tasks. --------- Co-authored-by: Nathan Habib <[email protected]>
