Adding support for Arabic benchmarks: AlGhafa benchmarking suite #95

alielfilali01 · 2024-03-05T16:39:28Z

The AlGhafa benchmarking suite consists of 11 datasets, presented in this paper and hosted in this repo on the Hub.

clefourrier · 2024-03-06T11:29:48Z

Do you want us to wait for Alghafa 2 to merge this?

alielfilali01 · 2024-03-06T13:40:47Z

> Do you want us to wait for Alghafa 2 to merge this?

Yes please, @clefourrier. I will take some time before Saturday to add the new version of the benchmark.

clefourrier · 2024-03-06T13:41:30Z

No hurry, take your time!

alielfilali01 · 2024-03-11T23:48:42Z

Hello @clefourrier, I believe this PR is ready to be merged.

clefourrier

LGTM, but you need to homogenize your naming:

Prompt names such as boolq_function will be unclear in the long term. For such functions, you could use either boolq_prompt_arabic or just boolq_arabic. (You need to specify the language, since there is already a boolq prompt function by default.)
You also need to homogenize Alghafa, which exists with several different casings, and fit it to Python style casing. For the prompt function, I'd keep it as alghafa_prompt or alghafa; for the class, CustomAlGhafaTask; and here, for the name, I'd keep it lower case:
[CustomAlGhafaTask(name=f"alghafa:{subset}", hf_subset=subset) for subset in ALGHAFA_SUBSETS]
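
The naming scheme suggested in this review can be sketched roughly as follows. This is only an illustration under stated assumptions: `CustomAlGhafaTask` is reduced here to a plain dataclass stand-in (the class in the PR subclasses `LightevalTaskConfig` and takes more arguments), and the subset names are placeholders, not the real AlGhafa subsets.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the PR's task class. The real class subclasses
# LightevalTaskConfig; only the two fields used in the review are modeled.
@dataclass(frozen=True)
class CustomAlGhafaTask:
    name: str
    hf_subset: str


# Placeholder subset names (assumption), not the actual AlGhafa subsets.
ALGHAFA_SUBSETS = ["subset_a", "subset_b"]

# Class name in CamelCase, task names in lower case, one task per subset.
ALGHAFA_TASKS = [
    CustomAlGhafaTask(name=f"alghafa:{subset}", hf_subset=subset)
    for subset in ALGHAFA_SUBSETS
]

task_names = [task.name for task in ALGHAFA_TASKS]
```

Keeping the task name lower case while the class stays CamelCase means the string used to select a task never depends on how the benchmark name is capitalized elsewhere.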

community_tasks/arabic_evals.py

auto_commit_fixes.sh

homogenize naming according to clefourrier's review comments above

homogenize AlGhafa naming: `Alghafa` to `alghafa`

Co-authored-by: Clémentine Fourrier

community_tasks/arabic_evals.py

clefourrier

Hi. This needs a few more changes; I tried to make what is requested clearer.
I also added comments about task-level instructions that I had missed previously.

use the standard camel casing for classes:
(remove) class CustomALGHAFATask(LightevalTaskConfig):
(add) class CustomAlGhafaTask(LightevalTaskConfig):
Co-authored-by: Clémentine Fourrier
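
One way to spot the kind of mixed casing this commit fixes is a small case-insensitive scan. The helper below is a hypothetical sketch, not part of the PR, and the sample text is made up for illustration.

```python
import re
from typing import Set


def casing_variants(text: str, word: str) -> Set[str]:
    """Return every distinct casing of `word` that occurs in `text`."""
    return set(re.findall(re.escape(word), text, flags=re.IGNORECASE))


# Made-up sample mixing the three casings discussed in the review.
sample = (
    "class CustomALGHAFATask(LightevalTaskConfig): ...\n"
    "name = 'Alghafa:subset_a'\n"
    "def alghafa_prompt(line): ...\n"
)

variants = casing_variants(sample, "alghafa")
```

More than one entry in `variants` is a signal that the naming still needs to be homogenized.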

Fixes based on Clementine's comments

alielfilali01 · 2024-03-14T15:41:45Z

@clefourrier I hope this addresses your comments. Please feel free to ping me if I missed anything (I have a tendency to forget 😅).
Again, thanks a lot for the efforts 🤗

clefourrier · 2024-03-19T08:22:12Z

Looks better, thank you!
Do you have some reference models and scores against which I could check the implementation?
Or did you check it, and against which models? :)

alielfilali01 · 2024-03-19T20:27:17Z

> Looks better, thank you! Do you have some reference models and scores against which I could check the implementation? Or did you check it, and against which models? :)

Yes @clefourrier, I tested gpt2 using --max_samples=1 and everything was fine. I believe Hamza is testing on bigger models and will push the results to the Hub for further inspection. I'll update you as soon as I hear back from him.

clefourrier · 2024-03-20T08:20:37Z

Sounds good, feel free to ping me whenever :)

Fix ValueError: Prompt query

clefourrier

LGTM, thanks for the edits and tests!

thevexx · 2024-04-12T19:57:58Z

The AlGhafa eval dataset is no longer available on Hugging Face. Are there any alternatives?

alielfilali01 · 2024-04-12T21:50:36Z

> The AlGhafa eval dataset is no longer available on Hugging Face. Are there any alternatives?

Hi there, can you please provide more context? I have checked the eval code and it seems to work fine.

thevexx · 2024-04-13T16:15:56Z

> Hi there, can you please provide more context? I have checked the eval code and it seems to work fine.

Hi, yesterday the datasets disappeared from the OALL Hugging Face account. Now I can see them, thanks.

alielfilali01 · 2024-04-13T16:23:38Z

> > Hi there, can you please provide more context? I have checked the eval code and it seems to work fine.
>
> Hi, yesterday the datasets disappeared from the OALL Hugging Face account. Now I can see them, thanks.

Ooh, I see. I had to make the datasets private for about 20 minutes yesterday because I was testing something. What a coincidence that you checked at the same time 😅
Sorry for the inconvenience 🤗

- Add Support for the AlGhafa benchmarking suite
Co-authored-by: Clémentine Fourrier

alielfilali01 marked this pull request as draft March 6, 2024 20:54

alielfilali01 force-pushed the main branch from e43c2fa to 8d6d62a Compare March 8, 2024 18:08

alielfilali01 marked this pull request as ready for review March 8, 2024 18:35

alielfilali01 and others added 12 commits March 11, 2024 23:49

Update arabic_evals.py

859c5f0

Add Support for the AlGhafa benchmarking suite

Update OALL_tasks.txt

6249e1f

Adding support to the AlGhafa benchmarking suite

Update arabic_evals.py

d30d1ed

remove translated from AlGhafa

Create all_arabic_tasks.txt

7f1e657

This file now contains all the arabic tasks including tasks not present in OALL_tasks.txt

Update OALL_tasks.txt

129733b

Add support for ALGHAFA TRANSLATED tasks

Update arabic_evals.py

fafdc1b

Add support to AlGhafa Translated benchmark suite (11 subsets)

Update arabic_evals.py

9bb4da0

minor fixes flagged by the pre-commit hook

fix checks

3298ab5

Delete auto_commit_fixes.sh

55e27bb

no need

fix checks

fea2cec

alielfilali01 force-pushed the main branch from 8d6d62a to fea2cec Compare March 11, 2024 23:49

clefourrier reviewed Mar 12, 2024

community_tasks/arabic_evals.py (four review comments, outdated)

clefourrier reviewed Mar 12, 2024

auto_commit_fixes.sh (one review comment, outdated)

alielfilali01 and others added 8 commits March 12, 2024 10:00

Delete auto_commit_fixes.sh

d6646f9

Update OALL_tasks.txt

2fbad52

homogenize AlGhafa naming: `Alghafa` to `alghafa`

Update all_arabic_tasks.txt

f372403

homogenize AlGhafa naming: `Alghafa` to `alghafa`

Update community_tasks/arabic_evals.py

2c492f6

Co-authored-by: Clémentine Fourrier

Update community_tasks/arabic_evals.py

e7ddfb5

Co-authored-by: Clémentine Fourrier

Update community_tasks/arabic_evals.py

f7278c1

Co-authored-by: Clémentine Fourrier

Update community_tasks/arabic_evals.py

f186ded

Co-authored-by: Clémentine Fourrier