
Added 5 datasets for audio pair classification#2463

Merged
isaac-chung merged 20 commits into embeddings-benchmark:maeb from switchpiggy:audio_pair_classification_speech
Jun 8, 2025

Conversation

@kkaitlyn111 (Contributor) commented Mar 30, 2025

Fixes #2459

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: VocalSound: the label indicates whether two audio clips contain the same human sound. VoxPopuliAccent (eng): the label indicates whether two audio clips are spoken in the same variant of English.

  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform().
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.
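The stratified subsampling mentioned in the checklist can be illustrated with a small, self-contained sketch. This is not mteb's actual self.stratified_subsampling() helper (whose exact signature lives in the mteb codebase); it is a hypothetical stand-in showing the idea: downsample a large split while roughly preserving the per-label class proportions, with a fixed seed for reproducibility.

```python
import random
from collections import defaultdict

def stratified_subsample(examples, label_key="label", n_samples=2048, seed=42):
    """Downsample a list of dict examples to about n_samples while
    keeping each label's share of the data roughly constant.
    Hypothetical stand-in for mteb's self.stratified_subsampling()."""
    if len(examples) <= n_samples:
        return list(examples)

    # Group examples by their label value.
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)

    rng = random.Random(seed)
    picked = []
    for group in by_label.values():
        # Each class keeps its proportional share of the reduced dataset
        # (rounding may over/undershoot slightly; the slice below caps it).
        k = max(1, round(n_samples * len(group) / len(examples)))
        picked.extend(rng.sample(group, min(k, len(group))))

    rng.shuffle(picked)
    return picked[:n_samples]
```

For a balanced pair-classification split this keeps the positive/negative ratio intact, so threshold-based metrics computed on the subsample remain comparable to the full split.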

@kkaitlyn111 (Contributor, Author)

Addresses #2461

@kkaitlyn111 kkaitlyn111 changed the title Add VocalSound and VoxPopuliAccent datasets for pair classification Add VocalSound and VoxPopuli datasets for pair classification Mar 30, 2025
@kkaitlyn111 kkaitlyn111 changed the title Add VocalSound and VoxPopuli datasets for pair classification Add VocalSound and VoxPopuli (accent) datasets for pair classification Mar 30, 2025
@Samoed (Member) commented Mar 30, 2025

Can you use ruff with the settings from the mteb config?

@kkaitlyn111 kkaitlyn111 changed the title Add VocalSound and VoxPopuli (accent) datasets for pair classification Added 5 datasets for audio pair classification Apr 1, 2025
@kkaitlyn111 (Contributor, Author)

It got a little messy, but I've cleaned it up, and the last commit contains all of the relevant changes.

@kkaitlyn111 (Contributor, Author)

For some reason the pre-commit hook edited a bunch of other files, so I disabled it for now. Could you please check whether this is okay? Thanks for your review! @Samoed

@Samoed (Member) commented Apr 1, 2025

Yes, this is normal; someone may have run a different ruff version. I will review after #2457 is merged.

@isaac-chung (Collaborator)

@kkaitlyn111 I resolved the merge conflicts and linted this branch. The only remaining item should be the task metadata; right now the following test fails:

pytest tests/test_TaskMetadata.py::test_all_metadata_is_filled_and_valid

@isaac-chung isaac-chung linked an issue Apr 4, 2025 that may be closed by this pull request
@kkaitlyn111 kkaitlyn111 mentioned this pull request May 9, 2025
84 tasks
@isaac-chung isaac-chung added the audio Audio extension label Jun 1, 2025
@isaac-chung isaac-chung self-assigned this Jun 3, 2025
@isaac-chung (Collaborator) commented Jun 8, 2025

Results on facebook/wav2vec2-base

mteb run -m facebook/wav2vec2-base -t CREMADPairClassification ESC50PairClassification NMSQAPairClassification VocalSoundPairClassification VoxPopuliAccentPairClassification
Model results
INFO:mteb.evaluation.MTEB:Evaluation for CREMADPairClassification on test took 1035.70 seconds
INFO:mteb.evaluation.MTEB:Scores: {'default': {'similarity_accuracy': 0.538099084544965, 'similarity_accuracy_threshold': 0.7791836857795715, 'similarity_f1': 0.6666068061416899, 'similarity_f1_threshold': 0.1679917573928833, 'similarity_precision': 0.5000673582109659, 'similarity_recall': 0.9994614970382337, 'similarity_ap': 0.5462917161631282, 'cosine_accuracy': 0.538099084544965, 'cosine_accuracy_threshold': 0.7791836857795715, 'cosine_f1': 0.6666068061416899, 'cosine_f1_threshold': 0.1679917573928833, 'cosine_precision': 0.5000673582109659, 'cosine_recall': 0.9994614970382337, 'cosine_ap': 0.5462917161631282, 'manhattan_accuracy': 0.5363489499192245, 'manhattan_accuracy_threshold': 87.07490539550781, 'manhattan_f1': 0.6667265056996678, 'manhattan_f1_threshold': 189.21218872070312, 'manhattan_precision': 0.5000673219334859, 'manhattan_recall': 1.0, 'manhattan_ap': 0.5479836251517015, 'euclidean_accuracy': 0.5350026925148088, 'euclidean_accuracy_threshold': 4.195163726806641, 'euclidean_f1': 0.6667864522504717, 'euclidean_f1_threshold': 8.884632110595703, 'euclidean_precision': 0.500337063502764, 'euclidean_recall': 0.9991922455573505, 'euclidean_ap': 0.5464101428588402, 'dot_accuracy': 0.5417339795368874, 'dot_accuracy_threshold': 31.191160202026367, 'dot_f1': 0.6665469886006643, 'dot_f1_threshold': 5.114724159240723, 'dot_precision': 0.4999326780665141, 'dot_recall': 0.9997307485191168, 'dot_ap': 0.5348857917075092, 'max_accuracy': 0.5417339795368874, 'max_f1': 0.6667864522504717, 'max_precision': 0.500337063502764, 'max_recall': 1.0, 'max_ap': 0.5479836251517015, 'main_score': 0.5479836251517015}}

INFO:mteb.evaluation.MTEB:Evaluation for ESC50PairClassification on test took 1953.64 seconds
INFO:mteb.evaluation.MTEB:Scores: {'default': {'similarity_accuracy': 0.6535, 'similarity_accuracy_threshold': 0.5829533338546753, 'similarity_f1': 0.6815444562522139, 'similarity_f1_threshold': 0.20864984393119812, 'similarity_precision': 0.5277015907844212, 'similarity_recall': 0.962, 'similarity_ap': 0.7244448095650275, 'cosine_accuracy': 0.6535, 'cosine_accuracy_threshold': 0.5829533338546753, 'cosine_f1': 0.6815444562522139, 'cosine_f1_threshold': 0.20864984393119812, 'cosine_precision': 0.5277015907844212, 'cosine_recall': 0.962, 'cosine_ap': 0.7244448095650275, 'manhattan_accuracy': 0.6425, 'manhattan_accuracy_threshold': 117.25523376464844, 'manhattan_f1': 0.6698741672834937, 'manhattan_f1_threshold': 186.19790649414062, 'manhattan_precision': 0.5317273795534665, 'manhattan_recall': 0.905, 'manhattan_ap': 0.7144574147924443, 'euclidean_accuracy': 0.6395, 'euclidean_accuracy_threshold': 6.804784774780273, 'euclidean_f1': 0.6753812636165577, 'euclidean_f1_threshold': 9.670499801635742, 'euclidean_precision': 0.5302166476624858, 'euclidean_recall': 0.93, 'euclidean_ap': 0.7165764635213258, 'dot_accuracy': 0.6515, 'dot_accuracy_threshold': 31.01972198486328, 'dot_f1': 0.6834008097165992, 'dot_f1_threshold': 22.88848114013672, 'dot_precision': 0.5741496598639456, 'dot_recall': 0.844, 'dot_ap': 0.6955035221513366, 'max_accuracy': 0.6535, 'max_f1': 0.6834008097165992, 'max_precision': 0.5741496598639456, 'max_recall': 0.962, 'max_ap': 0.7244448095650275, 'main_score': 0.7244448095650275}}
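The similarity_accuracy and similarity_accuracy_threshold fields in the score logs above come from sweeping a decision threshold over the cosine similarities of the embedded audio pairs and keeping the best one. A minimal pure-Python sketch of that idea follows; this is not the actual mteb evaluator (which additionally reports AP, F1, and Manhattan/Euclidean/dot variants), just an illustration of the threshold sweep.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def best_threshold_accuracy(pairs, labels):
    """Try every observed similarity as a candidate threshold and
    return (best_accuracy, best_threshold), mirroring the idea behind
    similarity_accuracy / similarity_accuracy_threshold."""
    sims = [cosine(u, v) for u, v in pairs]
    best_acc, best_t = 0.0, 0.0
    for t in sims:
        # Predict "same" (1) when similarity reaches the threshold.
        preds = [1 if s >= t else 0 for s in sims]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_acc, best_t
```

This also makes the numbers above easier to read: a similarity_accuracy near 0.54 on a balanced pair set means the wav2vec2-base embeddings separate the classes only slightly better than chance, which is the non-trivial, non-random regime the dataset checklist asks for.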

@isaac-chung isaac-chung merged commit 31f38f2 into embeddings-benchmark:maeb Jun 8, 2025
9 checks passed


Development

Successfully merging this pull request may close these issues.

Add ESC50 (pair classification task)
