
Added 5 datasets for audio pair classification#2463

Merged
isaac-chung merged 20 commits into embeddings-benchmark:maeb from switchpiggy:audio_pair_classification_speech
Jun 8, 2025

Conversation

@kkaitlyn111 (Contributor) commented Mar 30, 2025

Fixes #2459

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: VocalSound: the label indicates whether two audio clips contain the same human sound. VoxPopuliAccent (eng): the label indicates whether two audio clips are spoken in the same variant of English.

  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform().
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.
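The stratified subsampling mentioned in the checklist can be illustrated with a small, self-contained sketch. This is not mteb's actual self.stratified_subsampling() helper (whose exact signature lives in the mteb codebase); it is a hypothetical stand-in showing the idea: downsample a large split while roughly preserving the per-label class proportions, with a fixed seed for reproducibility.

```python
import random
from collections import defaultdict

def stratified_subsample(examples, label_key="label", n_samples=2048, seed=42):
    """Downsample a list of dict examples to about n_samples while
    keeping each label's share of the data roughly constant.
    Hypothetical stand-in for mteb's self.stratified_subsampling()."""
    if len(examples) <= n_samples:
        return list(examples)

    # Group examples by their label value.
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)

    rng = random.Random(seed)
    picked = []
    for group in by_label.values():
        # Each class keeps its proportional share of the reduced dataset
        # (rounding may over/undershoot slightly; the slice below caps it).
        k = max(1, round(n_samples * len(group) / len(examples)))
        picked.extend(rng.sample(group, min(k, len(group))))

    rng.shuffle(picked)
    return picked[:n_samples]
```

For a balanced pair-classification split this keeps the positive/negative ratio intact, so threshold-based metrics computed on the subsample remain comparable to the full split.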

@kkaitlyn111 (Contributor, Author)

Addresses #2461

@kkaitlyn111 kkaitlyn111 changed the title Add VocalSound and VoxPopuliAccent datasets for pair classification Add VocalSound and VoxPopuli datasets for pair classification Mar 30, 2025
@kkaitlyn111 kkaitlyn111 changed the title Add VocalSound and VoxPopuli datasets for pair classification Add VocalSound and VoxPopuli (accent) datasets for pair classification Mar 30, 2025
@Samoed (Member) commented Mar 30, 2025

Can you use ruff with the settings from the mteb config?

@kkaitlyn111 kkaitlyn111 changed the title Add VocalSound and VoxPopuli (accent) datasets for pair classification Added 5 datasets for audio pair classification Apr 1, 2025
@kkaitlyn111 (Contributor, Author)

It got a little messy, but I've cleaned it up, and the last commit contains all of the relevant changes.

@kkaitlyn111 (Contributor, Author)

For some reason the pre-commit hook edited a bunch of other files, so I disabled it for now. Could you please check whether this is okay? Thanks for your review! @Samoed

@Samoed (Member) commented Apr 1, 2025

Yes, this is normal; someone may have run a different ruff version. I will review after #2457 is merged.

@isaac-chung (Collaborator)

@kkaitlyn111 I resolved the merge conflicts and linted this branch. The only remaining item should be the task metadata; right now the following test fails:

pytest tests/test_TaskMetadata.py::test_all_metadata_is_filled_and_valid

@isaac-chung isaac-chung linked an issue Apr 4, 2025 that may be closed by this pull request
@kkaitlyn111 kkaitlyn111 mentioned this pull request May 9, 2025
84 tasks
@isaac-chung isaac-chung added the audio Audio extension label Jun 1, 2025
@isaac-chung isaac-chung self-assigned this Jun 3, 2025
@isaac-chung (Collaborator) commented Jun 8, 2025

Results on facebook/wav2vec2-base

mteb run -m facebook/wav2vec2-base -t CREMADPairClassification ESC50PairClassification NMSQAPairClassification VocalSoundPairClassification VoxPopuliAccentPairClassification
Model results
INFO:mteb.evaluation.MTEB:Evaluation for CREMADPairClassification on test took 1035.70 seconds
INFO:mteb.evaluation.MTEB:Scores: {'default': {'similarity_accuracy': 0.538099084544965, 'similarity_accuracy_threshold': 0.7791836857795715, 'similarity_f1': 0.6666068061416899, 'similarity_f1_threshold': 0.1679917573928833, 'similarity_precision': 0.5000673582109659, 'similarity_recall': 0.9994614970382337, 'similarity_ap': 0.5462917161631282, 'cosine_accuracy': 0.538099084544965, 'cosine_accuracy_threshold': 0.7791836857795715, 'cosine_f1': 0.6666068061416899, 'cosine_f1_threshold': 0.1679917573928833, 'cosine_precision': 0.5000673582109659, 'cosine_recall': 0.9994614970382337, 'cosine_ap': 0.5462917161631282, 'manhattan_accuracy': 0.5363489499192245, 'manhattan_accuracy_threshold': 87.07490539550781, 'manhattan_f1': 0.6667265056996678, 'manhattan_f1_threshold': 189.21218872070312, 'manhattan_precision': 0.5000673219334859, 'manhattan_recall': 1.0, 'manhattan_ap': 0.5479836251517015, 'euclidean_accuracy': 0.5350026925148088, 'euclidean_accuracy_threshold': 4.195163726806641, 'euclidean_f1': 0.6667864522504717, 'euclidean_f1_threshold': 8.884632110595703, 'euclidean_precision': 0.500337063502764, 'euclidean_recall': 0.9991922455573505, 'euclidean_ap': 0.5464101428588402, 'dot_accuracy': 0.5417339795368874, 'dot_accuracy_threshold': 31.191160202026367, 'dot_f1': 0.6665469886006643, 'dot_f1_threshold': 5.114724159240723, 'dot_precision': 0.4999326780665141, 'dot_recall': 0.9997307485191168, 'dot_ap': 0.5348857917075092, 'max_accuracy': 0.5417339795368874, 'max_f1': 0.6667864522504717, 'max_precision': 0.500337063502764, 'max_recall': 1.0, 'max_ap': 0.5479836251517015, 'main_score': 0.5479836251517015}}

INFO:mteb.evaluation.MTEB:Evaluation for ESC50PairClassification on test took 1953.64 seconds
INFO:mteb.evaluation.MTEB:Scores: {'default': {'similarity_accuracy': 0.6535, 'similarity_accuracy_threshold': 0.5829533338546753, 'similarity_f1': 0.6815444562522139, 'similarity_f1_threshold': 0.20864984393119812, 'similarity_precision': 0.5277015907844212, 'similarity_recall': 0.962, 'similarity_ap': 0.7244448095650275, 'cosine_accuracy': 0.6535, 'cosine_accuracy_threshold': 0.5829533338546753, 'cosine_f1': 0.6815444562522139, 'cosine_f1_threshold': 0.20864984393119812, 'cosine_precision': 0.5277015907844212, 'cosine_recall': 0.962, 'cosine_ap': 0.7244448095650275, 'manhattan_accuracy': 0.6425, 'manhattan_accuracy_threshold': 117.25523376464844, 'manhattan_f1': 0.6698741672834937, 'manhattan_f1_threshold': 186.19790649414062, 'manhattan_precision': 0.5317273795534665, 'manhattan_recall': 0.905, 'manhattan_ap': 0.7144574147924443, 'euclidean_accuracy': 0.6395, 'euclidean_accuracy_threshold': 6.804784774780273, 'euclidean_f1': 0.6753812636165577, 'euclidean_f1_threshold': 9.670499801635742, 'euclidean_precision': 0.5302166476624858, 'euclidean_recall': 0.93, 'euclidean_ap': 0.7165764635213258, 'dot_accuracy': 0.6515, 'dot_accuracy_threshold': 31.01972198486328, 'dot_f1': 0.6834008097165992, 'dot_f1_threshold': 22.88848114013672, 'dot_precision': 0.5741496598639456, 'dot_recall': 0.844, 'dot_ap': 0.6955035221513366, 'max_accuracy': 0.6535, 'max_f1': 0.6834008097165992, 'max_precision': 0.5741496598639456, 'max_recall': 0.962, 'max_ap': 0.7244448095650275, 'main_score': 0.7244448095650275}}
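The similarity_accuracy and similarity_accuracy_threshold fields in the score logs above come from sweeping a decision threshold over the cosine similarities of the embedded audio pairs and keeping the best one. A minimal pure-Python sketch of that idea follows; this is not the actual mteb evaluator (which additionally reports AP, F1, and Manhattan/Euclidean/dot variants), just an illustration of the threshold sweep.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def best_threshold_accuracy(pairs, labels):
    """Try every observed similarity as a candidate threshold and
    return (best_accuracy, best_threshold), mirroring the idea behind
    similarity_accuracy / similarity_accuracy_threshold."""
    sims = [cosine(u, v) for u, v in pairs]
    best_acc, best_t = 0.0, 0.0
    for t in sims:
        # Predict "same" (1) when similarity reaches the threshold.
        preds = [1 if s >= t else 0 for s in sims]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_acc, best_t
```

This also makes the numbers above easier to read: a similarity_accuracy near 0.54 on a balanced pair set means the wav2vec2-base embeddings separate the classes only slightly better than chance, which is the non-trivial, non-random regime the dataset checklist asks for.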

@isaac-chung isaac-chung merged commit 31f38f2 into embeddings-benchmark:maeb Jun 8, 2025
9 checks passed


Development

Successfully merging this pull request may close these issues.

Add ESC50 (pair classification task)
