Common voice by hepengfe · Pull Request #2951 · embeddings-benchmark/mteb

hepengfe · 2025-07-26T18:36:03Z

This PR fixes #2050

I have outlined why this dataset is filling an existing gap in mteb
I have tested that the dataset runs with the mteb package.

An easy way to test it:

import mteb

# Sample model (CLAP, an audio-text model):
model = mteb.get_model("laion/clap-htsat-unfused")

# Load the text-to-audio retrieval task and run the evaluation
task = mteb.get_task("CommonVoiceT2ARetrieval")
evaluation = mteb.MTEB(tasks=[task])
evaluation.run(model)

I have run the following models on the task (the results are attached to this PR). These can be run using the mteb run -m {model_name} -t {task_name} command.
- laion/clap-htsat-unfused
I have checked that the performance is neither trivial (models achieve close to perfect scores) nor random (models achieve close to random scores).
I have considered the size of the dataset and reduced it if it is too big (2048 examples is typically large enough for most tasks). I set the evaluation language to one with a very small dataset.
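For the "neither trivial nor random" check, a useful reference point is what a uniformly random ranker would score. The sketch below assumes exactly one relevant item per query among N candidates: the relevant item is then equally likely to sit at any position, so expected recall@k is min(k, N)/N and expected MRR@k is the harmonic sum up to min(k, N) divided by N. The function names are illustrative and not part of mteb.

```python
from fractions import Fraction


def random_recall_at_k(k: int, n_candidates: int) -> float:
    """Expected recall@k of a uniformly random ranking.

    Assumes one relevant item per query among n_candidates; it lands in
    the top k with probability min(k, N) / N.
    """
    if k <= 0 or n_candidates <= 0:
        raise ValueError("k and n_candidates must be positive")
    return min(k, n_candidates) / n_candidates


def random_mrr_at_k(k: int, n_candidates: int) -> float:
    """Expected MRR@k of a uniformly random ranking (one relevant item).

    E[1/r * 1(r <= k)] = (1/N) * sum_{r=1}^{min(k, N)} 1/r
    """
    if k <= 0 or n_candidates <= 0:
        raise ValueError("k and n_candidates must be positive")
    # Exact harmonic sum via Fraction, converted to float at the end
    harmonic = sum(Fraction(1, r) for r in range(1, min(k, n_candidates) + 1))
    return float(harmonic / n_candidates)
```

For example, random_recall_at_k(5, 100) gives the chance-level recall@5 for a hypothetical 100-item corpus; comparing a model's recall_at_5 against that value shows how far above chance it is.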

isaac-chung

Great start! Looks like you still need to run linting and get tests passing. If you're able to run the task, please attach the results in a comment.

isaac-chung · 2025-07-26T19:35:16Z

mteb/tasks/Audio/Any2AnyRetrieval/CommonVoice.py

Let's specify the version to be clear. Also, this should inherit from MultilingualTask.

Suggested change:

-class CommonVoiceA2TRetrieval(AbsTaskAny2AnyRetrieval):
+from mteb.abstasks.MultilingualTask import MultilingualTask
+
+class CommonVoice17A2TRetrieval(AbsTaskAny2AnyRetrieval, MultilingualTask):
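The suggested change relies on Python multiple inheritance: the retrieval base class supplies the evaluation loop, and the multilingual mixin supplies the per-language subsets it iterates over. The sketch below illustrates that pattern with hypothetical stand-in classes (AbsTaskRetrievalStub, MultilingualMixinStub, CommonVoiceSketchTask); it does not reproduce the internals of mteb's real AbsTaskAny2AnyRetrieval or MultilingualTask.

```python
from __future__ import annotations


class AbsTaskRetrievalStub:
    """Hypothetical stand-in for a retrieval base class (not mteb code)."""

    def evaluate(self) -> dict[str, float]:
        # Evaluate once per subset; hf_subsets is provided by the mixin via the MRO
        return {subset: self._evaluate_subset(subset) for subset in self.hf_subsets}

    def _evaluate_subset(self, subset: str) -> float:
        return 0.0  # placeholder score for illustration


class MultilingualMixinStub:
    """Hypothetical stand-in for a multilingual mixin.

    Derives the dataset subsets from an eval_langs mapping of
    subset name -> list of language codes.
    """

    eval_langs: dict[str, list[str]] = {}

    @property
    def hf_subsets(self) -> list[str]:
        return list(self.eval_langs)


class CommonVoiceSketchTask(AbsTaskRetrievalStub, MultilingualMixinStub):
    # Two example subsets; a real task would list every language
    eval_langs = {"af": ["afr-Latn"], "de": ["deu-Latn"]}
```

Calling CommonVoiceSketchTask().evaluate() yields one score per subset, which is the behavior the mixin adds on top of a single-language retrieval task.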

isaac-chung · 2025-07-26T19:35:25Z

mteb/tasks/Audio/Any2AnyRetrieval/CommonVoice.py

Suggested change:

-    name="CommonVoiceA2TRetrieval",
+    name="CommonVoice17A2TRetrieval",

isaac-chung · 2025-07-26T19:37:13Z

mteb/tasks/Audio/Any2AnyRetrieval/CommonVoice.py

Please include all languages.
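Including all languages means each Common Voice locale needs an entry in the task's eval_langs. A minimal sketch, assuming the task follows mteb's convention of ISO 639-3 language codes with an ISO 15924 script suffix (e.g. "afr-Latn"); the lookup table here is hypothetical and deliberately partial, and the helper fails loudly on any locale it cannot map rather than guessing.

```python
# Hypothetical, partial lookup from Common Voice locale codes to
# ISO 639-3 + ISO 15924 codes; a real task would cover every locale.
LOCALE_TO_LANG_SCRIPT = {
    "af": "afr-Latn",
    "de": "deu-Latn",
    "ja": "jpn-Jpan",
    "ru": "rus-Cyrl",
}


def build_eval_langs(locales, table=LOCALE_TO_LANG_SCRIPT):
    """Map each locale (used as the subset name) to a list of language codes."""
    missing = sorted(set(locales) - set(table))
    if missing:
        # Surface unmapped locales instead of silently dropping them
        raise KeyError(f"No language code known for locales: {missing}")
    return {loc: [table[loc]] for loc in locales}
```

Failing on unknown locales is a design choice: a silently skipped language is harder to notice than an error during task registration.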

hepengfe · 2025-07-29T04:31:35Z

@isaac-chung Here are the evaluation results for Afrikaans A2T retrieval:

{
  "dataset_revision": "b10d53980ef166bc24ce3358471c1970d7e6b5ec",
  "task_name": "CommonVoiceA2TRetrieval",
  "mteb_version": "1.21.3",
  "scores": {
    "test": [
      {
        "ndcg_at_1": 0.01613,
        "ndcg_at_3": 0.02631,
        "ndcg_at_5": 0.04573,
        "ndcg_at_10": 0.07229,
        "ndcg_at_20": 0.12053,
        "ndcg_at_100": 0.24477,
        "ndcg_at_1000": 0.24477,
        "map_at_1": 0.01613,
        "map_at_3": 0.02419,
        "map_at_5": 0.03468,
        "map_at_10": 0.04597,
        "map_at_20": 0.05881,
        "map_at_100": 0.07739,
        "map_at_1000": 0.07739,
        "recall_at_1": 0.01613,
        "recall_at_3": 0.03226,
        "recall_at_5": 0.08065,
        "recall_at_10": 0.16129,
        "recall_at_20": 0.35484,
        "recall_at_100": 1.0,
        "recall_at_1000": 1.0,
        "cv_recall_at_1": 0.01613,
        "cv_recall_at_3": 0.03226,
        "cv_recall_at_5": 0.08065,
        "cv_recall_at_10": 0.16129,
        "cv_recall_at_20": 0.35484,
        "cv_recall_at_100": 1.0,
        "cv_recall_at_1000": 1.0,
        "precision_at_1": 0.01613,
        "precision_at_3": 0.01075,
        "precision_at_5": 0.01613,
        "precision_at_10": 0.01613,
        "precision_at_20": 0.01774,
        "precision_at_100": 0.01,
        "precision_at_1000": 0.001,
        "mrr_at_1": 0.016129,
        "mrr_at_3": 0.024194,
        "mrr_at_5": 0.034677,
        "mrr_at_10": 0.045968,
        "mrr_at_20": 0.058809,
        "mrr_at_100": 0.077391,
        "mrr_at_1000": 0.077391,
        "nauc_ndcg_at_1_max": 0.312997,
        "nauc_ndcg_at_1_std": 0.312997,
        "nauc_ndcg_at_1_diff1": 0.096048,
        "nauc_ndcg_at_3_max": 0.001772,
        "nauc_ndcg_at_3_std": 0.001772,
        "nauc_ndcg_at_3_diff1": 0.096048,
        "nauc_ndcg_at_5_max": 0.14011,
        "nauc_ndcg_at_5_std": 0.056122,
        "nauc_ndcg_at_5_diff1": 0.133503,
        "nauc_ndcg_at_10_max": 0.14321,
        "nauc_ndcg_at_10_std": 0.106188,
        "nauc_ndcg_at_10_diff1": 0.13122,
        "nauc_ndcg_at_20_max": 0.106183,
        "nauc_ndcg_at_20_std": 0.026165,
        "nauc_ndcg_at_20_diff1": 0.084842,
        "nauc_ndcg_at_100_max": 0.088356,
        "nauc_ndcg_at_100_std": 0.054598,
        "nauc_ndcg_at_100_diff1": 0.066126,
        "nauc_ndcg_at_1000_max": 0.088356,
        "nauc_ndcg_at_1000_std": 0.054598,
        "nauc_ndcg_at_1000_diff1": 0.066126,
        "nauc_map_at_1_max": 0.312997,
        "nauc_map_at_1_std": 0.312997,
        "nauc_map_at_1_diff1": 0.096048,
        "nauc_map_at_3_max": 0.044828,
        "nauc_map_at_3_std": 0.044828,
        "nauc_map_at_3_diff1": 0.096048,
        "nauc_map_at_5_max": 0.12523,
        "nauc_map_at_5_std": 0.064861,
        "nauc_map_at_5_diff1": 0.117796,
        "nauc_map_at_10_max": 0.110845,
        "nauc_map_at_10_std": 0.091807,
        "nauc_map_at_10_diff1": 0.107052,
        "nauc_map_at_20_max": 0.09885,
        "nauc_map_at_20_std": 0.056171,
        "nauc_map_at_20_diff1": 0.089019,
        "nauc_map_at_100_max": 0.091044,
        "nauc_map_at_100_std": 0.064294,
        "nauc_map_at_100_diff1": 0.075931,
        "nauc_map_at_1000_max": 0.091044,
        "nauc_map_at_1000_std": 0.064294,
        "nauc_map_at_1000_diff1": 0.075931,
        "nauc_recall_at_1_max": 0.312997,
        "nauc_recall_at_1_std": 0.312997,
        "nauc_recall_at_1_diff1": 0.096048,
        "nauc_recall_at_3_max": -0.089256,
        "nauc_recall_at_3_std": -0.089256,
        "nauc_recall_at_3_diff1": 0.096048,
        "nauc_recall_at_5_max": 0.169741,
        "nauc_recall_at_5_std": 0.052166,
        "nauc_recall_at_5_diff1": 0.157699,
        "nauc_recall_at_10_max": 0.193538,
        "nauc_recall_at_10_std": 0.13094,
        "nauc_recall_at_10_diff1": 0.164648,
        "nauc_recall_at_20_max": 0.114459,
        "nauc_recall_at_20_std": -0.005367,
        "nauc_recall_at_20_diff1": 0.077691,
        "nauc_recall_at_100_max": NaN,
        "nauc_recall_at_100_std": NaN,
        "nauc_recall_at_100_diff1": NaN,
        "nauc_recall_at_1000_max": NaN,
        "nauc_recall_at_1000_std": NaN,
        "nauc_recall_at_1000_diff1": NaN,
        "nauc_precision_at_1_max": 0.312997,
        "nauc_precision_at_1_std": 0.312997,
        "nauc_precision_at_1_diff1": 0.096048,
        "nauc_precision_at_3_max": -0.089256,
        "nauc_precision_at_3_std": -0.089256,
        "nauc_precision_at_3_diff1": 0.096048,
        "nauc_precision_at_5_max": 0.169741,
        "nauc_precision_at_5_std": 0.052166,
        "nauc_precision_at_5_diff1": 0.157699,
        "nauc_precision_at_10_max": 0.193538,
        "nauc_precision_at_10_std": 0.13094,
        "nauc_precision_at_10_diff1": 0.164648,
        "nauc_precision_at_20_max": 0.114459,
        "nauc_precision_at_20_std": -0.005367,
        "nauc_precision_at_20_diff1": 0.077691,
        "nauc_precision_at_100_max": 1.0,
        "nauc_precision_at_100_std": 1.0,
        "nauc_precision_at_100_diff1": 1.0,
        "nauc_precision_at_1000_max": NaN,
        "nauc_precision_at_1000_std": NaN,
        "nauc_precision_at_1000_diff1": NaN,
        "nauc_cv_recall_at_1_max": 0.312997,
        "nauc_cv_recall_at_1_std": 0.312997,
        "nauc_cv_recall_at_1_diff1": 0.096048,
        "nauc_cv_recall_at_3_max": -0.089256,
        "nauc_cv_recall_at_3_std": -0.089256,
        "nauc_cv_recall_at_3_diff1": 0.096048,
        "nauc_cv_recall_at_5_max": 0.169741,
        "nauc_cv_recall_at_5_std": 0.052166,
        "nauc_cv_recall_at_5_diff1": 0.157699,
        "nauc_cv_recall_at_10_max": 0.193538,
        "nauc_cv_recall_at_10_std": 0.13094,
        "nauc_cv_recall_at_10_diff1": 0.164648,
        "nauc_cv_recall_at_20_max": 0.114459,
        "nauc_cv_recall_at_20_std": -0.005367,
        "nauc_cv_recall_at_20_diff1": 0.077691,
        "nauc_cv_recall_at_100_max": NaN,
        "nauc_cv_recall_at_100_std": NaN,
        "nauc_cv_recall_at_100_diff1": NaN,
        "nauc_cv_recall_at_1000_max": NaN,
        "nauc_cv_recall_at_1000_std": NaN,
        "nauc_cv_recall_at_1000_diff1": NaN,
        "nauc_mrr_at_1_max": 0.312997,
        "nauc_mrr_at_1_std": 0.312997,
        "nauc_mrr_at_1_diff1": 0.096048,
        "nauc_mrr_at_3_max": 0.044828,
        "nauc_mrr_at_3_std": 0.044828,
        "nauc_mrr_at_3_diff1": 0.096048,
        "nauc_mrr_at_5_max": 0.12523,
        "nauc_mrr_at_5_std": 0.064861,
        "nauc_mrr_at_5_diff1": 0.117796,
        "nauc_mrr_at_10_max": 0.110845,
        "nauc_mrr_at_10_std": 0.091807,
        "nauc_mrr_at_10_diff1": 0.107052,
        "nauc_mrr_at_20_max": 0.09885,
        "nauc_mrr_at_20_std": 0.056171,
        "nauc_mrr_at_20_diff1": 0.089019,
        "nauc_mrr_at_100_max": 0.091044,
        "nauc_mrr_at_100_std": 0.064294,
        "nauc_mrr_at_100_diff1": 0.075931,
        "nauc_mrr_at_1000_max": 0.091044,
        "nauc_mrr_at_1000_std": 0.064294,
        "nauc_mrr_at_1000_diff1": 0.075931,
        "main_score": 0.08065,
        "hf_subset": "default",
        "languages": [
          "af"
        ]
      }
    ]
  },
  "evaluation_time": 25.977766513824463,
  "kg_co2_emissions": null
}
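The results above contain bare NaN values (e.g. the nauc_recall_at_100 entries), which strict JSON does not allow. Python's json module accepts the NaN literal by default, so such a file can be loaded directly. This sketch pulls out main_score per split and subset and counts NaN metrics; the field names are taken from the output above, and the helper itself is illustrative.

```python
import json
import math


def summarize_results(raw: str) -> dict:
    """Extract main_score per (split, hf_subset) from an mteb-style results JSON."""
    data = json.loads(raw)  # json.loads parses NaN as float("nan") by default
    summary = {}
    for split, entries in data["scores"].items():
        for entry in entries:
            # Count metrics that came back as NaN
            nan_metrics = [
                key for key, value in entry.items()
                if isinstance(value, float) and math.isnan(value)
            ]
            summary[(split, entry["hf_subset"])] = {
                "main_score": entry["main_score"],
                "nan_metrics": len(nan_metrics),
            }
    return summary
```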

hepengfe · 2025-07-29T04:31:42Z

Here are the evaluation results for Afrikaans T2A retrieval:

{
  "dataset_revision": "b10d53980ef166bc24ce3358471c1970d7e6b5ec",
  "task_name": "CommonVoiceT2ARetrieval",
  "mteb_version": "1.21.3",
  "scores": {
    "test": [
      {
        "ndcg_at_1": 0.06452,
        "ndcg_at_3": 0.07258,
        "ndcg_at_5": 0.09201,
        "ndcg_at_10": 0.11711,
        "ndcg_at_20": 0.14548,
        "ndcg_at_100": 0.278,
        "ndcg_at_1000": 0.278,
        "map_at_1": 0.06452,
        "map_at_3": 0.06989,
        "map_at_5": 0.08038,
        "map_at_10": 0.09021,
        "map_at_20": 0.09787,
        "map_at_100": 0.11847,
        "map_at_1000": 0.11847,
        "recall_at_1": 0.06452,
        "recall_at_3": 0.08065,
        "recall_at_5": 0.12903,
        "recall_at_10": 0.20968,
        "recall_at_20": 0.32258,
        "recall_at_100": 1.0,
        "recall_at_1000": 1.0,
        "cv_recall_at_1": 0.06452,
        "cv_recall_at_3": 0.08065,
        "cv_recall_at_5": 0.12903,
        "cv_recall_at_10": 0.20968,
        "cv_recall_at_20": 0.32258,
        "cv_recall_at_100": 1.0,
        "cv_recall_at_1000": 1.0,
        "precision_at_1": 0.06452,
        "precision_at_3": 0.02688,
        "precision_at_5": 0.02581,
        "precision_at_10": 0.02097,
        "precision_at_20": 0.01613,
        "precision_at_100": 0.01,
        "precision_at_1000": 0.001,
        "mrr_at_1": 0.064516,
        "mrr_at_3": 0.069892,
        "mrr_at_5": 0.080376,
        "mrr_at_10": 0.090207,
        "mrr_at_20": 0.097866,
        "mrr_at_100": 0.118466,
        "mrr_at_1000": 0.118466,
        "nauc_ndcg_at_1_max": -0.056226,
        "nauc_ndcg_at_1_std": -0.150482,
        "nauc_ndcg_at_1_diff1": -0.089256,
        "nauc_ndcg_at_3_max": -0.09543,
        "nauc_ndcg_at_3_std": -0.179213,
        "nauc_ndcg_at_3_diff1": -0.101761,
        "nauc_ndcg_at_5_max": 0.023788,
        "nauc_ndcg_at_5_std": -0.03952,
        "nauc_ndcg_at_5_diff1": 0.008827,
        "nauc_ndcg_at_10_max": 0.065878,
        "nauc_ndcg_at_10_std": 0.063517,
        "nauc_ndcg_at_10_diff1": 0.003433,
        "nauc_ndcg_at_20_max": -0.023487,
        "nauc_ndcg_at_20_std": 0.082424,
        "nauc_ndcg_at_20_diff1": -0.009566,
        "nauc_ndcg_at_100_max": -0.022806,
        "nauc_ndcg_at_100_std": -0.009379,
        "nauc_ndcg_at_100_diff1": -0.024265,
        "nauc_ndcg_at_1000_max": -0.022806,
        "nauc_ndcg_at_1000_std": -0.009379,
        "nauc_ndcg_at_1000_diff1": -0.024265,
        "nauc_map_at_1_max": -0.056226,
        "nauc_map_at_1_std": -0.150482,
        "nauc_map_at_1_diff1": -0.089256,
        "nauc_map_at_3_max": -0.083367,
        "nauc_map_at_3_std": -0.170373,
        "nauc_map_at_3_diff1": -0.097913,
        "nauc_map_at_5_max": -0.013166,
        "nauc_map_at_5_std": -0.085539,
        "nauc_map_at_5_diff1": -0.035348,
        "nauc_map_at_10_max": 0.006031,
        "nauc_map_at_10_std": -0.024829,
        "nauc_map_at_10_diff1": -0.038193,
        "nauc_map_at_20_max": -0.024821,
        "nauc_map_at_20_std": -0.016865,
        "nauc_map_at_20_diff1": -0.04111,
        "nauc_map_at_100_max": -0.026922,
        "nauc_map_at_100_std": -0.040566,
        "nauc_map_at_100_diff1": -0.041791,
        "nauc_map_at_1000_max": -0.026922,
        "nauc_map_at_1000_std": -0.040566,
        "nauc_map_at_1000_diff1": -0.041791,
        "nauc_recall_at_1_max": -0.056226,
        "nauc_recall_at_1_std": -0.150482,
        "nauc_recall_at_1_diff1": -0.089256,
        "nauc_recall_at_3_max": -0.126794,
        "nauc_recall_at_3_std": -0.202199,
        "nauc_recall_at_3_diff1": -0.111765,
        "nauc_recall_at_5_max": 0.105986,
        "nauc_recall_at_5_std": 0.060673,
        "nauc_recall_at_5_diff1": 0.10727,
        "nauc_recall_at_10_max": 0.175779,
        "nauc_recall_at_10_std": 0.214556,
        "nauc_recall_at_10_diff1": 0.077389,
        "nauc_recall_at_20_max": -0.036852,
        "nauc_recall_at_20_std": 0.235453,
        "nauc_recall_at_20_diff1": 0.034642,
        "nauc_recall_at_100_max": NaN,
        "nauc_recall_at_100_std": NaN,
        "nauc_recall_at_100_diff1": NaN,
        "nauc_recall_at_1000_max": NaN,
        "nauc_recall_at_1000_std": NaN,
        "nauc_recall_at_1000_diff1": NaN,
        "nauc_precision_at_1_max": -0.056226,
        "nauc_precision_at_1_std": -0.150482,
        "nauc_precision_at_1_diff1": -0.089256,
        "nauc_precision_at_3_max": -0.126794,
        "nauc_precision_at_3_std": -0.202199,
        "nauc_precision_at_3_diff1": -0.111765,
        "nauc_precision_at_5_max": 0.105986,
        "nauc_precision_at_5_std": 0.060673,
        "nauc_precision_at_5_diff1": 0.10727,
        "nauc_precision_at_10_max": 0.175779,
        "nauc_precision_at_10_std": 0.214556,
        "nauc_precision_at_10_diff1": 0.077389,
        "nauc_precision_at_20_max": -0.036852,
        "nauc_precision_at_20_std": 0.235453,
        "nauc_precision_at_20_diff1": 0.034642,
        "nauc_precision_at_100_max": 1.0,
        "nauc_precision_at_100_std": 1.0,
        "nauc_precision_at_100_diff1": 1.0,
        "nauc_precision_at_1000_max": NaN,
        "nauc_precision_at_1000_std": NaN,
        "nauc_precision_at_1000_diff1": NaN,
        "nauc_cv_recall_at_1_max": -0.056226,
        "nauc_cv_recall_at_1_std": -0.150482,
        "nauc_cv_recall_at_1_diff1": -0.089256,
        "nauc_cv_recall_at_3_max": -0.126794,
        "nauc_cv_recall_at_3_std": -0.202199,
        "nauc_cv_recall_at_3_diff1": -0.111765,
        "nauc_cv_recall_at_5_max": 0.105986,
        "nauc_cv_recall_at_5_std": 0.060673,
        "nauc_cv_recall_at_5_diff1": 0.10727,
        "nauc_cv_recall_at_10_max": 0.175779,
        "nauc_cv_recall_at_10_std": 0.214556,
        "nauc_cv_recall_at_10_diff1": 0.077389,
        "nauc_cv_recall_at_20_max": -0.036852,
        "nauc_cv_recall_at_20_std": 0.235453,
        "nauc_cv_recall_at_20_diff1": 0.034642,
        "nauc_cv_recall_at_100_max": NaN,
        "nauc_cv_recall_at_100_std": NaN,
        "nauc_cv_recall_at_100_diff1": NaN,
        "nauc_cv_recall_at_1000_max": NaN,
        "nauc_cv_recall_at_1000_std": NaN,
        "nauc_cv_recall_at_1000_diff1": NaN,
        "nauc_mrr_at_1_max": -0.056226,
        "nauc_mrr_at_1_std": -0.150482,
        "nauc_mrr_at_1_diff1": -0.089256,
        "nauc_mrr_at_3_max": -0.083367,
        "nauc_mrr_at_3_std": -0.170373,
        "nauc_mrr_at_3_diff1": -0.097913,
        "nauc_mrr_at_5_max": -0.013166,
        "nauc_mrr_at_5_std": -0.085539,
        "nauc_mrr_at_5_diff1": -0.035348,
        "nauc_mrr_at_10_max": 0.006031,
        "nauc_mrr_at_10_std": -0.024829,
        "nauc_mrr_at_10_diff1": -0.038193,
        "nauc_mrr_at_20_max": -0.024821,
        "nauc_mrr_at_20_std": -0.016865,
        "nauc_mrr_at_20_diff1": -0.04111,
        "nauc_mrr_at_100_max": -0.026922,
        "nauc_mrr_at_100_std": -0.040566,
        "nauc_mrr_at_100_diff1": -0.041791,
        "nauc_mrr_at_1000_max": -0.026922,
        "nauc_mrr_at_1000_std": -0.040566,
        "nauc_mrr_at_1000_diff1": -0.041791,
        "main_score": 0.12903,
        "hf_subset": "default",
        "languages": [
          "af"
        ]
      }
    ]
  },
  "evaluation_time": 24.952935695648193,
  "kg_co2_emissions": null
}
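With a single relevant item per query, several of the reported metrics are tied together: precision@k equals recall@k divided by k (A2T recall_at_5 = 0.08065 and precision_at_5 = 0.01613), and map_at_k matches mrr_at_k. The sketch below reproduces these relationships from per-query ranks. It assumes exactly one relevant document per query and is not mteb's own metric implementation.

```python
def metrics_from_ranks(ranks, k):
    """Compute recall@k, precision@k and MRR@k from 1-based ranks.

    ranks[i] is the position at which query i's single relevant
    document was retrieved.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    n = len(ranks)
    hits = [r <= k for r in ranks]
    recall = sum(hits) / n
    # At most one relevant item per query, so each query's precision is hit / k
    precision = recall / k
    # Reciprocal rank counts only if the relevant item is within the top k
    mrr = sum(1.0 / r for r in ranks if r <= k) / n
    return {"recall": recall, "precision": precision, "mrr": mrr}
```

For ranks [1, 3, 10] and k=5, two of three queries hit, giving recall 2/3, precision 2/15 and MRR (1 + 1/3)/3.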

isaac-chung · 2025-07-29T14:34:48Z

The MTEB version shows that you might not have the most up-to-date (fixed) CLAP model. But at least it runs 👍 Please make sure all tests pass before you request a review.
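The results files record "mteb_version": "1.21.3". A quick way to confirm which version is installed is the standard library's importlib.metadata. The PR does not say which mteb release contains the CLAP fix, so any minimum passed to the hypothetical version_at_least helper below is a placeholder; the naive parser only handles plain "X.Y.Z" strings, and packaging.version is more robust for pre-releases.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Naive parser for plain 'X.Y.Z' release strings (no pre-release tags)."""
    return tuple(int(part) for part in v.split("."))


def version_at_least(installed: str, minimum: str) -> bool:
    # Tuple comparison orders (1, 21, 3) below (1, 21, 10), unlike string comparison
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Usage: installed_version("mteb") returns the installed release string, which can then be passed to version_at_least along with the minimum you need.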

isaac-chung · 2025-08-01T17:32:53Z

Tests are still failing. Could you check what is causing it?

isaac-chung

Nice work!

hepengfe · 2025-08-02T06:31:35Z

@isaac-chung Should I merge now?

hepengfe added 9 commits June 29, 2025 22:52

add the script to process commonvoice data

25f1df3

add script to upload common voice data

a136fe2

add dev data folder which was missing. Suppress error from tarfile

c08ec09

Merge branch 'maeb' into common_voice

f7c303b

Merge branch 'maeb' into common_voice

60430d8

Add 'Speech Retrieval' for common voice T2A task

7484010

add common voice 17 for temporary review

156c729

add import common voice script in init file

8f2cfab

add a2t and t2a data transformation

8e89477
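The "add a2t and t2a data transformation" commit turns one set of paired recordings into two retrieval directions. A minimal sketch of that idea, not the PR's actual script: it assumes each row has "audio" and "sentence" fields (as in the Hugging Face Common Voice datasets) and uses hypothetical "audio-{i}"/"text-{i}" IDs. In A2T the clip is the query and its transcript the only relevant document; T2A swaps the roles.

```python
def build_retrieval_splits(rows):
    """Build A2T and T2A queries/corpus/qrels from paired rows (assumed schema)."""
    a2t = {"queries": {}, "corpus": {}, "qrels": {}}
    t2a = {"queries": {}, "corpus": {}, "qrels": {}}
    for i, row in enumerate(rows):
        audio_id, text_id = f"audio-{i}", f"text-{i}"  # hypothetical ID scheme
        # A2T: audio clip as query, its transcript as the single relevant doc
        a2t["queries"][audio_id] = row["audio"]
        a2t["corpus"][text_id] = row["sentence"]
        a2t["qrels"][audio_id] = {text_id: 1}
        # T2A: the same pair with query and document roles swapped
        t2a["queries"][text_id] = row["sentence"]
        t2a["corpus"][audio_id] = row["audio"]
        t2a["qrels"][text_id] = {audio_id: 1}
    return a2t, t2a
```

One thing this sketch glosses over: Common Voice often contains several recordings of the same sentence, so identical transcripts would appear as separate corpus entries, and only one of them is marked relevant.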

hepengfe marked this pull request as ready for review July 26, 2025 18:38

hepengfe requested review from KennethEnevoldsen and isaac-chung July 26, 2025 18:38

isaac-chung reviewed Jul 26, 2025


hepengfe added 2 commits July 28, 2025 23:38

fixed class name, superclass and eval languages

fcc8eab

fixed linting errors and a tar file decompression error

1e7ac89
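The commits mention suppressing a tarfile error and later fixing a tar decompression error, but the PR does not describe the error itself. As a hedged illustration only, not the PR's code, here is one defensive way to extract a dataset archive with the standard library: keep only regular files and directories, skip members that would land outside the destination, and use the "data" extraction filter where the running Python provides it.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: str, dest: str) -> list:
    """Extract regular files/dirs from a tar archive into dest; return their names."""
    dest_path = Path(dest).resolve()
    with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects compression
        members = []
        for member in tar.getmembers():
            if not (member.isfile() or member.isdir()):
                continue  # skip symlinks, hard links, devices
            target = (dest_path / member.name).resolve()
            # Skip entries whose path would escape the destination directory
            if target != dest_path and dest_path not in target.parents:
                continue
            members.append(member)
        if hasattr(tarfile, "data_filter"):
            # Extraction filters exist in Python 3.12+ and some patch backports
            tar.extractall(dest_path, members=members, filter="data")
        else:
            tar.extractall(dest_path, members=members)
    return sorted(m.name for m in members)
```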

hepengfe requested a review from isaac-chung July 29, 2025 06:50

hepengfe added 3 commits July 29, 2025 21:26

ruff reformat

0c6c8c8

add common voice 21

55983ca

ruff reformat

fbd3641

hepengfe added 4 commits August 1, 2025 22:02

fixed the citation of task metadata

d93c8d0

Merge branch 'maeb' into common_voice

b090d96

ruff format

0f89fe3

fixed language code

8976adc

isaac-chung approved these changes Aug 2, 2025


isaac-chung merged commit 54561ed into embeddings-benchmark:maeb Aug 2, 2025
8 checks passed
