Added the IEMOCAP Datasets by AdnanElAssadi56 · Pull Request #2640 · embeddings-benchmark/mteb

AdnanElAssadi56 · 2025-05-03T18:15:47Z

Code Quality

Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

I have run the following models on the task (adding the results to the pr). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.
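
The subsampling item in the checklist above can be illustrated with a minimal, standard-library sketch of stratified subsampling. This is not mteb's `stratified_subsampling()` implementation; the function name `stratified_subsample`, its parameters, and the synthetic labels are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_samples, seed=42):
    """Return sorted indices of a subsample that roughly keeps label proportions.

    Hypothetical helper for illustration; not part of mteb.
    """
    if n_samples >= len(labels):
        return list(range(len(labels)))

    rng = random.Random(seed)

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    fraction = n_samples / len(labels)
    chosen = []
    for label in sorted(by_label):
        indices = by_label[label]
        # Keep at least one example per class so no label disappears.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Synthetic, imbalanced labels (not taken from IEMOCAP).
labels = ["a"] * 6000 + ["b"] * 3000 + ["c"] * 1000
subset = stratified_subsample(labels, n_samples=2048)
print(len(subset))
```

Because each class is rounded separately, the total can differ from `n_samples` by a few examples on other label distributions.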

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision) and
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.

KennethEnevoldsen · 2025-05-04T10:18:32Z

See comment on: #2641

AdnanElAssadi56 · 2025-05-04T20:01:08Z

Classification Task Results:

Model: "facebook/wav2vec2-base-960h"

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPEmotion",
  "mteb_version": "1.38.4",
  "scores": {
    "train": [
      {
        "accuracy": 0.153203,
        "f1": 0.115171,
        "f1_weighted": 0.177438,
        "scores_per_experiment": [
          {
            "accuracy": 0.173805,
            "f1": 0.116921,
            "f1_weighted": 0.198447
          },
          {
            "accuracy": 0.138446,
            "f1": 0.107584,
            "f1_weighted": 0.160962
          },
          {
            "accuracy": 0.163845,
            "f1": 0.12959,
            "f1_weighted": 0.18945
          },
          {
            "accuracy": 0.128486,
            "f1": 0.106024,
            "f1_weighted": 0.155944
          },
          {
            "accuracy": 0.161435,
            "f1": 0.115735,
            "f1_weighted": 0.182385
          }
        ],
        "main_score": 0.153203,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 399.48152470588684,
  "kg_co2_emissions": null
}

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPGender",
  "mteb_version": "1.38.4",
  "scores": {
    "train": [
      {
        "accuracy": 0.512799,
        "f1": 0.509615,
        "f1_weighted": 0.51132,
        "ap": 0.527537,
        "ap_weighted": 0.527537,
        "scores_per_experiment": [
          {
            "accuracy": 0.509462,
            "f1": 0.509009,
            "f1_weighted": 0.509782,
            "ap": 0.530592,
            "ap_weighted": 0.530592
          },
          {
            "accuracy": 0.493028,
            "f1": 0.492943,
            "f1_weighted": 0.493342,
            "ap": 0.527456,
            "ap_weighted": 0.527456
          },
          {
            "accuracy": 0.540837,
            "f1": 0.536048,
            "f1_weighted": 0.538959,
            "ap": 0.550209,
            "ap_weighted": 0.550209
          },
          {
            "accuracy": 0.518426,
            "f1": 0.518296,
            "f1_weighted": 0.518414,
            "ap": 0.502022,
            "ap_weighted": 0.502022
          },
          {
            "accuracy": 0.502242,
            "f1": 0.491777,
            "f1_weighted": 0.496101,
            "ap": 0.527408,
            "ap_weighted": 0.527408
          }
        ],
        "main_score": 0.512799,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 383.3181974887848,
  "kg_co2_emissions": null
}

Model: "microsoft/wavlm-base"

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPEmotion",
  "mteb_version": "1.38.4",
  "scores": {
    "train": [
      {
        "accuracy": 0.213966,
        "f1": 0.162695,
        "f1_weighted": 0.234845,
        "scores_per_experiment": [
          {
            "accuracy": 0.188247,
            "f1": 0.138977,
            "f1_weighted": 0.210447
          },
          {
            "accuracy": 0.234064,
            "f1": 0.185028,
            "f1_weighted": 0.247368
          },
          {
            "accuracy": 0.204681,
            "f1": 0.17268,
            "f1_weighted": 0.22643
          },
          {
            "accuracy": 0.223606,
            "f1": 0.161787,
            "f1_weighted": 0.248
          },
          {
            "accuracy": 0.219233,
            "f1": 0.155002,
            "f1_weighted": 0.241978
          }
        ],
        "main_score": 0.213966,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 442.93038296699524,
  "kg_co2_emissions": null
}

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPGender",
  "mteb_version": "1.38.4",
  "scores": {
    "train": [
      {
        "accuracy": 0.667796,
        "f1": 0.665643,
        "f1_weighted": 0.665727,
        "ap": 0.640392,
        "ap_weighted": 0.640392,
        "scores_per_experiment": [
          {
            "accuracy": 0.584163,
            "f1": 0.583935,
            "f1_weighted": 0.583431,
            "ap": 0.577186,
            "ap_weighted": 0.577186
          },
          {
            "accuracy": 0.697709,
            "f1": 0.697707,
            "f1_weighted": 0.697753,
            "ap": 0.669779,
            "ap_weighted": 0.669779
          },
          {
            "accuracy": 0.671315,
            "f1": 0.667954,
            "f1_weighted": 0.670017,
            "ap": 0.637436,
            "ap_weighted": 0.637436
          },
          {
            "accuracy": 0.717629,
            "f1": 0.713001,
            "f1_weighted": 0.713546,
            "ap": 0.662604,
            "ap_weighted": 0.662604
          },
          {
            "accuracy": 0.668161,
            "f1": 0.665619,
            "f1_weighted": 0.66389,
            "ap": 0.654956,
            "ap_weighted": 0.654956
          }
        ],
        "main_score": 0.667796,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 422.8732376098633,
  "kg_co2_emissions": null
}
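
For anyone cross-checking these dumps, here is a small sketch that recomputes the aggregate from `scores_per_experiment`. It uses an abbreviated copy of the microsoft/wavlm-base IEMOCAPEmotion accuracies posted above; the helper name `summarize` is hypothetical, and the sketch only reads the JSON layout as it appears in this comment.

```python
import json
from statistics import mean

# Abbreviated copy of the microsoft/wavlm-base IEMOCAPEmotion result above
# (accuracy values only).
result_json = """
{
  "task_name": "IEMOCAPEmotion",
  "scores": {
    "train": [
      {
        "accuracy": 0.213966,
        "scores_per_experiment": [
          {"accuracy": 0.188247},
          {"accuracy": 0.234064},
          {"accuracy": 0.204681},
          {"accuracy": 0.223606},
          {"accuracy": 0.219233}
        ],
        "main_score": 0.213966
      }
    ]
  }
}
"""


def summarize(result, split="train", metric="accuracy"):
    """Compare the reported main score with the mean over experiments."""
    entry = result["scores"][split][0]
    runs = [run[metric] for run in entry["scores_per_experiment"]]
    return {
        "task": result["task_name"],
        "reported": entry["main_score"],
        "recomputed": round(mean(runs), 6),
        # Gap between the best and worst experiment.
        "spread": round(max(runs) - min(runs), 6),
    }


summary = summarize(json.loads(result_json))
print(summary)
```

For the values posted here, the reported accuracy matches the mean of the five per-experiment accuracies, and the spread gives a rough sense of run-to-run variation.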

AdnanElAssadi56 · 2025-05-04T21:35:05Z

Clustering Task Results:

Model: "facebook/wav2vec2-base-960h"

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPEmotionClustering",
  "mteb_version": "1.38.4",
  "scores": {
    "train": [
      {
        "v_measure": 0.012845,
        "nmi": 0.012845,
        "ari": 0.005638,
        "cluster_accuracy": 0.155095,
        "main_score": 0.155095,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 351.6299092769623,
  "kg_co2_emissions": null
}

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPGenderClustering",
  "mteb_version": "1.34.7",
  "scores": {
    "train": [
      {
        "v_measure": 0.000342,
        "nmi": 0.000342,
        "ari": 0.000593,
        "cluster_accuracy": 0.513398,
        "main_score": 0.513398,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 369.00222396850586,
  "kg_co2_emissions": null
}

Model: "microsoft/wavlm-base"

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPEmotionClustering",
  "mteb_version": "1.34.7",
  "scores": {
    "train": [
      {
        "v_measure": 0.031543,
        "nmi": 0.031543,
        "ari": 0.015857,
        "cluster_accuracy": 0.166052,
        "main_score": 0.166052,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 397.0959997177124,
  "kg_co2_emissions": null
}

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPGenderClustering",
  "mteb_version": "1.34.7",
  "scores": {
    "train": [
      {
        "v_measure": 0.000271,
        "nmi": 0.000271,
        "ari": -4.3e-05,
        "cluster_accuracy": 0.505429,
        "main_score": 0.505429,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 396.58769488334656,
  "kg_co2_emissions": null
}

AdnanElAssadi56 · 2025-05-04T21:37:08Z

@KennethEnevoldsen

We've discussed this before, but I am including all results for completeness, and for a final confirmation of what to keep/discard.

I've also noticed that the tests are now failing with errors related to BibTeX citations. Were the tests updated? Previously all tests passed, and I can still get results without problems.

KennethEnevoldsen · 2025-05-05T09:36:56Z

Yep, we added a requirement for BibTeX to be correctly formatted. This is because badly formatted entries caused issues with newer versions of Python (a large number of warnings).

Over on main there is a script and make command for formatting bibtex:

python scripts/format_citations.py tasks

Generally, it seems like the two clustering tasks should be removed and the rest can be merged.

@isaac-chung it seems like the cluster accuracy is quite high while the v-measure is low. This seems to have been introduced in MIEB, can you help me understand the rationale?
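
One possible reading of the gap between the two metrics, as a hedged sketch: with two roughly balanced classes, a clustering that carries no label information still reaches about 0.5 cluster accuracy once clusters are matched to their best-fitting labels, while the v-measure stays near 0. The code below assumes cluster accuracy means accuracy under the best one-to-one cluster-to-label mapping, which may differ from the implementation in this repository. Labels and cluster assignments are synthetic.

```python
import math
import random
from collections import Counter
from itertools import permutations


def entropy(counts, n):
    """Shannon entropy (in nats) of a partition given its group sizes."""
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def v_measure(labels, clusters):
    """V-measure with beta=1, i.e. 2 * MI / (H(labels) + H(clusters))."""
    n = len(labels)
    h_labels = entropy(Counter(labels).values(), n)
    h_clusters = entropy(Counter(clusters).values(), n)
    h_joint = entropy(Counter(zip(labels, clusters)).values(), n)
    mutual_info = h_labels + h_clusters - h_joint
    if h_labels + h_clusters == 0:
        return 1.0
    return 2 * mutual_info / (h_labels + h_clusters)


def cluster_accuracy(labels, clusters):
    """Accuracy under the best one-to-one cluster-to-label mapping.

    Brute force over permutations, so only suitable for a handful of
    classes; assumes there are no more clusters than classes.
    """
    label_ids = sorted(set(labels))
    cluster_ids = sorted(set(clusters))
    joint = Counter(zip(clusters, labels))
    best = 0
    for perm in permutations(label_ids, len(cluster_ids)):
        hits = sum(joint[(c, l)] for c, l in zip(cluster_ids, perm))
        best = max(best, hits)
    return best / len(labels)


# Synthetic setup: balanced binary labels, uninformative random clusters.
rng = random.Random(0)
labels = [0, 1] * 1000
clusters = [rng.randint(0, 1) for _ in labels]

acc = cluster_accuracy(labels, clusters)
vm = v_measure(labels, clusters)
print(f"cluster_accuracy={acc:.3f}  v_measure={vm:.4f}")
```

If the gender labels are close to balanced (an assumption, not verified here), the IEMOCAPGenderClustering numbers above (cluster_accuracy around 0.51, v_measure around 0.0003) would fit this pattern, meaning the accuracy is effectively at chance level.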

isaac-chung · 2025-05-05T09:50:13Z

> @isaac-chung it seems like the cluster accuracy is quite high while the v-measure is low. This seems to have been introduced in MIEB, can you help me understand the rationale?

I recall adding the accuracy score here. Based on this comment, maybe either @Jamie-Stirling or @gowitheflow-1998 can help with this better?

KennethEnevoldsen · 2025-05-06T19:47:14Z

@AdnanElAssadi56 how about removing the two clustering tasks for now? I don't think their results are reasonable enough. Then we can get the rest merged.

AdnanElAssadi56 added 2 commits May 3, 2025 14:13

Added the IEMOCAP Datasets

6b1c208

Bibtex De-indent

5298abf

Fixed Bibtex

10ed7f5

Removed Clustering Subsets

bf5ba31

KennethEnevoldsen approved these changes May 8, 2025

isaac-chung merged commit 41b4c45 into embeddings-benchmark:maeb May 9, 2025
8 checks passed

isaac-chung mentioned this pull request Jun 8, 2025

Add IEMOCAP dataset #2380

Closed

7 tasks

isaac-chung merged 4 commits into embeddings-benchmark:maeb from AdnanElAssadi56:maeb-dataset-iemocap

3 participants
