
Added the IEMOCAP Datasets #2640

Merged
isaac-chung merged 4 commits into embeddings-benchmark:maeb from AdnanElAssadi56:maeb-dataset-iemocap
May 9, 2025

Conversation

@AdnanElAssadi56
Contributor

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

  • I have run the following models on the task (adding the results to the pr). These can be run using the mteb -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.
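For readers unfamiliar with the subsampling item in the checklist above, here is a minimal sketch of what stratified subsampling means. It is a standalone illustration and not the MTEB implementation of `self.stratified_subsampling()`; the function name `stratified_subsample`, its parameters, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_samples, seed=42):
    """Return sorted indices of a subsample that roughly keeps label proportions.

    Hypothetical helper for illustration only; not part of mteb.
    """
    if n_samples >= len(labels):
        return list(range(len(labels)))

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    chosen = []
    for label, indices in sorted(by_label.items()):
        # Proportional share for this label, with at least one example kept.
        # Because of rounding, the total can differ slightly from n_samples.
        k = max(1, round(n_samples * len(indices) / len(labels)))
        chosen.extend(rng.sample(indices, min(k, len(indices))))
    return sorted(chosen)


# Toy data: 3000 examples of one class, 1000 of another.
toy_labels = ["a"] * 3000 + ["b"] * 1000
subset = stratified_subsample(toy_labels, 2048)
print(len(subset), sum(toy_labels[i] == "b" for i in subset))
```

The point of stratifying is that the 3:1 class ratio of the toy data is preserved in the subsample, so a classifier evaluated on it sees the same class balance as on the full set.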

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.

@KennethEnevoldsen
Contributor

See comment on: #2641

@AdnanElAssadi56
Contributor Author

AdnanElAssadi56 commented May 4, 2025

Classification Task Results:

Model: "facebook/wav2vec2-base-960h"

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPEmotion",
  "mteb_version": "1.38.4",
  "scores": {
    "train": [
      {
        "accuracy": 0.153203,
        "f1": 0.115171,
        "f1_weighted": 0.177438,
        "scores_per_experiment": [
          {
            "accuracy": 0.173805,
            "f1": 0.116921,
            "f1_weighted": 0.198447
          },
          {
            "accuracy": 0.138446,
            "f1": 0.107584,
            "f1_weighted": 0.160962
          },
          {
            "accuracy": 0.163845,
            "f1": 0.12959,
            "f1_weighted": 0.18945
          },
          {
            "accuracy": 0.128486,
            "f1": 0.106024,
            "f1_weighted": 0.155944
          },
          {
            "accuracy": 0.161435,
            "f1": 0.115735,
            "f1_weighted": 0.182385
          }
        ],
        "main_score": 0.153203,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 399.48152470588684,
  "kg_co2_emissions": null
}
{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPGender",
  "mteb_version": "1.38.4",
  "scores": {
    "train": [
      {
        "accuracy": 0.512799,
        "f1": 0.509615,
        "f1_weighted": 0.51132,
        "ap": 0.527537,
        "ap_weighted": 0.527537,
        "scores_per_experiment": [
          {
            "accuracy": 0.509462,
            "f1": 0.509009,
            "f1_weighted": 0.509782,
            "ap": 0.530592,
            "ap_weighted": 0.530592
          },
          {
            "accuracy": 0.493028,
            "f1": 0.492943,
            "f1_weighted": 0.493342,
            "ap": 0.527456,
            "ap_weighted": 0.527456
          },
          {
            "accuracy": 0.540837,
            "f1": 0.536048,
            "f1_weighted": 0.538959,
            "ap": 0.550209,
            "ap_weighted": 0.550209
          },
          {
            "accuracy": 0.518426,
            "f1": 0.518296,
            "f1_weighted": 0.518414,
            "ap": 0.502022,
            "ap_weighted": 0.502022
          },
          {
            "accuracy": 0.502242,
            "f1": 0.491777,
            "f1_weighted": 0.496101,
            "ap": 0.527408,
            "ap_weighted": 0.527408
          }
        ],
        "main_score": 0.512799,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 383.3181974887848,
  "kg_co2_emissions": null
}

Model: "microsoft/wavlm-base"

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPEmotion",
  "mteb_version": "1.38.4",
  "scores": {
    "train": [
      {
        "accuracy": 0.213966,
        "f1": 0.162695,
        "f1_weighted": 0.234845,
        "scores_per_experiment": [
          {
            "accuracy": 0.188247,
            "f1": 0.138977,
            "f1_weighted": 0.210447
          },
          {
            "accuracy": 0.234064,
            "f1": 0.185028,
            "f1_weighted": 0.247368
          },
          {
            "accuracy": 0.204681,
            "f1": 0.17268,
            "f1_weighted": 0.22643
          },
          {
            "accuracy": 0.223606,
            "f1": 0.161787,
            "f1_weighted": 0.248
          },
          {
            "accuracy": 0.219233,
            "f1": 0.155002,
            "f1_weighted": 0.241978
          }
        ],
        "main_score": 0.213966,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 442.93038296699524,
  "kg_co2_emissions": null
}
{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPGender",
  "mteb_version": "1.38.4",
  "scores": {
    "train": [
      {
        "accuracy": 0.667796,
        "f1": 0.665643,
        "f1_weighted": 0.665727,
        "ap": 0.640392,
        "ap_weighted": 0.640392,
        "scores_per_experiment": [
          {
            "accuracy": 0.584163,
            "f1": 0.583935,
            "f1_weighted": 0.583431,
            "ap": 0.577186,
            "ap_weighted": 0.577186
          },
          {
            "accuracy": 0.697709,
            "f1": 0.697707,
            "f1_weighted": 0.697753,
            "ap": 0.669779,
            "ap_weighted": 0.669779
          },
          {
            "accuracy": 0.671315,
            "f1": 0.667954,
            "f1_weighted": 0.670017,
            "ap": 0.637436,
            "ap_weighted": 0.637436
          },
          {
            "accuracy": 0.717629,
            "f1": 0.713001,
            "f1_weighted": 0.713546,
            "ap": 0.662604,
            "ap_weighted": 0.662604
          },
          {
            "accuracy": 0.668161,
            "f1": 0.665619,
            "f1_weighted": 0.66389,
            "ap": 0.654956,
            "ap_weighted": 0.654956
          }
        ],
        "main_score": 0.667796,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 422.8732376098633,
  "kg_co2_emissions": null
}
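
A quick way to read the classification results above: the top-level `accuracy` (and `main_score`) appears to be the mean of the five `scores_per_experiment` accuracies. The sketch below checks this against the posted numbers only; the dictionary name `REPORTED` and the helper `aggregate_gaps` are made up for this illustration.

```python
from statistics import mean

# (model, task) -> (reported aggregate accuracy, per-experiment accuracies),
# copied from the results posted above.
REPORTED = {
    ("facebook/wav2vec2-base-960h", "IEMOCAPEmotion"): (
        0.153203, [0.173805, 0.138446, 0.163845, 0.128486, 0.161435]),
    ("facebook/wav2vec2-base-960h", "IEMOCAPGender"): (
        0.512799, [0.509462, 0.493028, 0.540837, 0.518426, 0.502242]),
    ("microsoft/wavlm-base", "IEMOCAPEmotion"): (
        0.213966, [0.188247, 0.234064, 0.204681, 0.223606, 0.219233]),
    ("microsoft/wavlm-base", "IEMOCAPGender"): (
        0.667796, [0.584163, 0.697709, 0.671315, 0.717629, 0.668161]),
}


def aggregate_gaps(reported):
    """Absolute difference between the mean of the runs and the reported aggregate."""
    return {key: abs(mean(runs) - agg) for key, (agg, runs) in reported.items()}


for (model, task), gap in aggregate_gaps(REPORTED).items():
    print(f"{model:30s} {task:16s} |mean - reported| = {gap:.1e}")
```

All four gaps are within rounding of the six-decimal values shown, so comparing models on `main_score` is the same as comparing their average accuracy over the five runs.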

@AdnanElAssadi56
Contributor Author

Clustering Task Results:

Model: "facebook/wav2vec2-base-960h"

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPEmotionClustering",
  "mteb_version": "1.38.4",
  "scores": {
    "train": [
      {
        "v_measure": 0.012845,
        "nmi": 0.012845,
        "ari": 0.005638,
        "cluster_accuracy": 0.155095,
        "main_score": 0.155095,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 351.6299092769623,
  "kg_co2_emissions": null
}
{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPGenderClustering",
  "mteb_version": "1.34.7",
  "scores": {
    "train": [
      {
        "v_measure": 0.000342,
        "nmi": 0.000342,
        "ari": 0.000593,
        "cluster_accuracy": 0.513398,
        "main_score": 0.513398,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 369.00222396850586,
  "kg_co2_emissions": null
}

Model: "microsoft/wavlm-base"

{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPEmotionClustering",
  "mteb_version": "1.34.7",
  "scores": {
    "train": [
      {
        "v_measure": 0.031543,
        "nmi": 0.031543,
        "ari": 0.015857,
        "cluster_accuracy": 0.166052,
        "main_score": 0.166052,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 397.0959997177124,
  "kg_co2_emissions": null
}
{
  "dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
  "task_name": "IEMOCAPGenderClustering",
  "mteb_version": "1.34.7",
  "scores": {
    "train": [
      {
        "v_measure": 0.000271,
        "nmi": 0.000271,
        "ari": -4.3e-05,
        "cluster_accuracy": 0.505429,
        "main_score": 0.505429,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 396.58769488334656,
  "kg_co2_emissions": null
}

@AdnanElAssadi56
Contributor Author

@KennethEnevoldsen

We've discussed this before, but I am including all results for completeness, and for a final confirmation of what to keep/discard.

I've also noticed that the tests are failing with errors related to BibTeX citations. Were the tests updated? Previously all tests passed, and I can still get results without problems.

@KennethEnevoldsen
Contributor

Yep, we added a requirement for BibTeX entries to be correctly formatted. This is because malformed entries caused issues with newer versions of Python (a large number of warnings).

Over on main there is a script and make command for formatting bibtex:

python scripts/format_citations.py tasks

Generally, it seems like the two clustering tasks should be removed and the rest can be merged.

@isaac-chung it seems like the cluster accuracy is quite high while the v-measure is low. This seems to have been introduced in MIEB, can you help me understand the rationale?
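
One possible reading of the gap between the two metrics, sketched below under an assumption that has not been checked against the MTEB code: if `cluster_accuracy` is computed by mapping each cluster to a class with the best one-to-one assignment, then on a balanced two-class task it cannot fall below 0.5, even when the clustering is unrelated to the labels. V-measure, by contrast, is close to 0 in that case. The functions `v_measure` and `best_mapping_accuracy` are illustrative re-implementations in plain Python, and the labels are synthetic.

```python
import math
import random
from collections import Counter
from itertools import permutations


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(y_true, y_pred):
    """Harmonic mean of homogeneity and completeness."""
    h_true, h_pred = entropy(y_true), entropy(y_pred)
    homogeneity = 1.0 if h_true == 0 else 1.0 - conditional_entropy(y_true, y_pred) / h_true
    completeness = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(y_pred, y_true) / h_pred
    total = homogeneity + completeness
    return 0.0 if total == 0 else 2 * homogeneity * completeness / total


def best_mapping_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one cluster-to-class mapping.

    Brute force over permutations; assumes few clusters and no more
    clusters than classes. This is an assumed definition of cluster accuracy.
    """
    classes, clusters = sorted(set(y_true)), sorted(set(y_pred))
    best_hits = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best_hits = max(best_hits, hits)
    return best_hits / len(y_true)


# Balanced binary labels with a clustering that ignores them entirely.
rng = random.Random(0)
y_true = [i % 2 for i in range(1000)]
y_rand = [rng.randint(0, 1) for _ in y_true]

print("random clustering: accuracy =", round(best_mapping_accuracy(y_true, y_rand), 3),
      " v-measure =", round(v_measure(y_true, y_rand), 4))
```

Under this assumption, the IEMOCAPGenderClustering scores of roughly 0.51 would sit at the floor for a two-class task rather than indicating a good clustering, which would be consistent with the near-zero v-measure, NMI and ARI reported for the same runs.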

@isaac-chung
Collaborator

@isaac-chung it seems like the cluster accuracy is quite high while the v-measure is low. This seems to have been introduced in MIEB, can you help me understand the rationale?

I recall adding accuracy score here. Based on this comment, maybe either @Jamie-Stirling or @gowitheflow-1998 can help with this better?

@KennethEnevoldsen
Contributor

@AdnanElAssadi56 how about removing the two clustering tasks for now? I don't think their results are reasonable enough. Then we can get the rest merged.

@isaac-chung merged commit 41b4c45 into embeddings-benchmark:maeb on May 9, 2025
8 checks passed
@isaac-chung mentioned this pull request on Jun 8, 2025
7 tasks