Add IEMOCAP dataset by diffunity · Pull Request #2380 · embeddings-benchmark/mteb

diffunity · 2025-03-16T04:29:23Z

Code Quality

[✅] Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

[✅] Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

[❌] New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
[✅] Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

[✅] I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
[✅] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
[✅] If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
[✅] I have filled out the metadata object in the dataset file (find documentation on it here).
[✅] Run tests locally to make sure nothing is broken using make test.
[✅] Run the formatter to format the code using make lint.

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision) and
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.

diffunity · 2025-03-16T04:31:37Z

IEMOCAP

{'accuracy': np.float64(0.12670483983349115),
 'f1': np.float64(0.09695427294093387),
 'f1_weighted': np.float64(0.14250605663702376),
 'main_score': np.float64(0.12670483983349115),
 'scores_per_experiment': [{'accuracy': 0.15587649402390438,
                            'f1': 0.10668864611462359,
                            'f1_weighted': 0.1671551654769295},
                           {'accuracy': 0.15139442231075698,
                            'f1': 0.11189726160522125,
                            'f1_weighted': 0.14830827247384432},
                           {'accuracy': 0.11553784860557768,
                            'f1': 0.09169546488499664,
                            'f1_weighted': 0.13815163727760404},
                           {'accuracy': 0.09412350597609562,
                            'f1': 0.08616618328485233,
                            'f1_weighted': 0.11858795138338912},
                           {'accuracy': 0.11659192825112108,
                            'f1': 0.08832380881497556,
                            'f1_weighted': 0.14032725657335174}]}
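
As a quick sanity check on the output above, the top-level `accuracy`, `f1`, and `f1_weighted` values appear to be the plain means of the five entries in `scores_per_experiment`. The sketch below recomputes them from the numbers posted in this comment; the `aggregate` helper is a hypothetical name introduced for illustration and is not part of mteb.

```python
from statistics import fmean

# Per-experiment scores copied from the result posted above.
scores_per_experiment = [
    {"accuracy": 0.15587649402390438, "f1": 0.10668864611462359, "f1_weighted": 0.1671551654769295},
    {"accuracy": 0.15139442231075698, "f1": 0.11189726160522125, "f1_weighted": 0.14830827247384432},
    {"accuracy": 0.11553784860557768, "f1": 0.09169546488499664, "f1_weighted": 0.13815163727760404},
    {"accuracy": 0.09412350597609562, "f1": 0.08616618328485233, "f1_weighted": 0.11858795138338912},
    {"accuracy": 0.11659192825112108, "f1": 0.08832380881497556, "f1_weighted": 0.14032725657335174},
]

# Top-level scores copied from the same result.
reported = {
    "accuracy": 0.12670483983349115,
    "f1": 0.09695427294093387,
    "f1_weighted": 0.14250605663702376,
}


def aggregate(experiments):
    """Average each metric over all experiments (hypothetical helper)."""
    metrics = experiments[0].keys()
    return {m: fmean(e[m] for e in experiments) for m in metrics}


aggregated = aggregate(scores_per_experiment)
for metric, value in reported.items():
    # Allow for floating-point rounding when comparing against the posted values.
    print(metric, abs(aggregated[metric] - value) < 1e-9)
```

Here `main_score` equals `accuracy`, so the same check covers it.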

isaac-chung

Let's downsample this dataset. Thanks!

isaac-chung · 2025-06-03T10:38:14Z

mteb/tasks/Audio/AudioClassification/eng/IEMOCAP.py

+        }""",
+        # https://ecs.utdallas.edu/research/researchlabs/msp-lab/publications/Busso_2008_5.pdf
+        descriptive_stats={
+            "n_samples": {"train": 10039},


If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()

This is checked but I don't see that completed in the PR.
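
For readers unfamiliar with the request, here is a minimal, library-free sketch of what label-stratified downsampling to 2048 examples could look like. It is not mteb's `self.stratified_subsampling()` implementation, whose behaviour and signature may differ; the function name `stratified_subsample`, the seed, and the three-class label distribution are all assumptions made for illustration. Only the total of 10039 samples comes from the diff above.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n_samples=2048, seed=42):
    """Return sorted indices of a subsample that keeps class proportions.

    Hypothetical helper for illustration; not the mteb implementation.
    """
    total = len(labels)
    if n_samples >= total:
        return list(range(total))

    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    picked, leftovers = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Floor of this class's proportional share of the target size.
        quota = (len(indices) * n_samples) // total
        picked.extend(indices[:quota])
        leftovers.extend(indices[quota:])

    # Flooring can leave a few slots unfilled; top up at random from the rest.
    rng.shuffle(leftovers)
    picked.extend(leftovers[: n_samples - len(picked)])
    return sorted(picked)


# Hypothetical class distribution summing to the 10039 samples in the diff.
labels = ["neutral"] * 5000 + ["happy"] * 3000 + ["sad"] * 2039
subset = stratified_subsample(labels, n_samples=2048, seed=42)
counts = Counter(labels[i] for i in subset)
print(len(subset), dict(counts))
```

In the task file itself, the equivalent step would live in `dataset_transform()`, as the checklist item suggests.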

isaac-chung · 2025-06-08T16:22:00Z

Duplicates #2640

diffunity requested review from Samoed and removed request for Samoed March 17, 2025 19:11

diffunity added 2 commits March 18, 2025 04:25

add IEMOCAP

854bb47

remove label mapping

7f91f64

diffunity force-pushed the add_iemocap branch from 19ee636 to 7f91f64 March 17, 2025 19:28

isaac-chung added the audio Audio extension label Jun 1, 2025

isaac-chung reviewed Jun 3, 2025

isaac-chung closed this Jun 8, 2025

diffunity wants to merge 2 commits into embeddings-benchmark:maeb from diffunity:add_iemocap
