Added the IEMOCAP Datasets #2640
|
See comment on: #2641 |
|
Classification Task Results:
Model: "facebook/wav2vec2-base-960h"
{
"dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
"task_name": "IEMOCAPEmotion",
"mteb_version": "1.38.4",
"scores": {
"train": [
{
"accuracy": 0.153203,
"f1": 0.115171,
"f1_weighted": 0.177438,
"scores_per_experiment": [
{
"accuracy": 0.173805,
"f1": 0.116921,
"f1_weighted": 0.198447
},
{
"accuracy": 0.138446,
"f1": 0.107584,
"f1_weighted": 0.160962
},
{
"accuracy": 0.163845,
"f1": 0.12959,
"f1_weighted": 0.18945
},
{
"accuracy": 0.128486,
"f1": 0.106024,
"f1_weighted": 0.155944
},
{
"accuracy": 0.161435,
"f1": 0.115735,
"f1_weighted": 0.182385
}
],
"main_score": 0.153203,
"hf_subset": "default",
"languages": [
"eng-Latn"
]
}
]
},
"evaluation_time": 399.48152470588684,
"kg_co2_emissions": null
}
{
"dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
"task_name": "IEMOCAPGender",
"mteb_version": "1.38.4",
"scores": {
"train": [
{
"accuracy": 0.512799,
"f1": 0.509615,
"f1_weighted": 0.51132,
"ap": 0.527537,
"ap_weighted": 0.527537,
"scores_per_experiment": [
{
"accuracy": 0.509462,
"f1": 0.509009,
"f1_weighted": 0.509782,
"ap": 0.530592,
"ap_weighted": 0.530592
},
{
"accuracy": 0.493028,
"f1": 0.492943,
"f1_weighted": 0.493342,
"ap": 0.527456,
"ap_weighted": 0.527456
},
{
"accuracy": 0.540837,
"f1": 0.536048,
"f1_weighted": 0.538959,
"ap": 0.550209,
"ap_weighted": 0.550209
},
{
"accuracy": 0.518426,
"f1": 0.518296,
"f1_weighted": 0.518414,
"ap": 0.502022,
"ap_weighted": 0.502022
},
{
"accuracy": 0.502242,
"f1": 0.491777,
"f1_weighted": 0.496101,
"ap": 0.527408,
"ap_weighted": 0.527408
}
],
"main_score": 0.512799,
"hf_subset": "default",
"languages": [
"eng-Latn"
]
}
]
},
"evaluation_time": 383.3181974887848,
"kg_co2_emissions": null
}
Model: "microsoft/wavlm-base"
{
"dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
"task_name": "IEMOCAPEmotion",
"mteb_version": "1.38.4",
"scores": {
"train": [
{
"accuracy": 0.213966,
"f1": 0.162695,
"f1_weighted": 0.234845,
"scores_per_experiment": [
{
"accuracy": 0.188247,
"f1": 0.138977,
"f1_weighted": 0.210447
},
{
"accuracy": 0.234064,
"f1": 0.185028,
"f1_weighted": 0.247368
},
{
"accuracy": 0.204681,
"f1": 0.17268,
"f1_weighted": 0.22643
},
{
"accuracy": 0.223606,
"f1": 0.161787,
"f1_weighted": 0.248
},
{
"accuracy": 0.219233,
"f1": 0.155002,
"f1_weighted": 0.241978
}
],
"main_score": 0.213966,
"hf_subset": "default",
"languages": [
"eng-Latn"
]
}
]
},
"evaluation_time": 442.93038296699524,
"kg_co2_emissions": null
}
{
"dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
"task_name": "IEMOCAPGender",
"mteb_version": "1.38.4",
"scores": {
"train": [
{
"accuracy": 0.667796,
"f1": 0.665643,
"f1_weighted": 0.665727,
"ap": 0.640392,
"ap_weighted": 0.640392,
"scores_per_experiment": [
{
"accuracy": 0.584163,
"f1": 0.583935,
"f1_weighted": 0.583431,
"ap": 0.577186,
"ap_weighted": 0.577186
},
{
"accuracy": 0.697709,
"f1": 0.697707,
"f1_weighted": 0.697753,
"ap": 0.669779,
"ap_weighted": 0.669779
},
{
"accuracy": 0.671315,
"f1": 0.667954,
"f1_weighted": 0.670017,
"ap": 0.637436,
"ap_weighted": 0.637436
},
{
"accuracy": 0.717629,
"f1": 0.713001,
"f1_weighted": 0.713546,
"ap": 0.662604,
"ap_weighted": 0.662604
},
{
"accuracy": 0.668161,
"f1": 0.665619,
"f1_weighted": 0.66389,
"ap": 0.654956,
"ap_weighted": 0.654956
}
],
"main_score": 0.667796,
"hf_subset": "default",
"languages": [
"eng-Latn"
]
}
]
},
"evaluation_time": 422.8732376098633,
"kg_co2_emissions": null
} |
|
Clustering Task Results:
Model: "facebook/wav2vec2-base-960h"
{
"dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
"task_name": "IEMOCAPEmotionClustering",
"mteb_version": "1.38.4",
"scores": {
"train": [
{
"v_measure": 0.012845,
"nmi": 0.012845,
"ari": 0.005638,
"cluster_accuracy": 0.155095,
"main_score": 0.155095,
"hf_subset": "default",
"languages": [
"eng-Latn"
]
}
]
},
"evaluation_time": 351.6299092769623,
"kg_co2_emissions": null
}
{
"dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
"task_name": "IEMOCAPGenderClustering",
"mteb_version": "1.34.7",
"scores": {
"train": [
{
"v_measure": 0.000342,
"nmi": 0.000342,
"ari": 0.000593,
"cluster_accuracy": 0.513398,
"main_score": 0.513398,
"hf_subset": "default",
"languages": [
"eng-Latn"
]
}
]
},
"evaluation_time": 369.00222396850586,
"kg_co2_emissions": null
}
Model: "microsoft/wavlm-base"
{
"dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
"task_name": "IEMOCAPEmotionClustering",
"mteb_version": "1.34.7",
"scores": {
"train": [
{
"v_measure": 0.031543,
"nmi": 0.031543,
"ari": 0.015857,
"cluster_accuracy": 0.166052,
"main_score": 0.166052,
"hf_subset": "default",
"languages": [
"eng-Latn"
]
}
]
},
"evaluation_time": 397.0959997177124,
"kg_co2_emissions": null
}
{
"dataset_revision": "9f1696a135a65ce997d898d4121c952269a822ca",
"task_name": "IEMOCAPGenderClustering",
"mteb_version": "1.34.7",
"scores": {
"train": [
{
"v_measure": 0.000271,
"nmi": 0.000271,
"ari": -4.3e-05,
"cluster_accuracy": 0.505429,
"main_score": 0.505429,
"hf_subset": "default",
"languages": [
"eng-Latn"
]
}
]
},
"evaluation_time": 396.58769488334656,
"kg_co2_emissions": null
} |
|
We've discussed this before, but I am including all results for completeness and for a final confirmation of what to keep or discard. I've also noticed that the tests are giving errors related to bibtex citations. Were the tests updated? Previously all tests passed, and I can still get results without problems. |
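As a quick sanity check on the numbers above, the top-level `accuracy` in each classification result appears to be the plain mean of the `scores_per_experiment` accuracies. The sketch below verifies that for the two wav2vec2 results; the helper name `matches_mean` and the tolerance are illustrative assumptions, not part of mteb.

```python
from statistics import mean


def matches_mean(per_experiment, reported, tol=1e-5):
    """Return True if `reported` equals the mean of `per_experiment` within `tol`.

    Hypothetical helper for illustration only; it is not an mteb function.
    """
    return abs(mean(per_experiment) - reported) < tol


# Accuracies copied from the facebook/wav2vec2-base-960h results in this thread.
results = {
    "IEMOCAPEmotion": ([0.173805, 0.138446, 0.163845, 0.128486, 0.161435], 0.153203),
    "IEMOCAPGender": ([0.509462, 0.493028, 0.540837, 0.518426, 0.502242], 0.512799),
}

for task_name, (per_experiment, reported) in results.items():
    status = "consistent" if matches_mean(per_experiment, reported) else "MISMATCH"
    print(f"{task_name}: mean={mean(per_experiment):.6f} reported={reported:.6f} {status}")
```

The same check can be applied to the `f1` and `f1_weighted` fields by swapping in the corresponding per-experiment values.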
|
Yep, we added a requirement that bibtex be correctly formatted. This is because badly formatted bibtex caused issues with newer versions of Python (a large number of warnings). Over on main there is a script and a make command for formatting bibtex:
Generally, it seems like the two clustering tasks should be removed and the rest can be merged. @isaac-chung it seems like the cluster accuracy is quite high while the v-measure is low. This seems to have been introduced in MIEB; can you help me understand the rationale? |
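One possible reading of the gap between the two metrics, sketched under an assumption: if cluster accuracy is computed as accuracy under the best one-to-one mapping of cluster ids to labels (a common definition, not checked against the MTEB implementation), then on a balanced binary task it can never fall below 0.5, even when the clusters carry no label information. The pure-Python helpers below (`v_measure`, `cluster_accuracy`) are illustrative re-implementations, not library calls.

```python
import math
import random
from collections import Counter
from itertools import permutations


def entropy(values):
    """Shannon entropy (in nats) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(labels, clusters):
    """Harmonic mean of homogeneity and completeness."""
    h_labels, h_clusters = entropy(labels), entropy(clusters)
    homogeneity = 1.0 if h_labels == 0 else 1.0 - conditional_entropy(labels, clusters) / h_labels
    completeness = 1.0 if h_clusters == 0 else 1.0 - conditional_entropy(clusters, labels) / h_clusters
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def cluster_accuracy(labels, clusters):
    """Accuracy under the best one-to-one mapping of cluster ids to labels.

    Brute force over permutations, so only suitable for a handful of classes;
    assumes there are no more clusters than labels.
    """
    label_ids = sorted(set(labels))
    cluster_ids = sorted(set(clusters))
    counts = Counter(zip(clusters, labels))
    best = 0
    for perm in permutations(label_ids, len(cluster_ids)):
        hits = sum(counts[(k, l)] for k, l in zip(cluster_ids, perm))
        best = max(best, hits)
    return best / len(labels)


# Balanced binary labels with cluster assignments that ignore the labels entirely.
rng = random.Random(0)
labels = [i % 2 for i in range(2000)]
clusters = [rng.randint(0, 1) for _ in labels]

print(f"cluster_accuracy = {cluster_accuracy(labels, clusters):.4f}")
print(f"v_measure        = {v_measure(labels, clusters):.6f}")
```

Under this assumption, a cluster accuracy near 0.5 on the gender task is what chance-level clustering would produce, which is consistent with the near-zero v-measure reported above.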
I recall adding accuracy score here. Based on this comment, maybe either @Jamie-Stirling or @gowitheflow-1998 can help with this better? |
|
@AdnanElAssadi56 how about removing the two clustering tasks for now? I don't think they are reasonable enough. Then we can get the rest merged. |
Code Quality
make lint to maintain consistent style.
Documentation
Testing
make test-with-coverage.
make test or make test-with-coverage to ensure no existing functionality is broken.
Adding datasets checklist
Reason for dataset addition: ...
mteb -m {model_name} -t {task_name} command.
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
intfloat/multilingual-e5-small
self.stratified_subsampling() under dataset_transform()
make test.
make lint.
Adding a model checklist
mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision)