[MAEB] Fix whisper model audio inference by isaac-chung · Pull Request #2954 · embeddings-benchmark/mteb

isaac-chung · 2025-07-28T15:19:22Z

- Cast batch to numpy
- Pad with `max_length` instead of `longest`
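The diff is not reproduced on this page, so the following is only an illustrative sketch of why the padding mode matters, not the PR's code. It assumes Whisper's standard input format: 16 kHz audio in a fixed 30-second window (480,000 samples), which the feature extractor turns into 3,000 log-mel frames with a hop length of 160. Padding to the longest clip in a batch produces fewer frames than that, and the Whisper encoder in recent `transformers` releases expects exactly 3,000. The `transformers` feature extractor documents its input as NumPy arrays or lists of floats, which is presumably why casting the batch to NumPy helps. The helpers `to_numpy_batch` and `pad_batch` are hypothetical; the real fix presumably passes `padding="max_length"` to the feature-extractor call.

```python
import numpy as np

# Constants mirror Whisper's standard input format; they are illustrative
# assumptions and are not read from any model config.
SAMPLING_RATE = 16_000                        # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30                            # fixed 30-second input window
MAX_SAMPLES = SAMPLING_RATE * CHUNK_SECONDS   # 480,000 samples
HOP_LENGTH = 160                              # STFT hop: 100 mel frames per second


def to_numpy_batch(batch):
    """Hypothetical helper: turn each clip (list, NumPy array, CPU tensor)
    into a flat float32 NumPy array."""
    return [np.asarray(clip, dtype=np.float32).reshape(-1) for clip in batch]


def pad_batch(batch, padding="max_length", max_length=MAX_SAMPLES):
    """Hypothetical helper: right-pad clips with zeros (and truncate long ones).

    padding="longest"    -> pad to the longest clip in this batch
    padding="max_length" -> pad every clip to `max_length` samples
    """
    clips = to_numpy_batch(batch)
    if padding == "longest":
        target = max(len(c) for c in clips)
    elif padding == "max_length":
        target = max_length
    else:
        raise ValueError(f"unknown padding mode: {padding!r}")

    out = np.zeros((len(clips), target), dtype=np.float32)
    for i, clip in enumerate(clips):
        clip = clip[:target]          # drop samples beyond the window
        out[i, : len(clip)] = clip
    return out


# A 1 s and a 2 s clip: "longest" gives 2 s inputs (about 200 frames),
# while "max_length" gives the full 30 s window (3,000 frames).
batch = [[0.0] * 16_000, np.ones(32_000)]
for mode in ("longest", "max_length"):
    padded = pad_batch(batch, padding=mode)
    print(mode, padded.shape, padded.shape[1] // HOP_LENGTH, "frames")
```

Padding every clip to the full window spends compute on silence for short clips, but it matches the fixed-length input the encoder was trained on.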

`mteb run -m openai/whisper-small -t NMSQAPairClassification` yields:

NMSQA Results

{'default': {'similarity_accuracy': 0.5321637426900585, 'similarity_accuracy_threshold': 0.8944698572158813, 'similarity_f1': 0.6747967479674797, 'similarity_f1_threshold': 0.8944698572158813, 'similarity_precision': 0.515527950310559, 'similarity_recall': 0.9764705882352941, 'similarity_ap': 0.45288448359423344, 'cosine_accuracy': 0.5321637426900585, 'cosine_accuracy_threshold': 0.8944698572158813, 'cosine_f1': 0.6747967479674797, 'cosine_f1_threshold': 0.8944698572158813, 'cosine_precision': 0.515527950310559, 'cosine_recall': 0.9764705882352941, 'cosine_ap': 0.45288448359423344, 'manhattan_accuracy': 0.5321637426900585, 'manhattan_accuracy_threshold': 178.03515625, 'manhattan_f1': 0.6747967479674797, 'manhattan_f1_threshold': 178.03515625, 'manhattan_precision': 0.515527950310559, 'manhattan_recall': 0.9764705882352941, 'manhattan_ap': 0.4534237828262119, 'euclidean_accuracy': 0.5263157894736842, 'euclidean_accuracy_threshold': 12.51692008972168, 'euclidean_f1': 0.6719367588932806, 'euclidean_f1_threshold': 15.496805191040039, 'euclidean_precision': 0.5059523809523809, 'euclidean_recall': 1.0, 'euclidean_ap': 0.45160470174489403, 'dot_accuracy': 0.52046783625731, 'dot_accuracy_threshold': 720.27099609375, 'dot_f1': 0.664, 'dot_f1_threshold': 456.8892822265625, 'dot_precision': 0.503030303030303, 'dot_recall': 0.9764705882352941, 'dot_ap': 0.46811587737626137, 'max_accuracy': 0.5321637426900585, 'max_f1': 0.6747967479674797, 'max_precision': 0.515527950310559, 'max_recall': 1.0, 'max_ap': 0.46811587737626137, 'main_score': 0.46811587737626137}}
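Two relationships can be checked directly from the NMSQA numbers: in this run, `main_score` equals `max_ap`, the largest of the per-similarity AP values (here `dot_ap`), and each F1 is the harmonic mean of the matching precision and recall. The snippet below is a sketch using values copied from the output above, not mteb code.

```python
# Subset of the NMSQA output above, copied verbatim.
scores = {
    "similarity_ap": 0.45288448359423344,
    "cosine_ap": 0.45288448359423344,
    "manhattan_ap": 0.4534237828262119,
    "euclidean_ap": 0.45160470174489403,
    "dot_ap": 0.46811587737626137,
    "cosine_precision": 0.515527950310559,
    "cosine_recall": 0.9764705882352941,
    "cosine_f1": 0.6747967479674797,
    "main_score": 0.46811587737626137,
}

# Pick the similarity function with the highest average precision.
ap_keys = [k for k in scores if k.endswith("_ap")]
best_key = max(ap_keys, key=scores.get)
print(best_key, scores[best_key])  # dot_ap 0.46811587737626137

# F1 as the harmonic mean of precision and recall.
p, r = scores["cosine_precision"], scores["cosine_recall"]
f1 = 2 * p * r / (p + r)
print(round(f1, 6), round(scores["cosine_f1"], 6))  # 0.674797 0.674797
```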

`mteb run -m openai/whisper-small -t BeijingOpera` yields:

BeijingOpera Results

{'default': {'accuracy': np.float64(0.8812943262411348), 'f1': np.float64(0.8841246453772911), 'f1_weighted': np.float64(0.8779403228788188), 'scores_per_experiment': [{'accuracy': 0.8958333333333334, 'f1': 0.9043417366946779, 'f1_weighted': 0.8959500466853408}, {'accuracy': 0.9574468085106383, 'f1': 0.9571439027960768, 'f1_weighted': 0.9563691182562597}, {'accuracy': 0.8723404255319149, 'f1': 0.876717032967033, 'f1_weighted': 0.8678688332943653}, {'accuracy': 0.9361702127659575, 'f1': 0.9344320486815415, 'f1_weighted': 0.9366449441111734}, {'accuracy': 0.7446808510638298, 'f1': 0.7479885057471264, 'f1_weighted': 0.7328686720469553}], 'main_score': np.float64(0.8812943262411348)}}
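The top-level `accuracy`, `f1` and `f1_weighted` in the BeijingOpera output match the mean of the five entries in `scores_per_experiment` (presumably five repeated classifier runs; mteb's exact protocol is not shown here). The sketch below recomputes those means from the copied values.

```python
from statistics import fmean

# Per-experiment scores copied from the BeijingOpera output above.
per_experiment = {
    "accuracy": [0.8958333333333334, 0.9574468085106383, 0.8723404255319149,
                 0.9361702127659575, 0.7446808510638298],
    "f1": [0.9043417366946779, 0.9571439027960768, 0.876717032967033,
           0.9344320486815415, 0.7479885057471264],
    "f1_weighted": [0.8959500466853408, 0.9563691182562597, 0.8678688332943653,
                    0.9366449441111734, 0.7328686720469553],
}

# Average each metric over the five experiments.
means = {metric: fmean(values) for metric, values in per_experiment.items()}
for metric, value in means.items():
    print(f"{metric}: {value:.6f}")
# accuracy: 0.881294
# f1: 0.884125
# f1_weighted: 0.877940
```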


cast batch to numpy and pad max_length

f82912c

isaac-chung changed the base branch from main to maeb July 28, 2025 15:19

trigger CI

1db9588

isaac-chung requested a review from gowitheflow-1998 July 28, 2025 15:20

isaac-chung changed the title ~~[MAEB] Fix whisper~~ [MAEB] Fix whisper model audio inference Jul 28, 2025

isaac-chung merged commit b875aa2 into maeb Jul 30, 2025
9 checks passed

isaac-chung deleted the fix-whisper branch July 30, 2025 06:58

isaac-chung merged 2 commits into maeb from fix-whisper
