Merged
235 commits
32b7af8
Started the following:
sufen-f Feb 19, 2025
8eff2c6
Minor changes and linted files. #2093
sufen-f Feb 19, 2025
53a2e36
Minor changes and linted files. #2093
sufen-f Feb 20, 2025
ed93f2b
Minor changes and linted files. #2093
sufen-f Feb 20, 2025
fbab033
Refs #2068: Initial Implementation of audio-text retrieval abstask an…
imadtyx Feb 20, 2025
d39e187
Added MockAudioClustering task + MockAudioEncoder for testcase
alisartazkhan Feb 20, 2025
bcca37f
MockAudioClustering + MockAudioEncoder (#2093)
Feb 20, 2025
2a238ed
Added wav2vec model wrapper
alisartazkhan Feb 22, 2025
7816974
Added subTask with small sample of dataset for testing
Feb 22, 2025
07f53b1
Added four w2v variants
alisartazkhan Feb 23, 2025
882af38
Update wav2vec_models.py
alisartazkhan Feb 23, 2025
daeada0
Added wav2vec (5), wavlm (7), and whisper (5) models
alisartazkhan Feb 23, 2025
c1ebf2a
Added revisions from HF to wav2vec models, added silhouette score, DB…
sufen-f Feb 23, 2025
716deed
Update mteb/models/wavlm_models.py
alisartazkhan Feb 23, 2025
ce1bee9
setting up colab
sufen-f Feb 24, 2025
4cf7e6f
Merge remote-tracking branch 'origin/maeb' into maeb
sufen-f Feb 24, 2025
545b938
added a2a
Feb 24, 2025
ed978fa
PCA + hidden layer + shuffling
Feb 24, 2025
1616ba9
New task: emotion clustering
Feb 24, 2025
ac14d16
Added qwen2 model
alisartazkhan Feb 26, 2025
1302477
Added Wav2Vec model, voice clustering task, VoxCeleb dataset subset (…
sufen-f Feb 28, 2025
4f23fdf
Merge branch 'maeb' into maeb
sufen-f Feb 28, 2025
ee10191
Revert "Maeb - added voice clustering task, wav2vec model and VoxCele…
sufen-f Mar 1, 2025
f1449c0
Revert "Revert "Maeb - added voice clustering task, wav2vec model and…
sufen-f Mar 1, 2025
d731d40
Revert "Revert "Revert "Maeb - added voice clustering task, wav2vec m…
sufen-f Mar 1, 2025
a0de4fc
Add Audio (Multi Label) Classification Abstask, Baseline Audio model,…
anime-sh Mar 4, 2025
0620c58
Add ESC50 and zero-shot classification (#2133)
RahulSChand Mar 5, 2025
6d9eca3
Add unfused clap model for zero-shot (#2269)
RahulSChand Mar 6, 2025
2188585
Add new and complete version of FSD50K multi-label audio classificati…
RahulSChand Mar 8, 2025
bdefb14
added large, music and speech clap models (#2284)
RahulSChand Mar 8, 2025
2e5dc67
add AbsTaskAudioClassification, ESC50 & GunshotTriangulation datasets…
silky1708 Mar 10, 2025
bf9fe16
Add NSynth dataset (#2306)
silky1708 Mar 10, 2025
a94ea50
Add urbansound8k for zero-shot (#2292)
RahulSChand Mar 11, 2025
52a88ae
Add Emotion classification Ravdess dataset (#2320)
RahulSChand Mar 11, 2025
cd07f24
[MAEB] main merge (#2341)
isaac-chung Mar 13, 2025
ef30e3d
adding GTZAN Genre dataset (#2307)
silky1708 Mar 13, 2025
5cf3840
Adding Beijing Opera dataset (#2356)
silky1708 Mar 14, 2025
368e720
update TaskMetadata from mteb:maeb
silky1708 Mar 13, 2025
25136ba
make pr
silky1708 Mar 13, 2025
79e06fe
update ruff to 0.9.7; make lint
silky1708 Mar 13, 2025
f85627f
update TaskMetadata from mteb:maeb
silky1708 Mar 13, 2025
0cf07f4
update TaskMetadata
silky1708 Mar 14, 2025
7460a13
add Mridingham datasets
silky1708 Mar 14, 2025
d5caae6
rm comment
silky1708 Mar 14, 2025
187d7bc
Adding Libricount dataset (#2361)
silky1708 Mar 16, 2025
3bae6b6
Adding Crema-D Dataset for emotion classification [HEAR] (#2368)
silky1708 Mar 16, 2025
307aa57
Adding FSDD dataset (Free Spoken Digit Dataset) (#2371)
silky1708 Mar 16, 2025
6ad0bc2
Add VoxCelebSA, SpokenQAforIC, VehicleSoundClustering from Dynamic-SU…
diffunity Mar 17, 2025
230064a
fix FSD-50K Task Metadata, Label handling and add stratified subsampl…
anime-sh Mar 18, 2025
89ab596
Add music clustering dataset (#2232)
mina-parham Mar 26, 2025
f3a0403
[MAEB] merge main -> maeb (#2471)
isaac-chung Apr 1, 2025
5af86e5
Create AbsTask and Evaluator for audio pair classification task (#2457)
switchpiggy Apr 4, 2025
01c462d
Add Language, Gender, and Age classifcation tasks based on common-la…
anime-sh Apr 4, 2025
5acab7f
Merge main into MAEB (#2488)
isaac-chung Apr 4, 2025
31925c5
added wavlm models (#2472)
alisartazkhan Apr 4, 2025
7e57e9d
Adding SIB-FLEURS (#2357)
diffunity Apr 5, 2025
991a0fc
update wavlm models
alisartazkhan Apr 22, 2025
5fc6e4d
update wavlm models
alisartazkhan Apr 23, 2025
14f6b41
Add files via upload
mnasser3 Apr 29, 2025
9eaca21
Update whisper_models.py license format
mnasser3 Apr 29, 2025
040d5c6
Updated wavlm and whisper models to fit maeb structure (#2572)
alisartazkhan May 2, 2025
aba957c
Delete mteb/abstasks/Image/AbsTaskZeroshotClassification.py
isaac-chung May 3, 2025
2fada5b
[MAEB] Merge in main 20250503 (#2635)
isaac-chung May 3, 2025
4c53823
Added SpeechCommands Dataset (Subset) (#2645)
AdnanElAssadi56 May 6, 2025
804be31
Added ESC50 Clustering Dataset (#2652)
AdnanElAssadi56 May 7, 2025
e1bc62f
Added Qwen2-7b (#2660)
alisartazkhan May 8, 2025
41b4c45
Added the IEMOCAP Datasets (#2640)
AdnanElAssadi56 May 9, 2025
4cd81ce
Add sew-d and unispeech models
sufen-f May 17, 2025
1163e62
Add sew-d and unispeech models
sufen-f May 17, 2025
cef8d57
Merge branch 'model_development' into maeb
sufen-f May 17, 2025
2d25266
Revert "Merge branch 'model_development' into maeb"
sufen-f May 17, 2025
0fb74db
Reapply "Merge branch 'model_development' into maeb"
sufen-f May 17, 2025
a2e6cf2
Revert to 41b4c451d48ca1234b508a5972662dc0c25573fa
sufen-f May 17, 2025
390b867
Add sew-d and unispeech models #2693 #2694 (#2701)
sufen-f May 18, 2025
6f15209
Added Minds14 Dataset (#2644)
AdnanElAssadi56 May 19, 2025
17197e0
Added Hubert Models (#2689)
AdnanElAssadi56 May 23, 2025
ee8e26f
Added AST Model (#2691)
AdnanElAssadi56 May 23, 2025
95a03f7
Added Data2Vec Models (#2690)
AdnanElAssadi56 May 23, 2025
645255b
Adding BirdSet dataset
imadtyx Jun 1, 2025
e067d88
Update __init__.py to include BirdSet dataset(s)
imadtyx Jun 1, 2025
1afb4ac
MAEB: Encodec Model (#2754)
AdnanElAssadi56 Jun 2, 2025
d4b9abd
MAEB: MMS Models (#2750)
AdnanElAssadi56 Jun 2, 2025
cf51d8f
MAEB: Seamlessm4t Model (V2) (#2751)
AdnanElAssadi56 Jun 2, 2025
439ee37
[MAEB] CNN14 Model (PANNs) (#2757)
AdnanElAssadi56 Jun 2, 2025
6e434aa
Added TutAcoustic Scenes Dataset (#2647)
AdnanElAssadi56 Jun 3, 2025
88436e3
MAEB: M-CTC-T Model (#2753)
AdnanElAssadi56 Jun 3, 2025
c5d8484
Added GTZAN Clustering Dataset (#2653)
AdnanElAssadi56 Jun 3, 2025
1af8eb1
Added AmbientAcousticContext Dataset (#2642)
AdnanElAssadi56 Jun 3, 2025
69d67e4
Added Crema_d Dataset (#2651)
AdnanElAssadi56 Jun 3, 2025
cd7c6e9
Added VoxCeleb Clustering Dataset (#2654)
AdnanElAssadi56 Jun 3, 2025
eb173b9
Audio Reranking Abstask+ Evaluator + Mini/Dummy AudioCaps Subset (#2744)
AdnanElAssadi56 Jun 5, 2025
31f38f2
Added 5 datasets for audio pair classification (#2463)
kkaitlyn111 Jun 8, 2025
ece46da
Adds SpokeN-100-English (#2342)
mina-parham Jun 8, 2025
89563e1
Adds VocalSound dataset (#2337)
mina-parham Jun 8, 2025
9114dc6
Added Birdclef Subset Dataset (#2641)
AdnanElAssadi56 Jun 13, 2025
c383316
Merge branch 'maeb' of github.com:embeddings-benchmark/mteb into maeb
isaac-chung Jun 14, 2025
a81eec3
lint
isaac-chung Jun 14, 2025
e990850
Added VoxPopuli Datasets (#2648)
AdnanElAssadi56 Jun 20, 2025
6bc4c5a
added SpeechCommand dataset and Keyword spotting task (#2329)
RahulSChand Jun 21, 2025
bdbe51f
[MAEB] Merge from main up to 1.38.30 (#2840)
isaac-chung Jun 22, 2025
5510897
Added Yamnet and VGGish models (#2687)
ayush1298 Jun 23, 2025
3c464f9
Add urbansound 8k linear probing (#2845)
isaac-chung Jun 23, 2025
a4842d5
add stratified_subsampling to Audio clustering datasets (#2854)
isaac-chung Jun 28, 2025
1453ad6
Audio Reranking Eval Update + 5 Reranking Datasets (#2849)
AdnanElAssadi56 Jun 28, 2025
73c9d2c
[MAEB] Sync with 1.38.33 (#2883)
isaac-chung Jul 6, 2025
8a8a101
MAEB Classification Datasets Downsampling/Formatting + MTEB UPLOAD (#…
AdnanElAssadi56 Jul 9, 2025
c7b8542
Merge main maeb 07 10 (#2894)
Samoed Jul 10, 2025
74bdc03
merge main
Samoed Jul 10, 2025
8f8577f
SibFluers Dataset Multilingual Extention (#2890)
AdnanElAssadi56 Jul 11, 2025
f1eb63c
Implemented Audio Any2AnyRetrieval + 3 Datasets for A2A, A2T, T2A (#2…
kkaitlyn111 Jul 12, 2025
ab0899c
[MAEB] encode() for audio-only models should raise error (#2914)
isaac-chung Jul 18, 2025
f619034
fix: add missing clap model handling
isaac-chung Jul 18, 2025
4e79b1a
dataset: add Clotho by creating the datasets on the fly (#2915)
isaac-chung Jul 20, 2025
6b37b71
dataset: Add SoundDescs (#2911)
isaac-chung Jul 20, 2025
a19e7b4
Audio Retrieval Dataset: UrbanSound8K (#2920)
AdnanElAssadi56 Jul 21, 2025
698500d
Audio Retrieval Dataset: MACS (#2921)
AdnanElAssadi56 Jul 21, 2025
ca4b73c
SpeechT5 Model (#2901)
AdnanElAssadi56 Jul 21, 2025
6671fcc
MAEB Model MSCLAP (#2902)
AdnanElAssadi56 Jul 21, 2025
dd6a76a
MAEB Model Wav2Clip (#2908)
AdnanElAssadi56 Jul 21, 2025
7e1fb93
Audio Retrieval Dataset: EmoVDB (#2923)
AdnanElAssadi56 Jul 21, 2025
48febd1
MAEB Model MuQ-MuLan (#2909)
AdnanElAssadi56 Jul 21, 2025
7801759
fix encode() in audio models (#2926)
isaac-chung Jul 21, 2025
7a4be45
Audio Retrieval Dataset: HiFiTTS (#2924)
AdnanElAssadi56 Jul 21, 2025
8a01d4e
Audio Retrieval Dataset: MusicCaps (#2918)
AdnanElAssadi56 Jul 21, 2025
53071b3
Audio Retrieval Dataset: CMU-Arctic (#2929)
AdnanElAssadi56 Jul 23, 2025
b087dfe
Audio Models Batch Fix (#2932)
AdnanElAssadi56 Jul 23, 2025
aadd51e
Add AudioSet and AudioSetMini (#2952)
isaac-chung Jul 28, 2025
b875aa2
[MAEB] Fix whisper model audio inference (#2954)
isaac-chung Jul 30, 2025
54561ed
Common voice (#2951)
hepengfe Aug 2, 2025
d841b33
fleurs retrieval tasks (#2976)
hepengfe Aug 4, 2025
069b294
MAEB Model Evaluation Fixes (#2956)
AdnanElAssadi56 Aug 5, 2025
671be23
Fix ClothoA2T modality (#2988)
isaac-chung Aug 5, 2025
49528b6
Revert "MAEB Model Evaluation Fixes" (#2993)
isaac-chung Aug 6, 2025
c7278c9
Audio Retrieval Dataset: AudioSet-Strong (#2931)
AdnanElAssadi56 Aug 9, 2025
5b827d9
Audio Retrieval Dataset: GigaSpeech (#2925)
AdnanElAssadi56 Aug 20, 2025
017c2be
Audio Retrieval Dataset: LibriTTS (#2917)
AdnanElAssadi56 Aug 25, 2025
4b992c9
Maeb main merge 26 08 (#3076)
Samoed Aug 26, 2025
21c2fce
Spoken SQuAD - MAEB (#3074)
arteemg Aug 26, 2025
53b8b62
Main merge for maeb -> 1.38.52 (#3109)
isaac-chung Sep 1, 2025
4c06b59
Fix VocalSound split naming (#3108)
isaac-chung Sep 1, 2025
0f64441
Audio Retrieval Dataset: JLCorpus (#2927)
AdnanElAssadi56 Sep 3, 2025
81b621a
MAEB Models Eval Fixes 2 (#3010)
AdnanElAssadi56 Sep 12, 2025
7bca59f
MAEB Models Eval Fixes 3 (#3184)
AdnanElAssadi56 Oct 19, 2025
9f1c7a6
Maeb merge main v2 (#3447)
Samoed Oct 22, 2025
3f83aed
Merge branch 'main' into maeb
Samoed Oct 22, 2025
e80affe
Merge branch 'main' into maeb
Samoed Oct 22, 2025
96e3631
clenup
Samoed Oct 22, 2025
dad573a
make maeb tasks importable (#3496)
Samoed Oct 26, 2025
437eb78
Refactor tasks and models to new interface (#3497)
Samoed Nov 11, 2025
93fc653
[MAEB] Merge with `AbsRetrieval` (#3528)
Samoed Nov 13, 2025
603f3e8
Merge branch 'main' into maeb
Samoed Nov 16, 2025
a6026d2
[MAEB] merge zeroshot classification (#3580)
Samoed Nov 19, 2025
54c06a2
[MAEB] merge clustering (#3582)
Samoed Nov 19, 2025
eea0678
[MAEB] Merge pair classification (#3577)
Samoed Nov 19, 2025
3a79938
[MAEB] Merge `AudioReranking` with `Retrieval` (#3570)
Samoed Nov 20, 2025
3141664
[MAEB] Make `Qwen2-Audio` support text (#3581)
Samoed Nov 24, 2025
9f9f865
[MAEB] Merge classification (#3590)
Samoed Nov 25, 2025
892a6bf
[MAEB] Merge multilabel classification (#3614)
Samoed Nov 25, 2025
21624a6
[MAEB] CLAP Token Length Error from Fleurs (#3710)
AdnanElAssadi56 Dec 22, 2025
60fa898
Merge branch 'main' into maeb
Samoed Dec 22, 2025
f532b54
fix task
Samoed Dec 23, 2025
fc753ff
fix task
Samoed Dec 23, 2025
1c68b37
Add torch.no_grad to speecht5 (#3862)
AdnanElAssadi56 Jan 5, 2026
8e22fc3
Add proper text batching to Muq Model (#3856)
AdnanElAssadi56 Jan 5, 2026
ae69880
Update music_caps dataset path (#3854)
AdnanElAssadi56 Jan 5, 2026
6589273
Fix Sewd Model revision (#3857)
AdnanElAssadi56 Jan 6, 2026
0b03b14
Add Safey check for vggish (#3850)
AdnanElAssadi56 Jan 6, 2026
a36139f
Fix sound_desc dataset name typo (#3859)
AdnanElAssadi56 Jan 6, 2026
5e0318e
[MAEB] Ensure that Yamnet handles empty audio snippets (#3846)
AdnanElAssadi56 Jan 6, 2026
a5abf11
[MAEB] Add AST Model minimum length (#3861)
AdnanElAssadi56 Jan 7, 2026
033cf3d
fix: Add text modality to audio zeroshot classification tasks (#3878)
isaac-chung Jan 7, 2026
f719ed3
[MAEB] merge from main again (#3873)
isaac-chung Jan 7, 2026
13a0f6c
[MAEB] init imports fix (#3845)
AdnanElAssadi56 Jan 7, 2026
c057e08
[MAEB] Commonvoice Task Fix (#3847)
AdnanElAssadi56 Jan 7, 2026
5ee0efe
[MAEB] fix speechcommands column name (#3853)
AdnanElAssadi56 Jan 7, 2026
f00fadf
Fix label column name + Filter out empty audio samples (#3860)
AdnanElAssadi56 Jan 7, 2026
83c77bc
Use zxx-Zxxx language code for non-human audio tasks (#3880)
isaac-chung Jan 7, 2026
a5c3751
[MAEB] Model Meta Revision Fix (#3868)
AdnanElAssadi56 Jan 7, 2026
4f4792d
Merge branch 'main' into maeb
Samoed Jan 7, 2026
9e620b7
[MAEB] Add mctct safety check (#3848)
AdnanElAssadi56 Jan 7, 2026
782ce45
[MAEB] Add proper wav2clip text encoder (#3849)
AdnanElAssadi56 Jan 7, 2026
2f335aa
[MAEB] Make seemlessm45 model more efficient (#3851)
AdnanElAssadi56 Jan 7, 2026
e26db1d
[MAEB] Add monkey_patching for cnn14 to handle torchaudio versions (#…
AdnanElAssadi56 Jan 7, 2026
25e6c23
[MAEB] Use new subsampled version of fsd50_mini (#3865)
AdnanElAssadi56 Jan 7, 2026
357ac90
[MAEB] Remove AudioCapsMiniReranking Task (#3883)
AdnanElAssadi56 Jan 7, 2026
87cf288
fix name of evaluator for multilabel tasks (#3882)
Samoed Jan 7, 2026
f634317
Update audio datasets card generation (#3884)
Samoed Jan 8, 2026
fcb5e77
[MAEB] Add Msclap Workaround using temp files to match official model…
AdnanElAssadi56 Jan 8, 2026
8477591
[MAEB] Add safety checks for encodec model (#3858)
AdnanElAssadi56 Jan 8, 2026
3d17dbc
Audio statistics (#3833)
Samoed Jan 8, 2026
96fdf25
Remove maeb results folder (sym link) (#3898)
Samoed Jan 9, 2026
7645df8
[MAEB] Move non-linguistic audio tasks from eng to zxx folders (#3900)
isaac-chung Jan 9, 2026
edf141a
Add GoogleSVQ retrieval task with all 26 locales (#3907)
isaac-chung Jan 10, 2026
f5856e0
Task type consistency (#3955)
isaac-chung Jan 17, 2026
d21d709
Add new dataset Expresso, Globe V2 (age + gender) (#3904)
diffunity Jan 18, 2026
2ab715e
Filter out empty samples in voice_gender_clustering (#3864)
AdnanElAssadi56 Jan 18, 2026
cdc1c37
add tau 2022 mobile development set (#3966)
diffunity Jan 19, 2026
4d6d411
fix category
Samoed Jan 19, 2026
e71b488
Add LASS and CSTR-VCTK (accents + gender) datasets (#3942)
diffunity Jan 20, 2026
4a1e4c3
Compute statistics (#3903)
Samoed Jan 20, 2026
10d8f37
Add mini to common voice tasks (#3964)
Samoed Jan 21, 2026
2a98d8d
Reupload tasks with trust remote code (#3961)
Samoed Jan 21, 2026
d51d19a
[MAEB] Msclap Eval Error Fix (#3943)
AdnanElAssadi56 Jan 21, 2026
eca03ed
[MAEB] Filtered VoxPopuliAccentPairCls Dataset (#3976)
AdnanElAssadi56 Jan 21, 2026
1e90c1a
[MAEB] Speedup SVQ, Common voice, Clotho, Fleurs datasets loading (#…
Samoed Jan 21, 2026
566a6d9
[MAEB] Find tasks with empty samples (#3978)
Samoed Jan 22, 2026
b5f9d25
[MAEB] adding`LCO` models (#3963)
gowitheflow-1998 Jan 22, 2026
291eedc
add urban sound
Samoed Jan 22, 2026
2b2a83f
[MAEB] add models citations (#3980)
Samoed Jan 22, 2026
f0b2fe2
fix fleurs citation (#4010)
Samoed Jan 29, 2026
0cb5c05
Fix citations (#4012)
Samoed Jan 29, 2026
a87dcde
MAEB task selection (#3867)
isaac-chung Jan 31, 2026
25cea64
Merge main 3001 (#4019)
Samoed Jan 31, 2026
7fd8642
Merge branch 'main' into maeb
Samoed Jan 31, 2026
d43dcac
move maeb scripts to sep repo (#4028)
isaac-chung Jan 31, 2026
79f9d2b
MAEB reference models (#4026)
isaac-chung Feb 1, 2026
290f7c1
Support datasets v4 (#4004)
Samoed Feb 2, 2026
75609f0
Merge branch 'main' into maeb
Samoed Feb 2, 2026
e8532cb
[maeb] Move datasets to mteb org (#4048)
Samoed Feb 6, 2026
cea6599
[MAEB] Move tasks to tasks folder (#4057)
Samoed Feb 6, 2026
71d45e3
Merge branch 'main' into maeb
Samoed Feb 7, 2026
5fd436e
Make birdset dataset handling more efficient (#3863)
AdnanElAssadi56 Feb 7, 2026
f193eed
skip test audio models
Samoed Feb 7, 2026
c0efa80
[MAEB] Use collator for processing (#4069)
Samoed Feb 10, 2026
6120b00
[MAEB] Add GLOBE v3 dataset (#4091)
diffunity Feb 13, 2026
4d54d5a
Add audio task installation instructions to docs (#4097)
isaac-chung Feb 14, 2026
b1f55e9
Merge remote-tracking branch 'origin/main' into maeb
isaac-chung Feb 15, 2026
165514f
Update uv.lock
isaac-chung Feb 16, 2026
5299695
Merge branch 'main' into maeb
isaac-chung Feb 16, 2026
29136c1
Fix _update_description call sites to match new signature
isaac-chung Feb 16, 2026
7ac00fa
try install runtime libraries that provide libav*.so
isaac-chung Feb 16, 2026
744bd2f
fix pyproject (#4098)
Samoed Feb 16, 2026
864461a
remove vllm duplicate
Samoed Feb 16, 2026
90afbff
Update .github/workflows/test.yml
isaac-chung Feb 16, 2026
91460f8
fix windows ci (#4100)
Samoed Feb 16, 2026
52 changes: 50 additions & 2 deletions .github/workflows/test.yml
@@ -61,11 +61,54 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}
 
+      # required for evaluation the audio subset
+      - name: Install FFmpeg (Ubuntu)
+        if: runner.os != 'Windows'
+        shell: bash
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y software-properties-common
+          sudo add-apt-repository -y ppa:ubuntuhandbook1/ffmpeg8
+          sudo apt-get update
+          sudo apt-get install -y ffmpeg
+
+      - name: Setup Miniconda (Windows)
+        if: runner.os == 'Windows'
+        uses: conda-incubator/setup-miniconda@v2
+        with:
+          auto-update-conda: true
+          miniconda-version: "latest"
+          activate-environment: ffmpeg
+
+      - name: Install FFmpeg (Windows)
+        if: runner.os == 'Windows'
+        shell: pwsh
+        # using conda to install ffmpeg on windows, to avoid issues with dlls
+        run: |
+          conda install -y "ffmpeg=8.0.1" -c conda-forge
+
+      - name: Check FFmpeg version
+        run: ffmpeg -version
+
       - name: Install dependencies
         shell: bash
         run: |
           make install-for-tests
 
+      - name: Setup Pytorch dll PATH (Windows)
+        # otherwise pytorch audio cannot find pytorch dlls on windows
+        if: runner.os == 'Windows'
+        shell: pwsh
+        run: |
+          $torchLib = "D:\a\mteb\mteb\.venv\Lib\site-packages\torch\lib"
+          if (Test-Path $torchLib) {
+            echo "$torchLib" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+            $env:PATH = "$torchLib;$env:PATH"
+            echo "PATH=$env:PATH" >> $env:GITHUB_ENV
+          } else {
+            Write-Host "Torch lib path not found: $torchLib"
+          }
+
       - name: Run tests
         if: runner.os != 'Windows'
         shell: bash
@@ -77,9 +120,14 @@ jobs:
         # this step will run the workflow twice since we have experienced
         # failures when running on windows when loading the datasets
         if: runner.os == 'Windows'
-        shell: bash
+        shell: pwsh
         run: |
           # run the test once and if it fails, run it again
           # if it fails again, the workflow will fail.
           # If it passes the first time the test will not run again
-          make test || make test
+          try {
+            make test
+          } catch {
+            Write-Host "First test run failed, retrying..."
+            make test
+          }
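
The comments in the step above describe a "run once, retry once on failure" policy. As a minimal sketch of that policy keyed on the command's exit status, the bash function below may help; `run_with_retry` is a hypothetical helper name and is not part of this PR. One caveat worth checking against the new `pwsh` version: by default, PowerShell does not raise an exception when a native command such as `make` exits non-zero, so the `catch` branch may never run, and inspecting `$LASTEXITCODE` would be one alternative.

```shell
#!/usr/bin/env bash
# Sketch only: `run_with_retry` is a hypothetical helper, not part of the PR.

# Run a command; if it exits non-zero, run it exactly once more.
# The function's exit status is that of the last attempt.
run_with_retry() {
  if "$@"; then
    return 0
  fi
  echo "First run failed, retrying..." >&2
  "$@"
}

# Example (assumes a `test` target exists in the Makefile):
#   run_with_retry make test
```

This behaves like the original `make test || make test`, with an added log line so a retried run is visible in the CI output.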
2 changes: 2 additions & 0 deletions .gitignore
@@ -163,3 +163,5 @@ powermetrics_log.txt
 /docs/overview/available_benchmarks.md
 
 CLAUDE.md
+
+*.tex
4 changes: 2 additions & 2 deletions Makefile
@@ -5,7 +5,7 @@ install:
 install-for-tests:
 	@echo "--- 🚀 Installing project dependencies for test ---"
 	@echo "This ensures that the project is not installed in editable mode"
-	uv sync --no-editable --extra bm25s --extra pylate --extra image --extra codecarbon --extra leaderboard --extra faiss-cpu --group dev
+	uv sync --no-editable --extra bm25s --extra image --extra audio --extra codecarbon --extra leaderboard --extra faiss-cpu --group dev
 
 lint:
 	@echo "--- 🧹 Running linters ---"
@@ -77,7 +77,7 @@ leaderboard-test-all:
 
 run-leaderboard:
 	@echo "--- 🚀 Running leaderboard locally ---"
-	uv run --extra leaderboard python -m mteb.leaderboard.app
+	uv run --no-sync --extra leaderboard python -m mteb.leaderboard.app
 
 format-citations:
 	@echo "--- 🧹 Formatting citations ---"
40 changes: 40 additions & 0 deletions docs/installation.md
Contributor

We also need to update the whats_new.md

@@ -28,6 +28,46 @@ If you want to run certain models implemented within mteb you will often need so
 
 If a specific model requires a dependency it will raise an error with the recommended installation. To see full list of available models you can look at the [models overview](./overview/available_models/text.md).
 
+## Audio Tasks
+
+If you want to run audio tasks, install the audio dependencies:
+
+=== "pip"
+    ```bash
+    pip install mteb[audio]
+    ```
+
+=== "uv"
+    ```bash
+    uv add "mteb[audio]"
+    ```
+
+### Additional Requirements for `datasets>=4`
+
+If you are using `datasets>=4`, you will need to:
+
+1. **Install FFmpeg**: The `datasets` library version 4+ uses `torchcodec` for audio processing, which requires FFmpeg to be installed on your system.
+
+    === "macOS"
+        ```bash
+        brew install ffmpeg
+        ```
+
+    === "Ubuntu/Debian"
+        ```bash
+        sudo apt-get install ffmpeg
+        ```
+
+    === "Windows"
+        Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to your PATH.
+
+2. **Use `transformers>=4.57.6`**: Due to compatibility issues with `datasets>=4`, you need a recent version of transformers:
+    ```bash
+    pip install "transformers>=4.57.6"
+    ```
+
+If you are using `datasets<4`, no additional requirements are needed beyond the `mteb[audio]` installation.
+
 ## Migrating to uv (for Contributors)
 
 If you're a contributor currently using pip, here's how to migrate to uv for faster dependency management:
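
The audio prerequisites added to `docs/installation.md` in this diff (FFmpeg on `PATH` and `transformers>=4.57.6` when `datasets>=4` is in use) could be checked locally with something like the sketch below. The function names `version_ge`, `have_cmd`, and `check_audio_prereqs` are hypothetical and not part of mteb. The sketch assumes a `sort` that supports `-V` (version sort) and plain `X.Y.Z` version strings; pre-release suffixes may not compare as expected.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are illustrative and not part of mteb.

# Succeeds if version $1 is >= version $2 (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Succeeds if the named command is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Usage: check_audio_prereqs <datasets_version> <transformers_version>
# Applies the extra checks only when datasets is at version 4 or later.
check_audio_prereqs() {
  local datasets_version="$1" transformers_version="$2"
  if version_ge "$datasets_version" "4.0.0"; then
    if ! have_cmd ffmpeg; then
      echo "ffmpeg not found on PATH" >&2
      return 1
    fi
    if ! version_ge "$transformers_version" "4.57.6"; then
      echo "transformers>=4.57.6 is required with datasets>=4" >&2
      return 1
    fi
  fi
  echo "ok"
}
```

One way to call it, assuming both packages are importable in the active environment, is `check_audio_prereqs "$(python -c 'import datasets; print(datasets.__version__)')" "$(python -c 'import transformers; print(transformers.__version__)')"`.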