adding GTZAN Genre dataset by silky1708 · Pull Request #2307 · embeddings-benchmark/mteb

silky1708 · 2025-03-10T23:20:17Z

Solves #2241

====================================================

Code Quality

Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

I have run the following models on the task (adding the results to the pr). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.
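
The "neither trivial nor random" item above can be checked mechanically once the two baseline scores are in hand. The following is only a sketch: the helper names, the 0.05 margin, and the example accuracies are hypothetical (they are not results from this PR), and chance level is taken as 1/num_classes, which assumes a balanced classification task.

```python
from typing import Sequence


def classify_score(accuracy: float, num_classes: int, margin: float = 0.05) -> str:
    """Label one accuracy as 'random', 'trivial', or 'informative'.

    Assumes balanced classes, so chance accuracy is 1 / num_classes.
    The margin is an illustrative threshold, not an mteb convention.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    chance = 1.0 / num_classes
    if accuracy <= chance + margin:
        return "random"
    if accuracy >= 1.0 - margin:
        return "trivial"
    return "informative"


def task_is_informative(
    scores: Sequence[float], num_classes: int, margin: float = 0.05
) -> bool:
    """Return False only if every model is near chance or every model is near perfect."""
    if not scores:
        raise ValueError("need at least one score")
    labels = [classify_score(s, num_classes, margin) for s in scores]
    all_random = all(label == "random" for label in labels)
    all_trivial = all(label == "trivial" for label in labels)
    return not (all_random or all_trivial)


if __name__ == "__main__":
    # Hypothetical accuracies for two baseline models on a 10-class task.
    print(task_is_informative([0.55, 0.62], num_classes=10))
```

The check mirrors the checklist wording: a task is flagged only when both baselines land in the same degenerate band, so one weak model alone does not disqualify it.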

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision) and
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.

isaac-chung

A few comments - This HF dataset is missing one sample. See comments for details.

isaac-chung · 2025-03-11T06:49:15Z

mteb/tasks/Audio/AudioClassification/eng/GTZANGenre.py

+        eval_splits=["train"],
+        eval_langs=["eng-Latn"],
+        main_score="accuracy",
+        date=("2023-06-23", "2023-06-23"),


See the linked GitHub README in the issue:

The files were collected in 2000-2001 from a variety of sources including personal CDs, radio, microphone recordings, in order to represent a variety of recording conditions.

Let's follow the README and use 2000-2001?

Oh, I thought the dates referred to when the dataset was uploaded to HF. Fixed to 2000-2001.
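
For reference, the date field in the diff above is a pair of YYYY-MM-DD strings, so a collection period of 2000-2001 has to be expanded to day-level bounds. A minimal sketch of validating such a pair follows. The helper name is hypothetical, and the specific bounds (1 January 2000 to 31 December 2001) are an assumption, since the thread only says "2000-2001".

```python
from datetime import date
from typing import Tuple


def validate_date_range(date_range: Tuple[str, str]) -> Tuple[date, date]:
    """Parse a (start, end) pair of ISO dates and check that start <= end."""
    start_str, end_str = date_range
    start = date.fromisoformat(start_str)  # raises ValueError on malformed input
    end = date.fromisoformat(end_str)
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return start, end


if __name__ == "__main__":
    # Assumed day-level bounds for the 2000-2001 collection period.
    collection_period = ("2000-01-01", "2001-12-31")
    print(validate_date_range(collection_period))
```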

mteb/tasks/Audio/AudioClassification/eng/GTZANGenre.py

silky1708 · 2025-03-13T20:53:26Z

make test fails after the recent update to Makefile

isaac-chung

Looks good! All that's remaining is the date and linting. Tests seem to be passing here.

silky1708 · 2025-03-13T22:10:34Z

I don't know why the lint check fails. When I run make lint locally, it seems fine!

Samoed · 2025-03-13T22:20:52Z

I think lint is failing because the main branch was recently merged into maeb (#2341), so you need to update your config.

isaac-chung · 2025-03-13T22:24:22Z

Hmm might also want to check your ruff version. I think it should be 0.9.7
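
One quick way to compare a local environment against the version mentioned above is to read the installed package metadata. This is a sketch rather than part of the project's tooling: the helper names are hypothetical, and the expected version is simply the 0.9.7 suggested in this thread, which may differ from what the project actually pins.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Version suggested in the thread; treat it as an assumption, not the project's pin.
EXPECTED_RUFF = "0.9.7"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(installed: Optional[str], expected: str) -> str:
    """Compare an installed version string against the expected one."""
    if installed is None:
        return "missing"
    return "ok" if installed == expected else "mismatch"


if __name__ == "__main__":
    found = installed_version("ruff")
    print(f"ruff installed: {found}, expected: {EXPECTED_RUFF}, status: {check_pin(found, EXPECTED_RUFF)}")
```

A mismatch here would be one plausible reason for make lint passing locally while the CI lint check fails, since formatter output can change between ruff releases.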

for push or pull_request * apply feedback --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add gemini-embedding-exp-03-07 (#2279) * add gemini-embedding-exp-03-07 * remove space for lint * lint fix * update link (#2281) * fix: Run remaining MIEB desc stats (#2288) * run Vidore * GLDv2 * run the rest --------- Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> * Update tasks table * 1.36.11 Automatically generated by python-semantic-release * fix: Added Filter Modality (#2262) * Added Filter Modality * resolve suggestions * make lint * make sure test pass * make lint * added exclusive_modality_filter and unit tests * Integrate tests on overview.py * Update tests/test_overview.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * added task related to image modality * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * update overview..py * make lint * update documentation --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * 1.36.12 Automatically generated by python-semantic-release * fix: Add `ModelMeta` license & custom validations (#2293) * license validation * move licenses * update imports --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.13 Automatically generated by python-semantic-release * ci: Add pre-commit hook (#2194) * make dev life nicer - pre-commit hooks * add pre-commit to install * update precommit * update ruff pre-commit * lint * lint --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * Update tasks table * fix: bug in voyage implementation (#2304) * fix: Fix bug in voyage implementation "passage" is not a valid input for the voyage API. Remapped to "document". 
* Update mteb/models/voyage_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.14 Automatically generated by python-semantic-release * fix: Update voyage name to include Org. (#2322) * 1.36.15 Automatically generated by python-semantic-release * Added VDR Model (#2290) * Added VDR Model * change custom wrapper to SentenceTransformer Wrapper * remove kwargs and add TODO for Image Modality * Update mteb/models/vdr_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Resolve conflicting dependencies (#2323) These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * 1.36.16 Automatically generated by python-semantic-release * fix: remove SyntaxWarnings in py312 (#2325) * fix: Resolve conflicting dependencies These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * fix: Remove syntax warnings occuring in python 3.12 ``` Python 3.12.0 (main, Oct 2 2023, 20:56:14) [Clang 16.0.3 ] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import mteb # no syntax warnings >>> ``` * 1.36.17 Automatically generated by python-semantic-release * fix: add annotation models for stella zh (#2277) * fix: add annotation models for stella zh Additionally fixed a few annotation errors * format * Update mteb/models/stella_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.18 Automatically generated by python-semantic-release * fix: Add ModelMeta rubert-mini-frida, BERTA (#2330) * Add rubert-mini-frida model meta * Add BERTA model meta * docs: fix typos * 1.36.19 Automatically generated by python-semantic-release * fix: Add WebFAQ bitext mining tasks (#2326) * Add WebFAQ bitext mining tasks Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Lower number of language pairs in WebFAQBitextMining Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.20 Automatically generated by python-semantic-release * make lint * fix validation for license * fix remaining validation errors --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Mina Parham <36207068+mina-parham@users.noreply.github.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com> Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Shikhar Shiromani <rbk.shikhar@gmail.com> 
Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com>
Co-authored-by: Ruslan Bel'kov <ruslan.belckov@yandex.ru>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
Co-authored-by: sufen-f <sufenfong@gmail.com>
Co-authored-by: sufen <sufenf@gmail.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Samuel Yang <samuelyang150@gmail.com>
Co-authored-by: Aradhye Agarwal <aradhyeagarwal@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: talshef <tsheffer@gmail.com>
Co-authored-by: Tal Sheffer <tal.s@codium.ai>
Co-authored-by: garciasces <garciasces@madrid.es>
Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk>
Co-authored-by: Wang Bo <bo.wang@jina.ai>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Eng. Omar Najar <79968243+omarnj-lab@users.noreply.github.com>
Co-authored-by: Michael Dinzinger <39766249+michaeldinzinger@users.noreply.github.com>
Co-authored-by: Jinhyuk Lee <lee.jnhk@gmail.com>
Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan>
Co-authored-by: sergeyz-zh <49659999+sergeyz-zh@users.noreply.github.com>

* adding GTZAN Genre dataset (#2307)
* add GTZANGenre dataset
* reupload gtzan dataset
* update TaskMetadata from mteb:maeb
* make pr
* add task subtype
* update date for gtzan dataset
* update ruff to 0.9.7; make lint

* Adding Beijing Opera dataset (#2356)
* update TaskMetadata from mteb:maeb
* make pr
* add task subtype
* update ruff to 0.9.7; make lint
* add voxlingua107-top10 dataset
* add beijing opera dataset
* update TaskMetadata from mteb:maeb
* make pr
* update ruff to 0.9.7; make lint
* update TaskMetadata from mteb:maeb
* update TaskMetadata
* add Mridingham datasets
* rm comment

* Adding Libricount dataset (#2361)
* update taskmetadata
* sync mteb:maeb
* make lint

* Adding Crema-D Dataset for emotion classification [HEAR] (#2368)
* update TaskMetadata from mteb:maeb
* make pr
* add task subtype
* update ruff to 0.9.7; make lint
* add voxlingua107-top10 dataset
* update TaskMetadata
* adding CREMA-D dataset
* rm deleted files

* Adding FSDD dataset (Free Spoken Digit Dataset) (#2371) update Taskmetadata

* Add VoxCelebSA, SpokenQAforIC, VehicleSoundClustering from Dynamic-SUPERB (#2379)
* add datasets from dynamic-superb
* make lint
* remove label mapping
* apply lint
* change eval_split train -> test

* fix FSD-50K Task Metadata, Label handling and add stratified subsampling (#2369) fix FSD-50K task

* Add music clustering dataset (#2232)
* Adds music-genre dataset
* Updates revision
* Fixes issues
* Changes category
* Removes a2t from task category
* Update the revision
* Fixes based on the feedback about converting rate
* Remove librosa
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
---------
Co-authored-by: Isaac Chung
<chungisaac1217@gmail.com> * [MAEB] merge main -> maeb (#2471) * misc: Add image classification descriptive stats implementation (#2045) * add ImageClassificationDescriptiveStatistics * add MNIST descriptive stats * use tuples instead * add label count and update docstrings * update MNIST example * Update tasks table * fix: Add column descriptions to leaderboard (#2039) * fix: Add column descriptions to leaderboard * removed existing overlap * fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041) * fix: Add BRIGHT Long Fixes #1978 * fix: Add BRIGHT(long) * fix bug in task results * updated bright * updated tests for TaskResults * 1.34.12 Automatically generated by python-semantic-release * misc: Add image clustering descriptive stats implementation (#2057) * add image clustering descirptive stats and run * finish off last one * remove script * fix: Update embed_dim for jina models (#2058) see https://github.com/embeddings-benchmark/results/pull/117 * Update tasks table * 1.34.13 Automatically generated by python-semantic-release * Add giga embeddings (#1741) * add gigaembeddings * use jasper * fix name * create sentence_transformer instruct wrapper * apply instruction template * fix jasper * update meta * misc: Add ZS and multilabel image classification descriptive stats implementation (#2059) * add image clustering descirptive stats and run * finish off last one * remove script * add ImageMultilabelClassificationDescriptiveStatistics * add VOC2007 * add zeroshot and mnist example * Update tasks table * Rename MIEB task classes with duplicated names (#2061) fix class names * misc: Add VisualSTS descriptive stats (#2062) * add visualsts stats * add last dataset * Update tasks table * fix: Added gte models (#1539) * fix: Added gte models * fix: Add mixbai models (#1540) for #1515 * fix: Add climate fever v2 (#1873) * Updated ClimateFEVER dataset with new version * Adds Fill in the empty metadata. 
* Updates the date tuple * Update class name Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update domains Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task_subtypes * Update annotations_creators for the first version * Update date Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task subtypes * Update path * Update description --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> * Update tasks table * fix: Updating paper scripts (#1958) * change reference revisions to align with paper * Update author list * Added code for main results table * updated minor changes * added external as a "no_revision_available" case * revert unintended changes * format * 1.34.14 Automatically generated by python-semantic-release * Add datasets for a benchmark newly introduced for "Engineering" domain (#1911) * adding clustering tasks (built-bench-clustering S2S & P2P) * updated built-bench-clustering tasks * Updated BuiltBenchClustering tasks * Added "Engineering" as new domain to TaskMetadata.py * Updated tasks table in docs * Updated task metadata for BuiltBenchClustering S2S and P2P * updated metadata for clustering tasks * Add/update BuiltBench tasks - Add BuiltBenchRetrieval task - Add BuiltBenchReranking task - Update metadata for BuiltBenchClusterinP2P - Update metadata for BuiltBenchClusterinS2S * update BuiltBench benchmark * Update mteb/benchmarks/benchmarks.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/benchmarks/benchmarks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Fix formatting via ruff --------- Co-authored-by: Roman Solomatin 
<samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * misc: update model names to adjust for adding to results repo (#2074) * update model names to adjust for adding to results repo * update model meta script * misc: Add all image classification descriptive stats (#2073) * add most image classification descr stats * revert changes to encoder * add stats --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Update tasks table * ci: Rerun tests that fail due to networking issues. (#2029) * fix: rerun tests that fail - Networking * update tests to use tmp_path * set versions for dev dependencies * add pytest options to pyproject.toml * add rerun json.decoder.JSONDecodeError * remove JSONDecodeError from pyproject.toml * add huggingface_hub.errors.HfHubHTTPError * add huggingface_hub.errors.LocalEntryNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044 * FileNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029 * add doc to pytest rerun --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * fix: generate metadata (#2063) * fix: generate metadata * use logging not print for script * lint * add iso639 to dev pyproject * fix import * add memory_usage_mb * set version for iso639 Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.15 Automatically generated by python-semantic-release * fix: add missing `e5` training datasets (#2065) add missing training datasets * 1.34.16 Automatically generated by python-semantic-release * fix: Ensure voyage model uses different naming scheme (#2083) * fix: Added make command for running leaderboard 
locally * fix: Ensure voyage models doesn't re-use the name * 1.34.17 Automatically generated by python-semantic-release * fix: Freeze model/rank columns in leaderboard (#2044) * fix: freeze model/rank columns in leaderboard * freezing zero-shot column * update min gradio version to 5.16.0 in pyproject.toml --------- Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> * 1.34.18 Automatically generated by python-semantic-release * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086) Fixes #2064 * 1.34.19 Automatically generated by python-semantic-release * Remove duplicated string in docstring of TaskMetadata class (#2087) * Remove duplicated string in docstring of TaskMetadata class * Remove duplicated dataset field * fix: Smarter leaderboard caching with cachetools (#2085) * Added smarter caching to callbacks * Added cachetools as a dependency * Ran linting * Removed debugging print statement * Bumped Gradio version * Dependency fixes * Dependency fixes --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088) * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) Fixes #2064 * change MultilingualSentiment split from test to validation in CMTEB * 1.34.20 Automatically generated by python-semantic-release * merge gme models (#2089) * fix: Add back task filtering by modalities (#2080) * add back task filtering by modalities * add unit test * check if task modalities is a subset of model modalities and fix tests * add model_modalities_more_than_task_modalities case * 1.34.21 Automatically generated by python-semantic-release * Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092) * Added GTR Models to codebase * Linted gtr models file. 
* Added gtr-base/large/xl/xxl to sentence_transformers_models.py * Added memory_usage_mb and training_datasets * Reformatted training dataset names * Reformatted training dataset names * Reformatted training dataset names --------- Co-authored-by: sufen <sufenf@gmail.com> * misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095) * add Any2TextMutipleChoiceDescriptiveStatistics * run on all tasks * Update tasks table * fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101) Reported with references to paper + qoutes. * fix: Update links (#2098) * Fix link * Fix link * 1.34.22 Automatically generated by python-semantic-release * Add model inf-retriever-v1-1.5b (#2106) Add inf-retriever-v1-1.5b model * docs: Fix typos & refine text (#2102) * Update app.py * Fix typos * misc: Run Zeroshot Classification Descriptive Stats (#2105) * add most datasets * add birdsnap and imgnet1k * add scimmir and sun397 * add uck101 zs * Update tasks table * fix: add warning about task category conversion (#2108) add warning about task category conversion * 1.34.23 Automatically generated by python-semantic-release * fix: Add codesage-large-v2 (#2090) * Add codesage-large-v2 * Address comments * Add training dataset * Fix issues * Format code * Remove unnecessary wrapper * 1.34.24 Automatically generated by python-semantic-release * fix: add training data to BGE-m3-custom-fr (#2110) This ensure that is it correctly filtered as non-zero-shot * 1.34.25 Automatically generated by python-semantic-release * fix: Upgrade ruff to be gradio compatible (#2111) * fix: update ruff to be gradio compatible (>=0.9.3) * format * fix: upgrade ruff to latests (same as gradio compatible) * 1.34.26 Automatically generated by python-semantic-release * docs: Follow google docstring format (#2115) Fixes #2113 * Update leaderboard_refresh.yaml (#2121) * fix InstructSentenceTransformer Model name (#2125) fix params * fix voyage (#2127) * fix: update e5 instruct training data (#2129) 
update e5 training data * 1.34.27 Automatically generated by python-semantic-release * format * Update tasks table * fix: Add 2 new Static Sentence Transformer models (#2112) * Add 2 new Static Sentence Transformer models * Add Tatoeba Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.28 Automatically generated by python-semantic-release * add is_cross_encoder (#1869) * add is_cross_encoder * Update mteb/model_meta.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * change value --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Qodo embed 1 1.5 b (#2137) * feat: Add Qodo-Embed-1-1.5B model metadata * fix: Add Qodo models to overview imports * fix: Add adapted_from field to Qodo model metadata * Update mteb/models/qodo_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * relint --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * misc: merge summary retrieval into bitext mining (#2140) merge summary retrieval into bitext mining * test: fix dataset availability test (#2141) This simplified the test and also make it a lot simpler. It also removed about 100 test cases which where all to the same API call. 
* fix: Update NVIDIA-Embed training data (#2143) Added a few missing annotations for nvidia-embed * 1.34.29 Automatically generated by python-semantic-release * fix: Add annotations for Voyage exp (#2144) * fix: Update NVIDIA-Embed training data Added a few missing annotations for nvidia-embed * fix update annotationf for voyage exp * 1.34.30 Automatically generated by python-semantic-release * Fix tokens num in cde models (#2148) fix tokens * feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146) * feat: Add Qodo-Embed-1-7B model metadata and rename existing model * lint * fix revision * update license name --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> * 1.35.0 Automatically generated by python-semantic-release * misc: add Any2AnyRetrievalDescriptiveStatistics (#2139) add Any2AnyRetrievalDescriptiveStatistics * Update tasks table * Added zero-shot percentages and different filtering scheme (#2153) * Added zero-shot percentages and different filtering scheme * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Incorrect annotations for Mistral-based embedding models (#2157) Fixes #2155 * 1.35.1 Automatically generated by python-semantic-release * Update FaMTEBRetrieval.py (#2171) The URL pointed to the settings page instead of the main repo URL. Now it is fixed. 
* Update tasks table * fix: Add Training data annotations (#2173) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * 1.35.2 Automatically generated by python-semantic-release * feat: Add MIEB and MIEB-lite as benchmarks (#2035) * add mieb and mieb-lite to benchmarks * add CompositionalityEvaluation and DocumentUnderstanding types * add VisionCentric type * add missing comma * split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng * use aggregate task instead so we can name the subsets * shorten names * fix import * alternative strategy to avoid using get_task * follow other aggregate tasks and skip metadata test * run LB without errors when selecting MIEB(-lite) * add back the capability as taks type * typo * extend description * split into mieb(eng) and mieb(multilingual) * remove unneeded files * remove aggtask additions for test * edit descriptions based on screenshots * shorten * rename to Compositionality and include ImageCoDeT2IMultiChoice * re-tag missing VisionCentric tasks * re-tag rparis and roxford as retrieval and include fixes * re-tag voc2007 as image cls * make lint * correct num task types in descriptions * add one model to models_to_annotate * add mieb reference models * update task types * relabel to multilingual retrieval task type to align with paper * fix reference and bibtex * edit task list to match with final list * add back agg task to reproduce table column in paper * fix filtering and import * update tests * mieb lite add back missing tasks * fix metadata test * multi should have all 4 variants * fix task counts * lite has 10 task types * fix visualSTS-17 lang splits * Aggregate task can now use subsets & eval 
langs to filter TaskResults * fix test and mark VisualSTS17 as multilingual * fix tests * add agg task running script * add voyage meta * fix citations * capitalize * add coarse/fine labels --------- Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> * Update tasks table * 1.36.0 Automatically generated by python-semantic-release * fix: update training datasets and revision for jina models (#2179) * feat: update training datasets and revision for jina models * feat: update training datasets and revision for jina models * fix: Add more training data annotations (#2178) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * fix: add training data for Bilingual Embeddings fixes #2167 * 1.36.1 Automatically generated by python-semantic-release * Added training data annotation for e5-base-4k (#2186) * fix: Added training data annotations to MXBAI (#2185) * fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180) This also resolves the missing data in the leaderboard. 
…

silky1708 added the maeb Audio extension label Mar 10, 2025

silky1708 linked an issue Mar 10, 2025 that may be closed by this pull request

Add GTZAN Genre dataset #2241

Closed

silky1708 requested review from Samoed and isaac-chung March 10, 2025 23:20

Samoed approved these changes Mar 11, 2025

isaac-chung reviewed Mar 11, 2025

silky1708 added 2 commits March 13, 2025 13:35

add GTZANGenre dataset

e93ab36

reupload gtzan dataset

fda7ac1

silky1708 force-pushed the maeb branch from aafc000 to fda7ac1 March 13, 2025 20:36

silky1708 requested a review from isaac-chung March 13, 2025 20:37

silky1708 added 3 commits March 13, 2025 13:42

update TaskMetadata from mteb:maeb

99c694e

make pr

ceb3d2b

add task subtype

e84a4e1

silky1708 self-assigned this Mar 13, 2025

isaac-chung reviewed Mar 13, 2025

update date for gtzan dataset

335ea1a

update ruff to 0.9.7; make lint

050e5f8

isaac-chung approved these changes Mar 13, 2025

isaac-chung merged commit ef30e3d into embeddings-benchmark:maeb Mar 13, 2025
9 checks passed

isaac-chung merged 7 commits into embeddings-benchmark:maeb from silky1708:maeb

3 participants
