adding GTZAN Genre dataset #2307
isaac-chung
left a comment
A few comments - This HF dataset is missing one sample. See comments for details.
    eval_splits=["train"],
    eval_langs=["eng-Latn"],
    main_score="accuracy",
    date=("2023-06-23", "2023-06-23"),
See the GitHub README linked in the issue:
"The files were collected in 2000-2001 from a variety of sources including personal CDs, radio, microphone recordings, in order to represent a variety of recording conditions."
Let's follow the README and use 2000-2001?
Oh, I thought the dates were when the dataset was uploaded to HF. Fixed to 2000-2001.
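For reference, a minimal sketch of what the date change discussed above might look like. It is not the PR's actual diff: the dictionary name `metadata_fragment` and the helper `check_date_range` are hypothetical, and the exact start and end days are an assumption, since the README only gives the years 2000-2001.

```python
from datetime import date

# Hypothetical fragment mirroring the fields quoted in the review.
# Assumption: the README only says "2000-2001", so the first day of 2000
# and the last day of 2001 are used as the range bounds.
metadata_fragment = {
    "eval_splits": ["train"],
    "eval_langs": ["eng-Latn"],
    "main_score": "accuracy",
    "date": ("2000-01-01", "2001-12-31"),
}


def check_date_range(date_range):
    """Parse an ISO (start, end) pair and confirm start is not after end."""
    start, end = (date.fromisoformat(d) for d in date_range)
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return start, end


start, end = check_date_range(metadata_fragment["date"])
print(start.year, end.year)
```

The check only guards against a reversed or malformed range; whether the collection period or the upload date is the right value is the judgment call settled in the thread above.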
isaac-chung
left a comment
Looks good! All that's remaining is the date and linting. Tests seem to be passing here.
I think lint is failing, because recently
Hmm might also want to check your
* Started the following: - define audio encoder interface - implement abstask and evaluator for clustering * Minor changes and linted files. #2093 * Minor changes and linted files. #2093 * Minor changes and linted files. #2093 * Refs #2068: Initial Implementation of audio-text retrieval abstask and evaluator * Added MockAudioClustering task + MockAudioEncoder for testcase Created test_maeb_datasets.py to test AbsTask and Evaluator for clustering * MockAudioClustering + MockAudioEncoder (#2093) * Added wav2vec model wrapper * Added subTask with small sample of dataset for testing * Added four w2v variants * Update wav2vec_models.py * Added wav2vec (5), wavlm (7), and whisper (5) models * Added revisions from HF to wav2vec models, added silhouette score, DBSCAN and agglomerative algorithms into clustering evaluator, added algorithm selector into VoiceGender * Update mteb/models/wavlm_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * setting up colab * added a2a * PCA + hidden layer + shuffling * New task: emotion clustering * Added qwen2 model * Added Wav2Vec model, voice clustering task, VoxCeleb dataset subset (#2175) * Added wav2vec model wrapper * Added four w2v variants * Update wav2vec_models.py * Removed run.py test script * Added subTask with small sample of dataset for testing * Removed test portion of VoiceGender.py task * add commit hash and bibtex * make lint * update models * fix circular import * make VoiceGender discoverable in get_tasks * add a2a as category for clustering * specify latest commit hash * revert linting changes * Based on feedback for model: updated w2v2 revisions and added torchaudio to .toml file * Added Bibtex for dataset, set data to be test instead of training, shortened task_subtype * Changed task from Voice Gender Clustering to Gender Clustering. 
* Fixed mock audio clustering tests * Added dataset metadata * Linted * Passed revision into the w2v2 loader * passed lint check * Linted * Update VoiceGender.py --------- Co-authored-by: Ali Sartaz Khan <alisartazkhan@gmail.com> Co-authored-by: Ali Sartaz Khan <71156712+alisartazkhan@users.noreply.github.com> Co-authored-by: mn <mn@Ms-MacBook-Pro.local> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Revert "Maeb - added voice clustering task, wav2vec model and VoxCeleb dataset (subset)" (#2202) * Revert "Revert "Maeb - added voice clustering task, wav2vec model and VoxCeleb dataset (subset)"" (#2203) Revert "Revert "Maeb - added voice clustering task, wav2vec model and VoxCele…" This reverts commit ee10191f1d4f10f705a595062cc04d749f9a2dc3. * Revert "Revert "Revert "Maeb - added voice clustering task, wav2vec model and VoxCeleb dataset (subset)""" (#2207) Revert "Revert "Revert "Maeb - added voice clustering task, wav2vec model and…" This reverts commit f1449c07f8f6c8a084daab572515f3110cd67027. 
* Add Audio (Multi Label) Classification Abstask, Baseline Audio model, FSD50k Dataset and Task (#2082) * init audio * some encoder related changes * some more abs task defs Co-authored-by: rahulschand <rahulsc@stanford.edu> * evaluators and classification * remove rahul changes to generate first PR * make lint * add dataset/tasks skeleton * readd changes lost in rebase * add fsd50k * add task categories for audio * slight updates to fsd50k * make lint * wav2vec2 model * add fsd50k metadata * rename folder * add metric * add torchaudio in req * reigster wav2vec2 models * fixes * add audio in valid task types * mock interface changes * make lint * rm audio clustering * wav2vec2 model revision update * rm comment * rm test.py * add revisions to all wav2vec2 models * rm empty abstask files * rm empty evaluator files * rm empty task files * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update mteb/models/wav2vec2_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * rm non-logReg evaluators for audio classification * lint * fn name changed to convert_audio_from_numpy * rm mock tests for audio kNN classification * rm evaluators for audio kNN classification * fix imports * fix audio kNN; make lint * rm AbsTaskAudioClassification.py for later PR * remove commented code; reset changes to ClassificationEvaluator.py * fix mock tasks for multilabel classification * make lint * inherit Wrapper class * add all languages supported by wav2vec2 * make lint * add script info to all languages * make lint * recent changes * merge wav2vec2 + add updated logic for auto padding for fqd50k type datasets * make lint remove uwanted files * remove debug lines * remove esc50 refs * fix mock tasks for multilabel * fix mock tasks for multilabel * Revert "Merge branch 'maeb' into maeb" bad direct commit made to upstream maeb branch https://github.com/embeddings-benchmark/mteb/commit/4f23fdf27e3263188fe89031d186b293a279153e 
This reverts commit 4f23fdf27e3263188fe89031d186b293a279153e, reversing changes made to 130247730a07ad1fd2d06a878b7c9bb15af07af7. * fix model imports * fqd50k cleaning * update fsd50k * change dataset * eval subsets correctly * make lint and remove debug statements * clean print statements * make lint * update fsd2019 dataset * remove init in AbsTaskAudioMultilabelClassification.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * add class parameters in AbsTaskAudioMultilabelClassification * inherit from multilingualtask for FSD2019Kaggle * make lint * update mock_tasks; make lint * remove train_split from fn parameters * define fsd2019k to be multilingual * inherit from MultilingualTask in fsd2019K * fix tests * inherit correct multingial task class * remove MockAudioMultilabelClassificationLogRegTask * rm other instances of MockAudioMultilabelClassificationLogRegTask --------- Co-authored-by: rahulschand <rahulsc@stanford.edu> Co-authored-by: Silky Singh <silky1708@gmail.com> Co-authored-by: Silky Singh <54901747+silky1708@users.noreply.github.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add ESC50 and zero-shot classification (#2133) * init audio * some encoder related changes * some more abs task defs Co-authored-by: rahulschand <rahulsc@stanford.edu> * evaluators and classification * remove rahul changes to generate first PR * make lint * init audio * some encoder related changes * some more abs task defs Co-authored-by: rahulschand <rahulsc@stanford.edu> * evaluators and classification * remove rahul changes to generate first PR * make lint * add dataset/tasks skeleton * readd changes lost in rebase * add fsd50k * add task categories for audio * slight updates to fsd50k * make lint * wav2vec2 model * add fsd50k metadata * rename folder * add metric * add torchaudio in req * reigster wav2vec2 models * fixes * add audio in valid task types * mock interface changes * my 0 shot * 
make lint * rm audio clustering * wav2vec2 model revision update * rm comment * rm test.py * add revisions to all wav2vec2 models * rm empty abstask files * rm empty evaluator files * rm empty task files * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update mteb/models/wav2vec2_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * rm non-logReg evaluators for audio classification * lint * fn name changed to convert_audio_from_numpy * rm mock tests for audio kNN classification * rm evaluators for audio kNN classification * fix imports * fix audio kNN; make lint * rm AbsTaskAudioClassification.py for later PR * added zero-shot loading model and dataset checked * remove commented code; reset changes to ClassificationEvaluator.py * fix mock tasks for multilabel classification * make lint * inherit Wrapper class * add all languages supported by wav2vec2 * make lint * add script info to all languages * make lint * before cleaning comments * ESC and clap model. Tested 81 percent zero-shot numbers * fixed label names for ESC50-multilabel and removed comments * recent changes * merge wav2vec2 + add updated logic for auto padding for fqd50k type datasets * make lint remove uwanted files * remove debug lines * remove esc50 refs * changes for debugging * lint changes and maeb main branch merge * fix mock tasks for multilabel * fix mock tasks for multilabel * Revert "Merge branch 'maeb' into maeb" bad direct commit made to upstream maeb branch https://github.com/embeddings-benchmark/mteb/commit/4f23fdf27e3263188fe89031d186b293a279153e This reverts commit 4f23fdf27e3263188fe89031d186b293a279153e, reversing changes made to 130247730a07ad1fd2d06a878b7c9bb15af07af7. 
* fix model imports * fqd50k cleaning * fixed error in Image zero shot classfification * update fsd50k * change dataset * eval subsets correctly * make lint and remove debug statements * clean print statements * make lint * update fsd2019 dataset * remove init in AbsTaskAudioMultilabelClassification.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * add class parameters in AbsTaskAudioMultilabelClassification * inherit from multilingualtask for FSD2019Kaggle * make lint * update mock_tasks; make lint * remove train_split from fn parameters * define fsd2019k to be multilingual * inherit from MultilingualTask in fsd2019K * fix tests * inherit correct multingial task class * remove MockAudioMultilabelClassificationLogRegTask * rm other instances of MockAudioMultilabelClassificationLogRegTask * removed unncessary files * removed unncrssary files * removed uncrssary files part 3 * deleted esc50 from multi label classification * fixed errors * fixed lintng, added precision and recall. 
Removed extra comments * fixed double loading of model * filled in missing meta-data * fixed linting --------- Co-authored-by: Animesh Jha <jha.animesh01@gmail.com> Co-authored-by: rahulschand <rahulsc@stanford.edu> Co-authored-by: Silky Singh <silky1708@gmail.com> Co-authored-by: Silky Singh <54901747+silky1708@users.noreply.github.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add unfused clap model for zero-shot (#2269) * added unfused model * fixed lint * Add new and complete version of FSD50K multi-label audio classification task (#2285) * Added fsd50k dataset on huggingface * added correct hf version of fsd50k dataset * added correct hf version of fsd50k dataset * removed extra imports * removed unecessary load_data fn * added large, music and speech clap models (#2284) * added large, music and speech clap models * fixed public_training_data and removed training_datasets split * added latest revision * lowercase mit license * fixed issue related to training_datasets * fixed lint * add AbsTaskAudioClassification, ESC50 & GunshotTriangulation datasets (#2287) * add AbsTaskAudioClassification, and ESC50, GunshotTriangulation datasets * rm CrossFold abstask for classification * replace class parameter to is_cross_validation * new function eval_subset_cross_validation in AbsTaskAudioClassification * sync maeb branch with mteb:maeb * Add NSynth dataset (#2306) * add AbsTaskAudioClassification, and ESC50, GunshotTriangulation datasets * rm CrossFold abstask for classification * replace class parameter to is_cross_validation * new function eval_subset_cross_validation in AbsTaskAudioClassification * sync maeb branch with mteb:maeb * add NSynth dataset * update TaskMetadata * import correct Nsynth * add domain music * rm print stmt; update metadata; make pr * update precise dataset size * Add urbansound8k for zero-shot (#2292) * added urbansound8k and tested * lint change * changed dataset type 
and label * removed left out comments * fixed lint * Add Emotion classification Ravdess dataset (#2320) * added Ravdess dataset * fixed lint * fixed PR comments, added description, changed name * [MAEB] main merge (#2341) * misc: Add image classification descriptive stats implementation (#2045) * add ImageClassificationDescriptiveStatistics * add MNIST descriptive stats * use tuples instead * add label count and update docstrings * update MNIST example * Update tasks table * fix: Add column descriptions to leaderboard (#2039) * fix: Add column descriptions to leaderboard * removed existing overlap * fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041) * fix: Add BRIGHT Long Fixes #1978 * fix: Add BRIGHT(long) * fix bug in task results * updated bright * updated tests for TaskResults * 1.34.12 Automatically generated by python-semantic-release * misc: Add image clustering descriptive stats implementation (#2057) * add image clustering descirptive stats and run * finish off last one * remove script * fix: Update embed_dim for jina models (#2058) see https://github.com/embeddings-benchmark/results/pull/117 * Update tasks table * 1.34.13 Automatically generated by python-semantic-release * Add giga embeddings (#1741) * add gigaembeddings * use jasper * fix name * create sentence_transformer instruct wrapper * apply instruction template * fix jasper * update meta * misc: Add ZS and multilabel image classification descriptive stats implementation (#2059) * add image clustering descirptive stats and run * finish off last one * remove script * add ImageMultilabelClassificationDescriptiveStatistics * add VOC2007 * add zeroshot and mnist example * Update tasks table * Rename MIEB task classes with duplicated names (#2061) fix class names * misc: Add VisualSTS descriptive stats (#2062) * add visualsts stats * add last dataset * Update tasks table * fix: Added gte models (#1539) * fix: Added gte models * fix: Add mixbai models (#1540) for #1515 * fix: 
Add climate fever v2 (#1873) * Updated ClimateFEVER dataset with new version * Adds Fill in the empty metadata. * Updates the date tuple * Update class name Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update domains Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task_subtypes * Update annotations_creators for the first version * Update date Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task subtypes * Update path * Update description --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> * Update tasks table * fix: Updating paper scripts (#1958) * change reference revisions to align with paper * Update author list * Added code for main results table * updated minor changes * added external as a "no_revision_available" case * revert unintended changes * format * 1.34.14 Automatically generated by python-semantic-release * Add datasets for a benchmark newly introduced for "Engineering" domain (#1911) * adding clustering tasks (built-bench-clustering S2S & P2P) * updated built-bench-clustering tasks * Updated BuiltBenchClustering tasks * Added "Engineering" as new domain to TaskMetadata.py * Updated tasks table in docs * Updated task metadata for BuiltBenchClustering S2S and P2P * updated metadata for clustering tasks * Add/update BuiltBench tasks - Add BuiltBenchRetrieval task - Add BuiltBenchReranking task - Update metadata for BuiltBenchClusterinP2P - Update metadata for BuiltBenchClusterinS2S * update BuiltBench benchmark * Update mteb/benchmarks/benchmarks.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/benchmarks/benchmarks.py Co-authored-by: Isaac 
Chung <chungisaac1217@gmail.com> * Fix formatting via ruff --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * misc: update model names to adjust for adding to results repo (#2074) * update model names to adjust for adding to results repo * update model meta script * misc: Add all image classification descriptive stats (#2073) * add most image classification descr stats * revert changes to encoder * add stats --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Update tasks table * ci: Rerun tests that fail due to networking issues. (#2029) * fix: rerun tests that fail - Networking * update tests to use tmp_path * set versions for dev dependencies * add pytest options to pyproject.toml * add rerun json.decoder.JSONDecodeError * remove JSONDecodeError from pyproject.toml * add huggingface_hub.errors.HfHubHTTPError * add huggingface_hub.errors.LocalEntryNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044 * FileNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029 * add doc to pytest rerun --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * fix: generate metadata (#2063) * fix: generate metadata * use logging not print for script * lint * add iso639 to dev pyproject * fix import * add memory_usage_mb * set version for iso639 Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.15 Automatically generated by python-semantic-release * fix: add missing `e5` training datasets (#2065) add missing training datasets * 1.34.16 Automatically generated by python-semantic-release * fix: Ensure 
voyage model uses different naming scheme (#2083) * fix: Added make command for running leaderboard locally * fix: Ensure voyage models doesn't re-use the name * 1.34.17 Automatically generated by python-semantic-release * fix: Freeze model/rank columns in leaderboard (#2044) * fix: freeze model/rank columns in leaderboard * freezing zero-shot column * update min gradio version to 5.16.0 in pyproject.toml --------- Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> * 1.34.18 Automatically generated by python-semantic-release * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086) Fixes #2064 * 1.34.19 Automatically generated by python-semantic-release * Remove duplicated string in docstring of TaskMetadata class (#2087) * Remove duplicated string in docstring of TaskMetadata class * Remove duplicated dataset field * fix: Smarter leaderboard caching with cachetools (#2085) * Added smarter caching to callbacks * Added cachetools as a dependency * Ran linting * Removed debugging print statement * Bumped Gradio version * Dependency fixes * Dependency fixes --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088) * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) Fixes #2064 * change MultilingualSentiment split from test to validation in CMTEB * 1.34.20 Automatically generated by python-semantic-release * merge gme models (#2089) * fix: Add back task filtering by modalities (#2080) * add back task filtering by modalities * add unit test * check if task modalities is a subset of model modalities and fix tests * add model_modalities_more_than_task_modalities case * 1.34.21 Automatically generated by python-semantic-release * Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092) * Added GTR Models to codebase * Linted gtr models file. 
* Added gtr-base/large/xl/xxl to sentence_transformers_models.py * Added memory_usage_mb and training_datasets * Reformatted training dataset names * Reformatted training dataset names * Reformatted training dataset names --------- Co-authored-by: sufen <sufenf@gmail.com> * misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095) * add Any2TextMutipleChoiceDescriptiveStatistics * run on all tasks * Update tasks table * fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101) Reported with references to paper + qoutes. * fix: Update links (#2098) * Fix link * Fix link * 1.34.22 Automatically generated by python-semantic-release * Add model inf-retriever-v1-1.5b (#2106) Add inf-retriever-v1-1.5b model * docs: Fix typos & refine text (#2102) * Update app.py * Fix typos * misc: Run Zeroshot Classification Descriptive Stats (#2105) * add most datasets * add birdsnap and imgnet1k * add scimmir and sun397 * add uck101 zs * Update tasks table * fix: add warning about task category conversion (#2108) add warning about task category conversion * 1.34.23 Automatically generated by python-semantic-release * fix: Add codesage-large-v2 (#2090) * Add codesage-large-v2 * Address comments * Add training dataset * Fix issues * Format code * Remove unnecessary wrapper * 1.34.24 Automatically generated by python-semantic-release * fix: add training data to BGE-m3-custom-fr (#2110) This ensure that is it correctly filtered as non-zero-shot * 1.34.25 Automatically generated by python-semantic-release * fix: Upgrade ruff to be gradio compatible (#2111) * fix: update ruff to be gradio compatible (>=0.9.3) * format * fix: upgrade ruff to latests (same as gradio compatible) * 1.34.26 Automatically generated by python-semantic-release * docs: Follow google docstring format (#2115) Fixes #2113 * Update leaderboard_refresh.yaml (#2121) * fix InstructSentenceTransformer Model name (#2125) fix params * fix voyage (#2127) * fix: update e5 instruct training data (#2129) 
update e5 training data * 1.34.27 Automatically generated by python-semantic-release * format * Update tasks table * fix: Add 2 new Static Sentence Transformer models (#2112) * Add 2 new Static Sentence Transformer models * Add Tatoeba Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.28 Automatically generated by python-semantic-release * add is_cross_encoder (#1869) * add is_cross_encoder * Update mteb/model_meta.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * change value --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Qodo embed 1 1.5 b (#2137) * feat: Add Qodo-Embed-1-1.5B model metadata * fix: Add Qodo models to overview imports * fix: Add adapted_from field to Qodo model metadata * Update mteb/models/qodo_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * relint --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * misc: merge summary retrieval into bitext mining (#2140) merge summary retrieval into bitext mining * test: fix dataset availability test (#2141) This simplified the test and also make it a lot simpler. It also removed about 100 test cases which where all to the same API call. 
* fix: Update NVIDIA-Embed training data (#2143) Added a few missing annotations for nvidia-embed * 1.34.29 Automatically generated by python-semantic-release * fix: Add annotations for Voyage exp (#2144) * fix: Update NVIDIA-Embed training data Added a few missing annotations for nvidia-embed * fix update annotationf for voyage exp * 1.34.30 Automatically generated by python-semantic-release * Fix tokens num in cde models (#2148) fix tokens * feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146) * feat: Add Qodo-Embed-1-7B model metadata and rename existing model * lint * fix revision * update license name --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> * 1.35.0 Automatically generated by python-semantic-release * misc: add Any2AnyRetrievalDescriptiveStatistics (#2139) add Any2AnyRetrievalDescriptiveStatistics * Update tasks table * Added zero-shot percentages and different filtering scheme (#2153) * Added zero-shot percentages and different filtering scheme * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Incorrect annotations for Mistral-based embedding models (#2157) Fixes #2155 * 1.35.1 Automatically generated by python-semantic-release * Update FaMTEBRetrieval.py (#2171) The URL pointed to the settings page instead of the main repo URL. Now it is fixed. 
* Update tasks table * fix: Add Training data annotations (#2173) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * 1.35.2 Automatically generated by python-semantic-release * feat: Add MIEB and MIEB-lite as benchmarks (#2035) * add mieb and mieb-lite to benchmarks * add CompositionalityEvaluation and DocumentUnderstanding types * add VisionCentric type * add missing comma * split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng * use aggregate task instead so we can name the subsets * shorten names * fix import * alternative strategy to avoid using get_task * follow other aggregate tasks and skip metadata test * run LB without errors when selecting MIEB(-lite) * add back the capability as taks type * typo * extend description * split into mieb(eng) and mieb(multilingual) * remove unneeded files * remove aggtask additions for test * edit descriptions based on screenshots * shorten * rename to Compositionality and include ImageCoDeT2IMultiChoice * re-tag missing VisionCentric tasks * re-tag rparis and roxford as retrieval and include fixes * re-tag voc2007 as image cls * make lint * correct num task types in descriptions * add one model to models_to_annotate * add mieb reference models * update task types * relabel to multilingual retrieval task type to align with paper * fix reference and bibtex * edit task list to match with final list * add back agg task to reproduce table column in paper * fix filtering and import * update tests * mieb lite add back missing tasks * fix metadata test * multi should have all 4 variants * fix task counts * lite has 10 task types * fix visualSTS-17 lang splits * Aggregate task can now use subsets & eval 
langs to filter TaskResults * fix test and mark VisualSTS17 as multilingual * fix tests * add agg task running script * add voyage meta * fix citations * capitalize * add coarse/fine labels --------- Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> * Update tasks table * 1.36.0 Automatically generated by python-semantic-release * fix: update training datasets and revision for jina models (#2179) * feat: update training datasets and revision for jina models * feat: update training datasets and revision for jina models * fix: Add more training data annotations (#2178) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * fix: add training data for Bilingual Embeddings fixes #2167 * 1.36.1 Automatically generated by python-semantic-release * Added training data annotation for e5-base-4k (#2186) * fix: Added training data annotations to MXBAI (#2185) * fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180) This also resolves the missing data in the leaderboard. 
Fixes #2172 * fix: Added training data annotation for MMLW models (#2188) * Added training data annotation for MMLW models * Added GIST annotations Kenneth missed * Added Stella en 400m training data' * 1.36.2 Automatically generated by python-semantic-release * fix: Added training data for sentence-croissant (#2189) * 1.36.3 Automatically generated by python-semantic-release * fix: update ru models annotation (#2181) * 1.36.4 Automatically generated by python-semantic-release * fix: Alphabetical ordering of tasks in dropdowns (#2191) * 1.36.5 Automatically generated by python-semantic-release * misc: Speed up qrel creation in any2anyretrieval (#2196) * use numpy vectorized operations instead of row-by-row * scores are int * use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199) * add base models for e5 (#2183) * add similar datasets (#2205) * add similar datasets * add nano * update is filled * Update mteb/abstasks/TaskMetadata.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add labse annotation (#2182) * add labse annotation * Update mteb/models/sentence_transformers_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Fixed leaderboard crash (#2221) * Fixed leaderboard crash * Fixed language selection error * Ran linting * 1.36.6 Automatically generated by python-semantic-release * fix: More training data annotations (#2220) * Added training data annotation for bge-gemma * Added missing annotations for Voyage models * Added training data for sts-multilingual-mpnet * Added all mteb datasets to STS-multilingual training data * 1.36.7 Automatically generated by python-semantic-release * Add LLM2CLIP (OpenAI variants) (#2222) * model loading and get_text_embeddings * add image_emb, fused_emb, and calc probs methods * add b16 model * add llm2clip_openai_l_14_224 
(not working yet) * got llm2clip_openai_l_14_224 working * make lint * add training sets and allow py files * Change `dataset on HF` test to use official api (#2213) * refactor dataset checking * increase timeout * increase timeout * remove timeout * Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197) * Add Any2AnyMC descriptive stats * Add descriptive stats function for ImageTextPC * add descriptive stats examples * linter * update multi choice descriptive stats * Update tasks table * fix: Add training data annotations to uderver-bloom models (#2210) * fix: Add training data annotations to uderver-bloom models fixes #2193 * fix: add mixedbread --------- Co-authored-by: Márton Kardos <power.up1163@gmail.com> * 1.36.8 Automatically generated by python-semantic-release * Add comment to `voyage-3-m-exp` model (#2229) * remove model size from voyage-3-m-exp model * Update mteb/models/voyage_models.py * Update mteb/models/voyage_models.py * docs: Update description of EURLex (#2231) * Automatically add similar tasks to training_tasks (#2228) * refactor dataset checking * increase timeout * increase timeout * remove timeout * start * automatically find datasets * update comment * fix aggregate task metadata * fixes * lint * rename * update fetch check * Remove overlapping legends from radar chart (#2195) * Remove overlapping legends from radar chart * ensure graph is not blocked * Overlapping legend issue of Radar Chart * misc: Run Any2AnyRetrieval descriptive stats (#2223) * run a few datasets * add a few more * run more tasks * add more datasets * remove pdb * remove newline * add more datasets * Update tasks table * misc: Add rest of the vision centric and compositionality descriptive stats (#2267) add the rest * Update tasks table * Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB * Update memory_usage_mb with correct 
calculated value * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * remove comments * added correct memory usage * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Apply linter fixes with ruff * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add Arabic_Triplet_Matryoshka_V2 to overview.py * Rename model file to ara_models.py and update imports --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Add WebFAQ Retrieval dataset (#2236) * Add WebFAQ Retrieval dataset Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Small change WebFAQRetrieval.py Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add remaining languages to WebFAQ Retrieval task Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add descriptive stats Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.9 Automatically generated by python-semantic-release * fix: Formatting issue in Performance Plot (#2237) * Formatting issue in Performance Plot * make lint * added function for better code readability * 1.36.10 Automatically generated by python-semantic-release * ci: run test_dataset_on_hf separately (#2201) * dont run test_dataset_on_hf in every pr * lint * Update call pytest test_datasets Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * not datasets for test * run dataset loading test 
for push or pull_request * apply feedback --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add gemini-embedding-exp-03-07 (#2279) * add gemini-embedding-exp-03-07 * remove space for lint * lint fix * update link (#2281) * fix: Run remaining MIEB desc stats (#2288) * run Vidore * GLDv2 * run the rest --------- Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> * Update tasks table * 1.36.11 Automatically generated by python-semantic-release * fix: Added Filter Modality (#2262) * Added Filter Modality * resolve suggestions * make lint * make sure test pass * make lint * added exclusive_modality_filter and unit tests * Integrate tests on overview.py * Update tests/test_overview.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * added task related to image modality * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * update overview..py * make lint * update documentation --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * 1.36.12 Automatically generated by python-semantic-release * fix: Add `ModelMeta` license & custom validations (#2293) * license validation * move licenses * update imports --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.13 Automatically generated by python-semantic-release * ci: Add pre-commit hook (#2194) * make dev life nicer - pre-commit hooks * add pre-commit to install * update precommit * update ruff pre-commit * lint * lint --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * Update tasks table * fix: bug in voyage implementation (#2304) * fix: Fix bug in voyage implementation "passage" is not a valid input for the voyage API. Remapped to "document". 
* Update mteb/models/voyage_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.14 Automatically generated by python-semantic-release * fix: Update voyage name to include Org. (#2322) * 1.36.15 Automatically generated by python-semantic-release * Added VDR Model (#2290) * Added VDR Model * change custom wrapper to SentenceTransformer Wrapper * remove kwargs and add TODO for Image Modality * Update mteb/models/vdr_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Resolve conflicting dependencies (#2323) These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * 1.36.16 Automatically generated by python-semantic-release * fix: remove SyntaxWarnings in py312 (#2325) * fix: Resolve conflicting dependencies These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * fix: Remove syntax warnings occuring in python 3.12 ``` Python 3.12.0 (main, Oct 2 2023, 20:56:14) [Clang 16.0.3 ] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import mteb # no syntax warnings >>> ``` * 1.36.17 Automatically generated by python-semantic-release * fix: add annotation models for stella zh (#2277) * fix: add annotation models for stella zh Additionally fixed a few annotation errors * format * Update mteb/models/stella_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.18 Automatically generated by python-semantic-release * fix: Add ModelMeta rubert-mini-frida, BERTA (#2330) * Add rubert-mini-frida model meta * Add BERTA model meta * docs: fix typos * 1.36.19 Automatically generated by python-semantic-release * fix: Add WebFAQ bitext mining tasks (#2326) * Add WebFAQ bitext mining tasks Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Lower number of language pairs in WebFAQBitextMining Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.20 Automatically generated by python-semantic-release * make lint * fix validation for license * fix remaining validation errors --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Mina Parham <36207068+mina-parham@users.noreply.github.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com> Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Shikhar Shiromani <rbk.shikhar@gmail.com> 
Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> Co-authored-by: Ruslan Bel'kov <ruslan.belckov@yandex.ru> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: sufen-f <sufenfong@gmail.com> Co-authored-by: sufen <sufenf@gmail.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> Co-authored-by: Samuel Yang <samuelyang150@gmail.com> Co-authored-by: Aradhye Agarwal <aradhyeagarwal@gmail.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: talshef <tsheffer@gmail.com> Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: garciasces <garciasces@madrid.es> Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> Co-authored-by: Wang Bo <bo.wang@jina.ai> Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in> Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> Co-authored-by: Eng. 
Omar Najar <79968243+omarnj-lab@users.noreply.github.com> Co-authored-by: Michael Dinzinger <39766249+michaeldinzinger@users.noreply.github.com> Co-authored-by: Jinhyuk Lee <lee.jnhk@gmail.com> Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> Co-authored-by: sergeyz-zh <49659999+sergeyz-zh@users.noreply.github.com> * adding GTZAN Genre dataset (#2307) * add GTZANGenre dataset * reupload gtzan dataset * update TaskMetadata from mteb:maeb * make pr * add task subtype * update date for gtzan dataset * update ruff to 0.9.7; make lint * Adding Beijing Opera dataset (#2356) * update TaskMetadata from mteb:maeb * make pr * add task subtype * update ruff to 0.9.7; make lint * add voxlingua107-top10 dataset * add beijing opera dataset * update TaskMetadata from mteb:maeb * make pr * update ruff to 0.9.7; make lint * update TaskMetadata from mteb:maeb * update TaskMetadata * add Mridingham datasets * rm comment * Adding Libricount dataset (#2361) * update taskmetadata * sync mteb:maeb * make lint * Adding Crema-D Dataset for emotion classification [HEAR] (#2368) * update TaskMetadata from mteb:maeb * make pr * add task subtype * update ruff to 0.9.7; make lint * add voxlingua107-top10 dataset * update TaskMetadata * adding CREMA-D dataset * rm deleted files * Adding FSDD dataset (Free Spoken Digit Dataset) (#2371) update Taskmetadata * Add VoxCelebSA, SpokenQAforIC, VehicleSoundClustering from Dynamic-SUPERB (#2379) * add datasets from dynamic-superb * make lint * remove label mapping * apply lint * change eval_split train -> test * fix FSD-50K Task Metadata, Label handling and add stratified subsampling (#2369) fix FSD-50K task * Add music clustering dataset (#2232) * Adds music-genre dataset * Updates revision * Fixes issues * Changes category * Removes a2t from task category * Update the revision * Fixes based on the feedback about converting rate * Remove librosa Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: Isaac Chung 
<chungisaac1217@gmail.com> * [MAEB] merge main -> maeb (#2471) * misc: Add image classification descriptive stats implementation (#2045) * add ImageClassificationDescriptiveStatistics * add MNIST descriptive stats * use tuples instead * add label count and update docstrings * update MNIST example * Update tasks table * fix: Add column descriptions to leaderboard (#2039) * fix: Add column descriptions to leaderboard * removed existing overlap * fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041) * fix: Add BRIGHT Long Fixes #1978 * fix: Add BRIGHT(long) * fix bug in task results * updated bright * updated tests for TaskResults * 1.34.12 Automatically generated by python-semantic-release * misc: Add image clustering descriptive stats implementation (#2057) * add image clustering descirptive stats and run * finish off last one * remove script * fix: Update embed_dim for jina models (#2058) see https://github.com/embeddings-benchmark/results/pull/117 * Update tasks table * 1.34.13 Automatically generated by python-semantic-release * Add giga embeddings (#1741) * add gigaembeddings * use jasper * fix name * create sentence_transformer instruct wrapper * apply instruction template * fix jasper * update meta * misc: Add ZS and multilabel image classification descriptive stats implementation (#2059) * add image clustering descirptive stats and run * finish off last one * remove script * add ImageMultilabelClassificationDescriptiveStatistics * add VOC2007 * add zeroshot and mnist example * Update tasks table * Rename MIEB task classes with duplicated names (#2061) fix class names * misc: Add VisualSTS descriptive stats (#2062) * add visualsts stats * add last dataset * Update tasks table * fix: Added gte models (#1539) * fix: Added gte models * fix: Add mixbai models (#1540) for #1515 * fix: Add climate fever v2 (#1873) * Updated ClimateFEVER dataset with new version * Adds Fill in the empty metadata. 
* Updates the date tuple * Update class name Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update domains Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task_subtypes * Update annotations_creators for the first version * Update date Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task subtypes * Update path * Update description --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> * Update tasks table * fix: Updating paper scripts (#1958) * change reference revisions to align with paper * Update author list * Added code for main results table * updated minor changes * added external as a "no_revision_available" case * revert unintended changes * format * 1.34.14 Automatically generated by python-semantic-release * Add datasets for a benchmark newly introduced for "Engineering" domain (#1911) * adding clustering tasks (built-bench-clustering S2S & P2P) * updated built-bench-clustering tasks * Updated BuiltBenchClustering tasks * Added "Engineering" as new domain to TaskMetadata.py * Updated tasks table in docs * Updated task metadata for BuiltBenchClustering S2S and P2P * updated metadata for clustering tasks * Add/update BuiltBench tasks - Add BuiltBenchRetrieval task - Add BuiltBenchReranking task - Update metadata for BuiltBenchClusterinP2P - Update metadata for BuiltBenchClusterinS2S * update BuiltBench benchmark * Update mteb/benchmarks/benchmarks.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/benchmarks/benchmarks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Fix formatting via ruff --------- Co-authored-by: Roman Solomatin 
<samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * misc: update model names to adjust for adding to results repo (#2074) * update model names to adjust for adding to results repo * update model meta script * misc: Add all image classification descriptive stats (#2073) * add most image classification descr stats * revert changes to encoder * add stats --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Update tasks table * ci: Rerun tests that fail due to networking issues. (#2029) * fix: rerun tests that fail - Networking * update tests to use tmp_path * set versions for dev dependencies * add pytest options to pyproject.toml * add rerun json.decoder.JSONDecodeError * remove JSONDecodeError from pyproject.toml * add huggingface_hub.errors.HfHubHTTPError * add huggingface_hub.errors.LocalEntryNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044 * FileNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029 * add doc to pytest rerun --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * fix: generate metadata (#2063) * fix: generate metadata * use logging not print for script * lint * add iso639 to dev pyproject * fix import * add memory_usage_mb * set version for iso639 Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.15 Automatically generated by python-semantic-release * fix: add missing `e5` training datasets (#2065) add missing training datasets * 1.34.16 Automatically generated by python-semantic-release * fix: Ensure voyage model uses different naming scheme (#2083) * fix: Added make command for running leaderboard 
locally * fix: Ensure voyage models doesn't re-use the name * 1.34.17 Automatically generated by python-semantic-release * fix: Freeze model/rank columns in leaderboard (#2044) * fix: freeze model/rank columns in leaderboard * freezing zero-shot column * update min gradio version to 5.16.0 in pyproject.toml --------- Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> * 1.34.18 Automatically generated by python-semantic-release * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086) Fixes #2064 * 1.34.19 Automatically generated by python-semantic-release * Remove duplicated string in docstring of TaskMetadata class (#2087) * Remove duplicated string in docstring of TaskMetadata class * Remove duplicated dataset field * fix: Smarter leaderboard caching with cachetools (#2085) * Added smarter caching to callbacks * Added cachetools as a dependency * Ran linting * Removed debugging print statement * Bumped Gradio version * Dependency fixes * Dependency fixes --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088) * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) Fixes #2064 * change MultilingualSentiment split from test to validation in CMTEB * 1.34.20 Automatically generated by python-semantic-release * merge gme models (#2089) * fix: Add back task filtering by modalities (#2080) * add back task filtering by modalities * add unit test * check if task modalities is a subset of model modalities and fix tests * add model_modalities_more_than_task_modalities case * 1.34.21 Automatically generated by python-semantic-release * Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092) * Added GTR Models to codebase * Linted gtr models file. 
* Added gtr-base/large/xl/xxl to sentence_transformers_models.py * Added memory_usage_mb and training_datasets * Reformatted training dataset names * Reformatted training dataset names * Reformatted training dataset names --------- Co-authored-by: sufen <sufenf@gmail.com> * misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095) * add Any2TextMutipleChoiceDescriptiveStatistics * run on all tasks * Update tasks table * fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101) Reported with references to paper + qoutes. * fix: Update links (#2098) * Fix link * Fix link * 1.34.22 Automatically generated by python-semantic-release * Add model inf-retriever-v1-1.5b (#2106) Add inf-retriever-v1-1.5b model * docs: Fix typos & refine text (#2102) * Update app.py * Fix typos * misc: Run Zeroshot Classification Descriptive Stats (#2105) * add most datasets * add birdsnap and imgnet1k * add scimmir and sun397 * add uck101 zs * Update tasks table * fix: add warning about task category conversion (#2108) add warning about task category conversion * 1.34.23 Automatically generated by python-semantic-release * fix: Add codesage-large-v2 (#2090) * Add codesage-large-v2 * Address comments * Add training dataset * Fix issues * Format code * Remove unnecessary wrapper * 1.34.24 Automatically generated by python-semantic-release * fix: add training data to BGE-m3-custom-fr (#2110) This ensure that is it correctly filtered as non-zero-shot * 1.34.25 Automatically generated by python-semantic-release * fix: Upgrade ruff to be gradio compatible (#2111) * fix: update ruff to be gradio compatible (>=0.9.3) * format * fix: upgrade ruff to latests (same as gradio compatible) * 1.34.26 Automatically generated by python-semantic-release * docs: Follow google docstring format (#2115) Fixes #2113 * Update leaderboard_refresh.yaml (#2121) * fix InstructSentenceTransformer Model name (#2125) fix params * fix voyage (#2127) * fix: update e5 instruct training data (#2129) 
update e5 training data * 1.34.27 Automatically generated by python-semantic-release * format * Update tasks table * fix: Add 2 new Static Sentence Transformer models (#2112) * Add 2 new Static Sentence Transformer models * Add Tatoeba Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.28 Automatically generated by python-semantic-release * add is_cross_encoder (#1869) * add is_cross_encoder * Update mteb/model_meta.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * change value --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Qodo embed 1 1.5 b (#2137) * feat: Add Qodo-Embed-1-1.5B model metadata * fix: Add Qodo models to overview imports * fix: Add adapted_from field to Qodo model metadata * Update mteb/models/qodo_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * relint --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * misc: merge summary retrieval into bitext mining (#2140) merge summary retrieval into bitext mining * test: fix dataset availability test (#2141) This simplified the test and also make it a lot simpler. It also removed about 100 test cases which where all to the same API call. 
* fix: Update NVIDIA-Embed training data (#2143) Added a few missing annotations for nvidia-embed * 1.34.29 Automatically generated by python-semantic-release * fix: Add annotations for Voyage exp (#2144) * fix: Update NVIDIA-Embed training data Added a few missing annotations for nvidia-embed * fix update annotationf for voyage exp * 1.34.30 Automatically generated by python-semantic-release * Fix tokens num in cde models (#2148) fix tokens * feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146) * feat: Add Qodo-Embed-1-7B model metadata and rename existing model * lint * fix revision * update license name --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> * 1.35.0 Automatically generated by python-semantic-release * misc: add Any2AnyRetrievalDescriptiveStatistics (#2139) add Any2AnyRetrievalDescriptiveStatistics * Update tasks table * Added zero-shot percentages and different filtering scheme (#2153) * Added zero-shot percentages and different filtering scheme * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Incorrect annotations for Mistral-based embedding models (#2157) Fixes #2155 * 1.35.1 Automatically generated by python-semantic-release * Update FaMTEBRetrieval.py (#2171) The URL pointed to the settings page instead of the main repo URL. Now it is fixed. 
* Update tasks table * fix: Add Training data annotations (#2173) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * 1.35.2 Automatically generated by python-semantic-release * feat: Add MIEB and MIEB-lite as benchmarks (#2035) * add mieb and mieb-lite to benchmarks * add CompositionalityEvaluation and DocumentUnderstanding types * add VisionCentric type * add missing comma * split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng * use aggregate task instead so we can name the subsets * shorten names * fix import * alternative strategy to avoid using get_task * follow other aggregate tasks and skip metadata test * run LB without errors when selecting MIEB(-lite) * add back the capability as taks type * typo * extend description * split into mieb(eng) and mieb(multilingual) * remove unneeded files * remove aggtask additions for test * edit descriptions based on screenshots * shorten * rename to Compositionality and include ImageCoDeT2IMultiChoice * re-tag missing VisionCentric tasks * re-tag rparis and roxford as retrieval and include fixes * re-tag voc2007 as image cls * make lint * correct num task types in descriptions * add one model to models_to_annotate * add mieb reference models * update task types * relabel to multilingual retrieval task type to align with paper * fix reference and bibtex * edit task list to match with final list * add back agg task to reproduce table column in paper * fix filtering and import * update tests * mieb lite add back missing tasks * fix metadata test * multi should have all 4 variants * fix task counts * lite has 10 task types * fix visualSTS-17 lang splits * Aggregate task can now use subsets & eval 
langs to filter TaskResults * fix test and mark VisualSTS17 as multilingual * fix tests * add agg task running script * add voyage meta * fix citations * capitalize * add coarse/fine labels --------- Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> * Update tasks table * 1.36.0 Automatically generated by python-semantic-release * fix: update training datasets and revision for jina models (#2179) * feat: update training datasets and revision for jina models * feat: update training datasets and revision for jina models * fix: Add more training data annotations (#2178) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * fix: add training data for Bilingual Embeddings fixes #2167 * 1.36.1 Automatically generated by python-semantic-release * Added training data annotation for e5-base-4k (#2186) * fix: Added training data annotations to MXBAI (#2185) * fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180) This also resolves the missing data in the leaderboard. 
Fixes #2172 * fix: Added training data annotation for MMLW models (#2188) * Added training data annotation for MMLW models * Added GIST annotations Kenneth missed * Added Stella en 400m training data' * 1.36.2 Automatically generated by python-semantic-release * fix: Added training data for sentence-croissant (#2189) * 1.36.3 Automatically generated by python-semantic-release * fix: update ru models annotation (#2181) * 1.36.4 Automatically generated by python-semantic-release * fix: Alphabetical ordering of tasks in dropdowns (#2191) * 1.36.5 Automatically generated by python-semantic-release * misc: Speed up qrel creation in any2anyretrieval (#2196) * use numpy vectorized operations instead of row-by-row * scores are int * use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199) * add base models for e5 (#2183) * add similar datasets (#2205) * add similar datasets * add nano * update is filled * Update mteb/abstasks/TaskMetadata.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add labse annotation (#2182) * add labse annotation * Update mteb/models/sentence_transformers_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Fixed leaderboard crash (#2221) * Fixed leaderboard crash * Fixed language selection error * Ran linting * 1.36.6 Automatically generated by python-semantic-release * fix: More training data annotations (#2220) * Added training data annotation for bge-gemma * Added missing annotations for Voyage models * Added training data for sts-multilingual-mpnet * Added all mteb datasets to STS-multilingual training data * 1.36.7 Automatically generated by python-semantic-release * Add LLM2CLIP (OpenAI variants) (#2222) * model loading and get_text_embeddings * add image_emb, fused_emb, and calc probs methods * add b16 model * add llm2clip_openai_l_14_224 
(not working yet) * got llm2clip_openai_l_14_224 working * make lint * add training sets and allow py files * Change `dataset on HF` test to use official api (#2213) * refactor dataset checking * increase timeout * increase timeout * remove timeout * Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197) * Add Any2AnyMC descriptive stats * Add descriptive stats function for ImageTextPC * add descriptive stats examples * linter * update multi choice descriptive stats * Update tasks table * fix: Add training data annotations to uderver-bloom models (#2210) * fix: Add training data annotations to uderver-bloom models fixes #2193 * fix: add mixedbread --------- Co-authored-by: Márton Kardos <power.up1163@gmail.com> * 1.36.8 Automatically generated by python-semantic-release * Add comment to `voyage-3-m-exp` model (#2229) * remove model size from voyage-3-m-exp model * Update mteb/models/voyage_models.py * Update mteb/models/voyage_models.py * docs: Update description of EURLex (#2231) * Automatically add similar tasks to training_tasks (#2228) * refactor dataset checking * increase timeout * increase timeout * remove timeout * start * automatically find datasets * update comment * fix aggregate task metadata * fixes * lint * rename * update fetch check * Remove overlapping legends from radar chart (#2195) * Remove overlapping legends from radar chart * ensure graph is not blocked * Overlapping legend issue of Radar Chart * misc: Run Any2AnyRetrieval descriptive stats (#2223) * run a few datasets * add a few more * run more tasks * add more datasets * remove pdb * remove newline * add more datasets * Update tasks table * misc: Add rest of the vision centric and compositionality descriptive stats (#2267) add the rest * Update tasks table * Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB * Update memory_usage_mb with correct 
calculated value * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * remove comments * added correct memory usage * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Apply linter fixes with ruff * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add Arabic_Triplet_Matryoshka_V2 to overview.py * Rename model file to ara_models.py and update imports --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Add WebFAQ Retrieval dataset (#2236) * Add WebFAQ Retrieval dataset Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Small change WebFAQRetrieval.py Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add remaining languages to WebFAQ Retrieval task Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add descriptive stats Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.9 Automatically generated by python-semantic-release * fix: Formatting issue in Performance Plot (#2237) * Formatting issue in Performance Plot * make lint * added function for better code readability * 1.36.10 Automatically generated by python-semantic-release * ci: run test_dataset_on_hf separately (#2201) * dont run test_dataset_on_hf in every pr * lint * Update call pytest test_datasets Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.co…


Solves #2241
====================================================
Code Quality
`make lint` to maintain consistent style.
Documentation
Testing
`make test-with-coverage`.
`make test` or `make test-with-coverage` to ensure no existing functionality is broken.
Adding datasets checklist
Reason for dataset addition: ...
`mteb -m {model_name} -t {task_name}` command.
`sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`
`intfloat/multilingual-e5-small`
`self.stratified_subsampling() under dataset_transform()`
`make test`.
`make lint`.
Adding a model checklist
`mteb.get_model(model_name, revision)` and `mteb.get_model_meta(model_name, revision)`
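
The datasets checklist above mentions `self.stratified_subsampling()` for cutting down large datasets. As a rough illustration of the idea only (this is not mteb's implementation, and the function name `stratified_subsample`, its parameters, and the genre labels are all hypothetical), a standard-library sketch could look like this:

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_samples, seed=42):
    """Return sorted indices of a subsample that roughly keeps label proportions.

    Hypothetical helper for illustration; not part of mteb.
    """
    if n_samples >= len(labels):
        # Nothing to cut: keep every example.
        return list(range(len(labels)))

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    fraction = n_samples / len(labels)
    picked = []
    for label in sorted(by_label):
        indices = by_label[label]
        # Keep at least one example per label so no class disappears.
        k = max(1, round(len(indices) * fraction))
        picked.extend(rng.sample(indices, k))
    return sorted(picked)


# Made-up data: 10 genres with 100 clips each.
labels = [f"genre_{i % 10}" for i in range(1000)]
subset = stratified_subsample(labels, n_samples=200)
```

Because each class is sampled separately and rounded, the total can differ slightly from `n_samples` when class sizes are uneven; with the balanced labels above it comes out exact.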