Merged
KennethEnevoldsen added a commit that referenced this pull request on Feb 14, 2025
silky1708 pushed a commit to silky1708/mteb that referenced this pull request on Mar 10, 2025
* fix: Added gte models
* fix: Add mixbai models (embeddings-benchmark#1540)
  for embeddings-benchmark#1515
isaac-chung added a commit that referenced this pull request on Mar 13, 2025
* misc: Add image classification descriptive stats implementation (#2045)
* add ImageClassificationDescriptiveStatistics
* add MNIST descriptive stats
* use tuples instead
* add label count and update docstrings
* update MNIST example
* Update tasks table
* fix: Add column descriptions to leaderboard (#2039)
* fix: Add column descriptions to leaderboard
* removed existing overlap
* fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041)
* fix: Add BRIGHT Long
  Fixes #1978
* fix: Add BRIGHT(long)
* fix bug in task results
* updated bright
* updated tests for TaskResults
* 1.34.12
  Automatically generated by python-semantic-release
* misc: Add image clustering descriptive stats implementation (#2057)
* add image clustering descirptive stats and run
* finish off last one
* remove script
* fix: Update embed_dim for jina models (#2058)
  see embeddings-benchmark/results#117
* Update tasks table
* 1.34.13
  Automatically generated by python-semantic-release
* Add giga embeddings (#1741)
* add gigaembeddings
* use jasper
* fix name
* create sentence_transformer instruct wrapper
* apply instruction template
* fix jasper
* update meta
* misc: Add ZS and multilabel image classification descriptive stats implementation (#2059)
* add image clustering descirptive stats and run
* finish off last one
* remove script
* add ImageMultilabelClassificationDescriptiveStatistics
* add VOC2007
* add zeroshot and mnist example
* Update tasks table
* Rename MIEB task classes with duplicated names (#2061)
  fix class names
* misc: Add VisualSTS descriptive stats (#2062)
* add visualsts stats
* add last dataset
* Update tasks table
* fix: Added gte models (#1539)
* fix: Added gte models
* fix: Add mixbai models (#1540)
  for #1515
* fix: Add climate fever v2 (#1873)
* Updated ClimateFEVER dataset with new version
* Adds Fill in the empty metadata.
* Updates the date tuple
* Update class name
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* Update domains
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* Update task_subtypes
* Update annotations_creators for the first version
* Update date
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* Update task subtypes
* Update path
* Update description
---------
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Mina Parham <minaparham@Keatext.local>
* Update tasks table
* fix: Updating paper scripts (#1958)
* change reference revisions to align with paper
* Update author list
* Added code for main results table
* updated minor changes
* added external as a "no_revision_available" case
* revert unintended changes
* format
* 1.34.14
  Automatically generated by python-semantic-release
* Add datasets for a benchmark newly introduced for "Engineering" domain (#1911)
* adding clustering tasks (built-bench-clustering S2S & P2P)
* updated built-bench-clustering tasks
* Updated BuiltBenchClustering tasks
* Added "Engineering" as new domain to TaskMetadata.py
* Updated tasks table in docs
* Updated task metadata for BuiltBenchClustering S2S and P2P
* updated metadata for clustering tasks
* Add/update BuiltBench tasks
  - Add BuiltBenchRetrieval task
  - Add BuiltBenchReranking task
  - Update metadata for BuiltBenchClusterinP2P
  - Update metadata for BuiltBenchClusterinS2S
* update BuiltBench benchmark
* Update mteb/benchmarks/benchmarks.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* Update mteb/benchmarks/benchmarks.py
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Fix formatting via ruff
---------
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Update tasks table
* misc: update model names to adjust for adding to results repo (#2074)
* update model names to adjust for adding to results repo
* update model meta script
* misc: Add all image classification descriptive stats (#2073)
* add most image classification descr stats
* revert changes to encoder
* add stats
---------
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* Update tasks table
* ci: Rerun tests that fail due to networking issues. (#2029)
* fix: rerun tests that fail - Networking
* update tests to use tmp_path
* set versions for dev dependencies
* add pytest options to pyproject.toml
* add rerun json.decoder.JSONDecodeError
* remove JSONDecodeError from pyproject.toml
* add huggingface_hub.errors.HfHubHTTPError
* add huggingface_hub.errors.LocalEntryNotFoundError
  https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044
* FileNotFoundError
  https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029
* add doc to pytest rerun
---------
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
* fix: generate metadata (#2063)
* fix: generate metadata
* use logging not print for script
* lint
* add iso639 to dev pyproject
* fix import
* add memory_usage_mb
* set version for iso639
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
---------
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* 1.34.15
  Automatically generated by python-semantic-release
* fix: add missing `e5` training datasets (#2065)
  add missing training datasets
* 1.34.16
  Automatically generated by python-semantic-release
* fix: Ensure voyage model uses different naming scheme (#2083)
* fix: Added make command for running leaderboard locally
* fix: Ensure voyage models doesn't re-use the name
* 1.34.17
  Automatically generated by python-semantic-release
* fix: Freeze model/rank columns in leaderboard (#2044)
* fix: freeze model/rank columns in leaderboard
* freezing zero-shot column
* update min gradio version to 5.16.0 in pyproject.toml
---------
Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com>
* 1.34.18
  Automatically generated by python-semantic-release
* fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086)
  Fixes #2064
* 1.34.19
  Automatically generated by python-semantic-release
* Remove duplicated string in docstring of TaskMetadata class (#2087)
* Remove duplicated string in docstring of TaskMetadata class
* Remove duplicated dataset field
* fix: Smarter leaderboard caching with cachetools (#2085)
* Added smarter caching to callbacks
* Added cachetools as a dependency
* Ran linting
* Removed debugging print statement
* Bumped Gradio version
* Dependency fixes
* Dependency fixes
---------
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088)
* fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) )
  Fixes #2064
* change MultilingualSentiment split from test to validation in CMTEB
* 1.34.20
  Automatically generated by python-semantic-release
* merge gme models (#2089)
* fix: Add back task filtering by modalities (#2080)
* add back task filtering by modalities
* add unit test
* check if task modalities is a subset of model modalities and fix tests
* add model_modalities_more_than_task_modalities case
* 1.34.21
  Automatically generated by python-semantic-release
* Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092)
* Added GTR Models to codebase
* Linted gtr models file.
* Added gtr-base/large/xl/xxl to sentence_transformers_models.py
* Added memory_usage_mb and training_datasets
* Reformatted training dataset names
* Reformatted training dataset names
* Reformatted training dataset names
---------
Co-authored-by: sufen <sufenf@gmail.com>
* misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095)
* add Any2TextMutipleChoiceDescriptiveStatistics
* run on all tasks
* Update tasks table
* fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101)
  Reported with references to paper + qoutes.
* fix: Update links (#2098)
* Fix link
* Fix link
* 1.34.22
  Automatically generated by python-semantic-release
* Add model inf-retriever-v1-1.5b (#2106)
  Add inf-retriever-v1-1.5b model
* docs: Fix typos & refine text (#2102)
* Update app.py
* Fix typos
* misc: Run Zeroshot Classification Descriptive Stats (#2105)
* add most datasets
* add birdsnap and imgnet1k
* add scimmir and sun397
* add uck101 zs
* Update tasks table
* fix: add warning about task category conversion (#2108)
  add warning about task category conversion
* 1.34.23
  Automatically generated by python-semantic-release
* fix: Add codesage-large-v2 (#2090)
* Add codesage-large-v2
* Address comments
* Add training dataset
* Fix issues
* Format code
* Remove unnecessary wrapper
* 1.34.24
  Automatically generated by python-semantic-release
* fix: add training data to BGE-m3-custom-fr (#2110)
  This ensure that is it correctly filtered as non-zero-shot
* 1.34.25
  Automatically generated by python-semantic-release
* fix: Upgrade ruff to be gradio compatible (#2111)
* fix: update ruff to be gradio compatible (>=0.9.3)
* format
* fix: upgrade ruff to latests (same as gradio compatible)
* 1.34.26
  Automatically generated by python-semantic-release
* docs: Follow google docstring format (#2115)
  Fixes #2113
* Update leaderboard_refresh.yaml (#2121)
* fix InstructSentenceTransformer Model name (#2125)
  fix params
* fix voyage (#2127)
* fix: update e5 instruct training data (#2129)
  update e5 training data
* 1.34.27
  Automatically generated by python-semantic-release
* format
* Update tasks table
* fix: Add 2 new Static Sentence Transformer models (#2112)
* Add 2 new Static Sentence Transformer models
* Add Tatoeba
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
---------
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* 1.34.28
  Automatically generated by python-semantic-release
* add is_cross_encoder (#1869)
* add is_cross_encoder
* Update mteb/model_meta.py
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* change value
---------
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* Qodo embed 1 1.5 b (#2137)
* feat: Add Qodo-Embed-1-1.5B model metadata
* fix: Add Qodo models to overview imports
* fix: Add adapted_from field to Qodo model metadata
* Update mteb/models/qodo_models.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* relint
---------
Co-authored-by: Tal Sheffer <tal.s@codium.ai>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* misc: merge summary retrieval into bitext mining (#2140)
  merge summary retrieval into bitext mining
* test: fix dataset availability test (#2141)
  This simplified the test and also make it a lot simpler.
  It also removed about 100 test cases which where all to the same API call.
* fix: Update NVIDIA-Embed training data (#2143)
  Added a few missing annotations for nvidia-embed
* 1.34.29
  Automatically generated by python-semantic-release
* fix: Add annotations for Voyage exp (#2144)
* fix: Update NVIDIA-Embed training data
  Added a few missing annotations for nvidia-embed
* fix update annotationf for voyage exp
* 1.34.30
  Automatically generated by python-semantic-release
* Fix tokens num in cde models (#2148)
  fix tokens
* feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146)
* feat: Add Qodo-Embed-1-7B model metadata and rename existing model
* lint
* fix revision
* update license name
---------
Co-authored-by: Tal Sheffer <tal.s@codium.ai>
* 1.35.0
  Automatically generated by python-semantic-release
* misc: add Any2AnyRetrievalDescriptiveStatistics (#2139)
  add Any2AnyRetrievalDescriptiveStatistics
* Update tasks table
* Added zero-shot percentages and different filtering scheme (#2153)
* Added zero-shot percentages and different filtering scheme
* Update mteb/model_meta.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
---------
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: Incorrect annotations for Mistral-based embedding models (#2157)
  Fixes #2155
* 1.35.1
  Automatically generated by python-semantic-release
* Update FaMTEBRetrieval.py (#2171)
  The URL pointed to the settings page instead of the main repo URL. Now it is fixed.
* Update tasks table
* fix: Add Training data annotations (#2173)
* redo to voyage to only training data
* Add training data annotation for Kalm embeddings #2168
* Add correct training data annotations to Stella #2164
* removed fiqa PL as it does not exist
* remove ArxivClusteringS2S.v2 as it does not exist
* Add training data annotation for GIST embedding #2166
* fix max tokens for kalm models #2162
* remove eli 5
* 1.35.2
  Automatically generated by python-semantic-release
* feat: Add MIEB and MIEB-lite as benchmarks (#2035)
* add mieb and mieb-lite to benchmarks
* add CompositionalityEvaluation and DocumentUnderstanding types
* add VisionCentric type
* add missing comma
* split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng
* use aggregate task instead so we can name the subsets
* shorten names
* fix import
* alternative strategy to avoid using get_task
* follow other aggregate tasks and skip metadata test
* run LB without errors when selecting MIEB(-lite)
* add back the capability as taks type
* typo
* extend description
* split into mieb(eng) and mieb(multilingual)
* remove unneeded files
* remove aggtask additions for test
* edit descriptions based on screenshots
* shorten
* rename to Compositionality and include ImageCoDeT2IMultiChoice
* re-tag missing VisionCentric tasks
* re-tag rparis and roxford as retrieval and include fixes
* re-tag voc2007 as image cls
* make lint
* correct num task types in descriptions
* add one model to models_to_annotate
* add mieb reference models
* update task types
* relabel to multilingual retrieval task type to align with paper
* fix reference and bibtex
* edit task list to match with final list
* add back agg task to reproduce table column in paper
* fix filtering and import
* update tests
* mieb lite add back missing tasks
* fix metadata test
* multi should have all 4 variants
* fix task counts
* lite has 10 task types
* fix visualSTS-17 lang splits
* Aggregate task can now use subsets & eval langs to filter TaskResults
* fix test and mark VisualSTS17 as multilingual
* fix tests
* add agg task running script
* add voyage meta
* fix citations
* capitalize
* add coarse/fine labels
---------
Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk>
* Update tasks table
* 1.36.0
  Automatically generated by python-semantic-release
* fix: update training datasets and revision for jina models (#2179)
* feat: update training datasets and revision for jina models
* feat: update training datasets and revision for jina models
* fix: Add more training data annotations (#2178)
* redo to voyage to only training data
* Add training data annotation for Kalm embeddings #2168
* Add correct training data annotations to Stella #2164
* removed fiqa PL as it does not exist
* remove ArxivClusteringS2S.v2 as it does not exist
* Add training data annotation for GIST embedding #2166
* fix max tokens for kalm models #2162
* remove eli 5
* fix: add training data for Bilingual Embeddings
  fixes #2167
* 1.36.1
  Automatically generated by python-semantic-release
* Added training data annotation for e5-base-4k (#2186)
* fix: Added training data annotations to MXBAI (#2185)
* fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180)
  This also resolves the missing data in the leaderboard.
  Fixes #2172
* fix: Added training data annotation for MMLW models (#2188)
* Added training data annotation for MMLW models
* Added GIST annotations Kenneth missed
* Added Stella en 400m training data'
* 1.36.2
  Automatically generated by python-semantic-release
* fix: Added training data for sentence-croissant (#2189)
* 1.36.3
  Automatically generated by python-semantic-release
* fix: update ru models annotation (#2181)
* 1.36.4
  Automatically generated by python-semantic-release
* fix: Alphabetical ordering of tasks in dropdowns (#2191)
* 1.36.5
  Automatically generated by python-semantic-release
* misc: Speed up qrel creation in any2anyretrieval (#2196)
* use numpy vectorized operations instead of row-by-row
* scores are int
* use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199)
* add base models for e5 (#2183)
* add similar datasets (#2205)
* add similar datasets
* add nano
* update is filled
* Update mteb/abstasks/TaskMetadata.py
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
---------
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* add labse annotation (#2182)
* add labse annotation
* Update mteb/models/sentence_transformers_models.py
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
---------
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* fix: Fixed leaderboard crash (#2221)
* Fixed leaderboard crash
* Fixed language selection error
* Ran linting
* 1.36.6
  Automatically generated by python-semantic-release
* fix: More training data annotations (#2220)
* Added training data annotation for bge-gemma
* Added missing annotations for Voyage models
* Added training data for sts-multilingual-mpnet
* Added all mteb datasets to STS-multilingual training data
* 1.36.7
  Automatically generated by python-semantic-release
* Add LLM2CLIP (OpenAI variants) (#2222)
* model loading and get_text_embeddings
* add image_emb, fused_emb, and calc probs methods
* add b16 model
* add llm2clip_openai_l_14_224 (not working yet)
* got llm2clip_openai_l_14_224 working
* make lint
* add training sets and allow py files
* Change `dataset on HF` test to use official api (#2213)
* refactor dataset checking
* increase timeout
* increase timeout
* remove timeout
* Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197)
* Add Any2AnyMC descriptive stats
* Add descriptive stats function for ImageTextPC
* add descriptive stats examples
* linter
* update multi choice descriptive stats
* Update tasks table
* fix: Add training data annotations to uderver-bloom models (#2210)
* fix: Add training data annotations to uderver-bloom models
  fixes #2193
* fix: add mixedbread
---------
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
* 1.36.8
  Automatically generated by python-semantic-release
* Add comment to `voyage-3-m-exp` model (#2229)
* remove model size from voyage-3-m-exp model
* Update mteb/models/voyage_models.py
* Update mteb/models/voyage_models.py
* docs: Update description of EURLex (#2231)
* Automatically add similar tasks to training_tasks (#2228)
* refactor dataset checking
* increase timeout
* increase timeout
* remove timeout
* start
* automatically find datasets
* update comment
* fix aggregate task metadata
* fixes
* lint
* rename
* update fetch check
* Remove overlapping legends from radar chart (#2195)
* Remove overlapping legends from radar chart
* ensure graph is not blocked
* Overlapping legend issue of Radar Chart
* misc: Run Any2AnyRetrieval descriptive stats (#2223)
* run a few datasets
* add a few more
* run more tasks
* add more datasets
* remove pdb
* remove newline
* add more datasets
* Update tasks table
* misc: Add rest of the vision centric and compositionality descriptive stats (#2267)
  add the rest
* Update tasks table
* Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271)
* Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270)
* Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB
* Update memory_usage_mb with correct calculated value
* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* remove comments
* added correct memory usage
* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* Apply linter fixes with ruff
* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* Update mteb/models/Arabic_Triplet_Matryoshka_V2.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* Add Arabic_Triplet_Matryoshka_V2 to overview.py
* Rename model file to ara_models.py and update imports
---------
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: Add WebFAQ Retrieval dataset (#2236)
* Add WebFAQ Retrieval dataset
  Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
* Small change WebFAQRetrieval.py
  Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
* Add remaining languages to WebFAQ Retrieval task
  Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
* Add descriptive stats
  Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
---------
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
* Update tasks table
* 1.36.9
  Automatically generated by python-semantic-release
* fix: Formatting issue in Performance Plot (#2237)
* Formatting issue in Performance Plot
* make lint
* added function for better code readability
* 1.36.10
  Automatically generated by python-semantic-release
* ci: run test_dataset_on_hf separately (#2201)
* dont run test_dataset_on_hf in every pr
* lint
* Update call pytest test_datasets
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* Update tests/test_tasks/test_all_abstasks.py
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* not datasets for test
* run dataset loading test for push or pull_request
* apply feedback
---------
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* add gemini-embedding-exp-03-07 (#2279)
* add gemini-embedding-exp-03-07
* remove space for lint
* lint fix
* update link (#2281)
* fix: Run remaining MIEB desc stats (#2288)
* run Vidore
* GLDv2
* run the rest
---------
Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan>
* Update tasks table
* 1.36.11
  Automatically generated by python-semantic-release
* fix: Added Filter Modality (#2262)
* Added Filter Modality
* resolve suggestions
* make lint
* make sure test pass
* make lint
* added exclusive_modality_filter and unit tests
* Integrate tests on overview.py
* Update tests/test_overview.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* added task related to image modality
* Update mteb/abstasks/AbsTask.py
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* Update mteb/abstasks/AbsTask.py
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* update overview..py
* make lint
* update documentation
---------
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* 1.36.12
  Automatically generated by python-semantic-release
* fix: Add `ModelMeta` license & custom validations (#2293)
* license validation
* move licenses
* update imports
---------
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* 1.36.13
  Automatically generated by python-semantic-release
* ci: Add pre-commit hook (#2194)
* make dev life nicer - pre-commit hooks
* add pre-commit to install
* update precommit
* update ruff pre-commit
* lint
* lint
---------
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
* Update tasks table
* fix: bug in voyage implementation (#2304)
* fix: Fix bug in voyage implementation
  "passage" is not a valid input for the voyage API. Remapped to "document".
* Update mteb/models/voyage_models.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
---------
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* 1.36.14
  Automatically generated by python-semantic-release
* fix: Update voyage name to include Org. (#2322)
* 1.36.15
  Automatically generated by python-semantic-release
* Added VDR Model (#2290)
* Added VDR Model
* change custom wrapper to SentenceTransformer Wrapper
* remove kwargs and add TODO for Image Modality
* Update mteb/models/vdr_models.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
---------
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: Resolve conflicting dependencies (#2323)
  These errors where discovered when trying to install the package using `uv`.
  We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies.
* 1.36.16
  Automatically generated by python-semantic-release
* fix: remove SyntaxWarnings in py312 (#2325)
* fix: Resolve conflicting dependencies
  These errors where discovered when trying to install the package using `uv`.
  We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies.
* fix: Remove syntax warnings occuring in python 3.12

```
Python 3.12.0 (main, Oct 2 2023, 20:56:14) [Clang 16.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mteb # no syntax warnings
>>>
```

* 1.36.17
  Automatically generated by python-semantic-release
* fix: add annotation models for stella zh (#2277)
* fix: add annotation models for stella zh
  Additionally fixed a few annotation errors
* format
* Update mteb/models/stella_models.py
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
---------
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* 1.36.18
  Automatically generated by python-semantic-release
* fix: Add ModelMeta rubert-mini-frida, BERTA (#2330)
* Add rubert-mini-frida model meta
* Add BERTA model meta
* docs: fix typos
* 1.36.19
  Automatically generated by python-semantic-release
* fix: Add WebFAQ bitext mining tasks (#2326)
* Add WebFAQ bitext mining tasks
  Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
* Lower number of language pairs in WebFAQBitextMining
  Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
---------
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
* Update tasks table
* 1.36.20
  Automatically generated by python-semantic-release
* make lint
* fix validation for license
* fix remaining validation errors
---------
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Mina Parham <36207068+mina-parham@users.noreply.github.com>
Co-authored-by: Mina Parham <minaparham@Keatext.local>
Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com>
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Shikhar Shiromani <rbk.shikhar@gmail.com>
Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com>
Co-authored-by: Ruslan Bel'kov <ruslan.belckov@yandex.ru>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
Co-authored-by: sufen-f <sufenfong@gmail.com>
Co-authored-by: sufen <sufenf@gmail.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Samuel Yang <samuelyang150@gmail.com>
Co-authored-by: Aradhye Agarwal <aradhyeagarwal@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: talshef <tsheffer@gmail.com>
Co-authored-by: Tal Sheffer <tal.s@codium.ai>
Co-authored-by: garciasces <garciasces@madrid.es>
Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk>
Co-authored-by: Wang Bo <bo.wang@jina.ai>
Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in>
Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Eng. Omar Najar <79968243+omarnj-lab@users.noreply.github.com>
Co-authored-by: Michael Dinzinger <39766249+michaeldinzinger@users.noreply.github.com>
Co-authored-by: Jinhyuk Lee <lee.jnhk@gmail.com>
Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan>
Co-authored-by: sergeyz-zh <49659999+sergeyz-zh@users.noreply.github.com>
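
For readers skimming the log above: the entry for #2080 mentions checking whether a task's modalities are a subset of the model's modalities, including the case where the model offers more modalities than the task needs. The snippet below is only a rough sketch of that idea in plain Python. The function names and the `(name, modalities)` task tuples are hypothetical and are not taken from the mteb codebase.

```python
def task_is_supported(task_modalities, model_modalities):
    """Return True when every modality the task needs is offered by the model."""
    return set(task_modalities) <= set(model_modalities)


def filter_tasks_by_modalities(tasks, model_modalities):
    """Keep the names of the (name, modalities) pairs the model can handle."""
    return [
        name
        for name, modalities in tasks
        if task_is_supported(modalities, model_modalities)
    ]


# Hypothetical task list: (task name, required modalities).
tasks = [
    ("text_retrieval", ["text"]),
    ("image_classification", ["image"]),
    ("image_text_retrieval", ["image", "text"]),
]

# A text-only model keeps only the text task.
print(filter_tasks_by_modalities(tasks, ["text"]))
# A model with more modalities than a task needs still supports that task.
print(filter_tasks_by_modalities(tasks, ["text", "image"]))
```

A subset test (rather than equality) is what lets a multimodal model run text-only tasks, which appears to be the `model_modalities_more_than_task_modalities` case named in the log.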
isaac-chung added a commit that referenced this pull request on Apr 1, 2025
* Added gtr-base/large/xl/xxl to sentence_transformers_models.py * Added memory_usage_mb and training_datasets * Reformatted training dataset names * Reformatted training dataset names * Reformatted training dataset names --------- Co-authored-by: sufen <sufenf@gmail.com> * misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095) * add Any2TextMutipleChoiceDescriptiveStatistics * run on all tasks * Update tasks table * fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101) Reported with references to paper + qoutes. * fix: Update links (#2098) * Fix link * Fix link * 1.34.22 Automatically generated by python-semantic-release * Add model inf-retriever-v1-1.5b (#2106) Add inf-retriever-v1-1.5b model * docs: Fix typos & refine text (#2102) * Update app.py * Fix typos * misc: Run Zeroshot Classification Descriptive Stats (#2105) * add most datasets * add birdsnap and imgnet1k * add scimmir and sun397 * add uck101 zs * Update tasks table * fix: add warning about task category conversion (#2108) add warning about task category conversion * 1.34.23 Automatically generated by python-semantic-release * fix: Add codesage-large-v2 (#2090) * Add codesage-large-v2 * Address comments * Add training dataset * Fix issues * Format code * Remove unnecessary wrapper * 1.34.24 Automatically generated by python-semantic-release * fix: add training data to BGE-m3-custom-fr (#2110) This ensure that is it correctly filtered as non-zero-shot * 1.34.25 Automatically generated by python-semantic-release * fix: Upgrade ruff to be gradio compatible (#2111) * fix: update ruff to be gradio compatible (>=0.9.3) * format * fix: upgrade ruff to latests (same as gradio compatible) * 1.34.26 Automatically generated by python-semantic-release * docs: Follow google docstring format (#2115) Fixes #2113 * Update leaderboard_refresh.yaml (#2121) * fix InstructSentenceTransformer Model name (#2125) fix params * fix voyage (#2127) * fix: update e5 instruct training data (#2129) 
update e5 training data * 1.34.27 Automatically generated by python-semantic-release * format * Update tasks table * fix: Add 2 new Static Sentence Transformer models (#2112) * Add 2 new Static Sentence Transformer models * Add Tatoeba Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.28 Automatically generated by python-semantic-release * add is_cross_encoder (#1869) * add is_cross_encoder * Update mteb/model_meta.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * change value --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Qodo embed 1 1.5 b (#2137) * feat: Add Qodo-Embed-1-1.5B model metadata * fix: Add Qodo models to overview imports * fix: Add adapted_from field to Qodo model metadata * Update mteb/models/qodo_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * relint --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * misc: merge summary retrieval into bitext mining (#2140) merge summary retrieval into bitext mining * test: fix dataset availability test (#2141) This simplified the test and also made it a lot simpler. It also removed about 100 test cases which were all to the same API call. 
* fix: Update NVIDIA-Embed training data (#2143) Added a few missing annotations for nvidia-embed * 1.34.29 Automatically generated by python-semantic-release * fix: Add annotations for Voyage exp (#2144) * fix: Update NVIDIA-Embed training data Added a few missing annotations for nvidia-embed * fix update annotations for voyage exp * 1.34.30 Automatically generated by python-semantic-release * Fix tokens num in cde models (#2148) fix tokens * feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146) * feat: Add Qodo-Embed-1-7B model metadata and rename existing model * lint * fix revision * update license name --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> * 1.35.0 Automatically generated by python-semantic-release * misc: add Any2AnyRetrievalDescriptiveStatistics (#2139) add Any2AnyRetrievalDescriptiveStatistics * Update tasks table * Added zero-shot percentages and different filtering scheme (#2153) * Added zero-shot percentages and different filtering scheme * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Incorrect annotations for Mistral-based embedding models (#2157) Fixes #2155 * 1.35.1 Automatically generated by python-semantic-release * Update FaMTEBRetrieval.py (#2171) The URL pointed to the settings page instead of the main repo URL. Now it is fixed. 
* Update tasks table * fix: Add Training data annotations (#2173) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * 1.35.2 Automatically generated by python-semantic-release * feat: Add MIEB and MIEB-lite as benchmarks (#2035) * add mieb and mieb-lite to benchmarks * add CompositionalityEvaluation and DocumentUnderstanding types * add VisionCentric type * add missing comma * split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng * use aggregate task instead so we can name the subsets * shorten names * fix import * alternative strategy to avoid using get_task * follow other aggregate tasks and skip metadata test * run LB without errors when selecting MIEB(-lite) * add back the capability as taks type * typo * extend description * split into mieb(eng) and mieb(multilingual) * remove unneeded files * remove aggtask additions for test * edit descriptions based on screenshots * shorten * rename to Compositionality and include ImageCoDeT2IMultiChoice * re-tag missing VisionCentric tasks * re-tag rparis and roxford as retrieval and include fixes * re-tag voc2007 as image cls * make lint * correct num task types in descriptions * add one model to models_to_annotate * add mieb reference models * update task types * relabel to multilingual retrieval task type to align with paper * fix reference and bibtex * edit task list to match with final list * add back agg task to reproduce table column in paper * fix filtering and import * update tests * mieb lite add back missing tasks * fix metadata test * multi should have all 4 variants * fix task counts * lite has 10 task types * fix visualSTS-17 lang splits * Aggregate task can now use subsets & eval 
langs to filter TaskResults * fix test and mark VisualSTS17 as multilingual * fix tests * add agg task running script * add voyage meta * fix citations * capitalize * add coarse/fine labels --------- Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> * Update tasks table * 1.36.0 Automatically generated by python-semantic-release * fix: update training datasets and revision for jina models (#2179) * feat: update training datasets and revision for jina models * feat: update training datasets and revision for jina models * fix: Add more training data annotations (#2178) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * fix: add training data for Bilingual Embeddings fixes #2167 * 1.36.1 Automatically generated by python-semantic-release * Added training data annotation for e5-base-4k (#2186) * fix: Added training data annotations to MXBAI (#2185) * fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180) This also resolves the missing data in the leaderboard. 
Fixes #2172 * fix: Added training data annotation for MMLW models (#2188) * Added training data annotation for MMLW models * Added GIST annotations Kenneth missed * Added Stella en 400m training data' * 1.36.2 Automatically generated by python-semantic-release * fix: Added training data for sentence-croissant (#2189) * 1.36.3 Automatically generated by python-semantic-release * fix: update ru models annotation (#2181) * 1.36.4 Automatically generated by python-semantic-release * fix: Alphabetical ordering of tasks in dropdowns (#2191) * 1.36.5 Automatically generated by python-semantic-release * misc: Speed up qrel creation in any2anyretrieval (#2196) * use numpy vectorized operations instead of row-by-row * scores are int * use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199) * add base models for e5 (#2183) * add similar datasets (#2205) * add similar datasets * add nano * update is filled * Update mteb/abstasks/TaskMetadata.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add labse annotation (#2182) * add labse annotation * Update mteb/models/sentence_transformers_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Fixed leaderboard crash (#2221) * Fixed leaderboard crash * Fixed language selection error * Ran linting * 1.36.6 Automatically generated by python-semantic-release * fix: More training data annotations (#2220) * Added training data annotation for bge-gemma * Added missing annotations for Voyage models * Added training data for sts-multilingual-mpnet * Added all mteb datasets to STS-multilingual training data * 1.36.7 Automatically generated by python-semantic-release * Add LLM2CLIP (OpenAI variants) (#2222) * model loading and get_text_embeddings * add image_emb, fused_emb, and calc probs methods * add b16 model * add llm2clip_openai_l_14_224 
(not working yet) * got llm2clip_openai_l_14_224 working * make lint * add training sets and allow py files * Change `dataset on HF` test to use official api (#2213) * refactor dataset checking * increase timeout * increase timeout * remove timeout * Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197) * Add Any2AnyMC descriptive stats * Add descriptive stats function for ImageTextPC * add descriptive stats examples * linter * update multi choice descriptive stats * Update tasks table * fix: Add training data annotations to uderver-bloom models (#2210) * fix: Add training data annotations to uderver-bloom models fixes #2193 * fix: add mixedbread --------- Co-authored-by: Márton Kardos <power.up1163@gmail.com> * 1.36.8 Automatically generated by python-semantic-release * Add comment to `voyage-3-m-exp` model (#2229) * remove model size from voyage-3-m-exp model * Update mteb/models/voyage_models.py * Update mteb/models/voyage_models.py * docs: Update description of EURLex (#2231) * Automatically add similar tasks to training_tasks (#2228) * refactor dataset checking * increase timeout * increase timeout * remove timeout * start * automatically find datasets * update comment * fix aggregate task metadata * fixes * lint * rename * update fetch check * Remove overlapping legends from radar chart (#2195) * Remove overlapping legends from radar chart * ensure graph is not blocked * Overlapping legend issue of Radar Chart * misc: Run Any2AnyRetrieval descriptive stats (#2223) * run a few datasets * add a few more * run more tasks * add more datasets * remove pdb * remove newline * add more datasets * Update tasks table * misc: Add rest of the vision centric and compositionality descriptive stats (#2267) add the rest * Update tasks table * Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB * Update memory_usage_mb with correct 
calculated value * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * remove comments * added correct memory usage * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Apply linter fixes with ruff * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add Arabic_Triplet_Matryoshka_V2 to overview.py * Rename model file to ara_models.py and update imports --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Add WebFAQ Retrieval dataset (#2236) * Add WebFAQ Retrieval dataset Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Small change WebFAQRetrieval.py Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add remaining languages to WebFAQ Retrieval task Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add descriptive stats Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.9 Automatically generated by python-semantic-release * fix: Formatting issue in Performance Plot (#2237) * Formatting issue in Performance Plot * make lint * added function for better code readability * 1.36.10 Automatically generated by python-semantic-release * ci: run test_dataset_on_hf separately (#2201) * dont run test_dataset_on_hf in every pr * lint * Update call pytest test_datasets Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * not datasets for test * run dataset loading test 
for push or pull_request * apply feedback --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add gemini-embedding-exp-03-07 (#2279) * add gemini-embedding-exp-03-07 * remove space for lint * lint fix * update link (#2281) * fix: Run remaining MIEB desc stats (#2288) * run Vidore * GLDv2 * run the rest --------- Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> * Update tasks table * 1.36.11 Automatically generated by python-semantic-release * fix: Added Filter Modality (#2262) * Added Filter Modality * resolve suggestions * make lint * make sure test pass * make lint * added exclusive_modality_filter and unit tests * Integrate tests on overview.py * Update tests/test_overview.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * added task related to image modality * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * update overview..py * make lint * update documentation --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * 1.36.12 Automatically generated by python-semantic-release * fix: Add `ModelMeta` license & custom validations (#2293) * license validation * move licenses * update imports --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.13 Automatically generated by python-semantic-release * ci: Add pre-commit hook (#2194) * make dev life nicer - pre-commit hooks * add pre-commit to install * update precommit * update ruff pre-commit * lint * lint --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * Update tasks table * fix: bug in voyage implementation (#2304) * fix: Fix bug in voyage implementation "passage" is not a valid input for the voyage API. Remapped to "document". 
* Update mteb/models/voyage_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.14 Automatically generated by python-semantic-release * fix: Update voyage name to include Org. (#2322) * 1.36.15 Automatically generated by python-semantic-release * Added VDR Model (#2290) * Added VDR Model * change custom wrapper to SentenceTransformer Wrapper * remove kwargs and add TODO for Image Modality * Update mteb/models/vdr_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Resolve conflicting dependencies (#2323) These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * 1.36.16 Automatically generated by python-semantic-release * fix: remove SyntaxWarnings in py312 (#2325) * fix: Resolve conflicting dependencies These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * fix: Remove syntax warnings occuring in python 3.12 ``` Python 3.12.0 (main, Oct 2 2023, 20:56:14) [Clang 16.0.3 ] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import mteb # no syntax warnings >>> ``` * 1.36.17 Automatically generated by python-semantic-release * fix: add annotation models for stella zh (#2277) * fix: add annotation models for stella zh Additionally fixed a few annotation errors * format * Update mteb/models/stella_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.18 Automatically generated by python-semantic-release * fix: Add ModelMeta rubert-mini-frida, BERTA (#2330) * Add rubert-mini-frida model meta * Add BERTA model meta * docs: fix typos * 1.36.19 Automatically generated by python-semantic-release * fix: Add WebFAQ bitext mining tasks (#2326) * Add WebFAQ bitext mining tasks Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Lower number of language pairs in WebFAQBitextMining Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.20 Automatically generated by python-semantic-release * fix: Add `trust_remote_code` to MIRACLRetrieval * fix: Add `trust_remote_code` to MIRACLRetrieval (#2344) * 1.36.21 Automatically generated by python-semantic-release * fix: Correctly pass trust remote code to Miracl * fix: Ensure MIRACL pass trust_remote_code (#2346) * fix: Add `trust_remote_code` to MIRACLRetrieval * fix: Correctly pass trust remote code to Miracl * fix * 1.36.22 Automatically generated by python-semantic-release * add-Data Korean Clustering dataset (KLUE-modified) (#2283) * add PatentFnBClustering.py * do make lint and revise * rollback Makefile * Update mteb/tasks/Clustering/kor/PatentFnBClustering.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * klue_mrc_domain * make lint * klue_modified_clustering_dataset --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Rename dunzhang and Jasper models to NovaResearch (#2373) * Rename dunzhang and Jasper 
models to NovaResearch * rename model in tests * correct reference link * correct MIEB dataset stats (#2374) * correct stats * update Any2AnyMultiChoice qrels stats compute logic * final correction * Update tasks table * Correct -1 to No information in Zero shot (#2381) * fix leaderboard (#2385) * fix: Reduce logging and Warnings (#2349) * Reduce logging and Warnings * make lint * format license to lowercase * Address all comments * Update mteb/leaderboard/app.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.23 Automatically generated by python-semantic-release * fix: b1ade (#2386) * fix: added b1ade_models.py (#2340) * added b1ade_models.py * changing based on requested * Update mteb/models/b1ade_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: missing import and formatting --------- Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com> * 1.36.24 Automatically generated by python-semantic-release * fix: pin gradio dependency to ensure leaderboards works (#2387) * 1.36.25 Automatically generated by python-semantic-release * fix: Ensure BrightRetrieval is valid to run (#2334) * fix: Ensure BrightRetrieval is valid to run Not sure this is the best way to fix this. Let me know if you can find a better fix. 
fixes #2327 * fix: convert brightretrieval to two tasks * fix collecting error * Update tasks table * 1.36.26 Automatically generated by python-semantic-release * Pass task name to all evaluators (#2389) * pass task name to all tasks * add test * fix loader * fix: renaming Zeroshot -> ZeroShot (#2395) * fix: renaming Zeroshot -> ZeroShot Adresses #2078 * rename 1 * rename 2 * format * fixed error * 1.36.27 Automatically generated by python-semantic-release * fix: Update AmazonPolarityClassification license (#2402) Update AmazonPolarityClassification.py * fix b1ade name (#2403) * 1.36.28 Automatically generated by python-semantic-release * Minor style changes (#2396) * fix: renaming Zeroshot -> ZeroShot Adresses #2078 * fix: minor style changes Adresses #2078 * rename 1 * rename 2 * format * fixed error --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Added new dataset and tasks - ClusTREC-covid , clustering of thematic covid related scientific papers (#2302) * Clustrec covid new dataset and task * fix * fix * fix * fix * fix * descriptive stats * change all mentions of clustrec-covidp2p to clustrec-covid * change ' to " * Update tasks table * fix: Major updates to docs + make mieb dep optional (#2397) * fix: renaming Zeroshot -> ZeroShot Adresses #2078 * fix: minor style changes Adresses #2078 * fix: Major updates to documentation This PR does the following: - This introduced other modalities more clearly in the documentation as well as make it easier to transition to a full on documentation site later. - added minor code updates due to discovered inconsistencies in docs and code. 
- Added the MMTEB citation where applicable - makes the docs ready to move torchvision to an optional dependency * Moved VISTA example * rename 1 * rename 2 * format * fixed error * fix: make torchvision optional (#2399) * fix: make torchvision optional * format * add docs * minor fix * remove transform from Any2TextMultipleChoiceEvaluator --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * move Running SentenceTransformer model with prompts to usage --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.29 Automatically generated by python-semantic-release * remove Arabic_Triplet_Matryoshka_V2.py (#2405) * Min torchvision>0.2.1 (#2410) matching torch>1.0.0 * fix: Add validation to model_name in `ModelMeta` (#2404) * add test for name validation * upd docs * upd cohere name * fix tests * fix name for average_word_embeddings_komninos * fix name for average_word_embeddings_komninos * fix reranker test * fix reranker test * 1.36.30 Automatically generated by python-semantic-release * [MIEB] "capability measured"-Abstask 1-1 matching refactor [1/3]: reimplement CV-Bench (#2414) * refactor CV-Bench * reimplement CV Bench * remove abstask/evaluator/tests for Any2TextMultipleChoice * rerun descriptive stats * Update tasks table * fix: Add option to remove benchmark from leaderboard (#2417) fix: Add option to remove leaderboard from leaderboard fixes #2413 This only removed the benchmark from the leaderboard but keep it in MTEB. 
* 1.36.31 Automatically generated by python-semantic-release * fix: Add VDR Multilingual Dataset (#2408) * Added VDR Multilingual Dataset * address comments * make lint * Formated Dataset for retrieval * Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * make lint * corrected date * fix dataset building * move to image folder --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * 1.36.32 Automatically generated by python-semantic-release * HOTFIX: pin setuptools (#2423) * pin setuptools * pin setuptools * pin setuptools in makefile * try ci * fix ci * remove speed from installs * add __init__.py Clustering > kor folder, And edit __init__.py in Clustering folder (#2422) * add PatentFnBClustering.py * do make lint and revise * rollback Makefile * Update mteb/tasks/Clustering/kor/PatentFnBClustering.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * klue_mrc_domain * make lint * klue_modified_clustering_dataset * clustering & kor folder add __init.py * clustering & kor folder add __init__.py * task.py roll-back * correct text_creation to sample_creation & delete form in MetaData * correct task_subtype in TaskMetaData * delete space * edit metadata * edit task_subtypes --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update tasks table * Update speed dependencies with new setuptools release (#2429) * add richinfoai models (#2427) * add richinfoai models add richinfoai models * format codes by linter format codes by linter * Added Memory Usage column on leaderboard (#2428) * docs: typos; Standardize spacing; Chronological order (#2436) * Fix typos; add chrono order * Fix spacing * fix: Add model specific dependencies in pyproject.toml (#2424) * Add 
model specific dependencies in pyproject.toml * Update documentation * 1.36.33 Automatically generated by python-semantic-release * [MIEB] "capability measured"-Abstask 1-1 matching refactor [2/3]: reimplement r-Oxford and r-Paris (#2442) * MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats * modify benchmark list * fix citation * Update tasks table * Error while evaluating MIRACLRetrievalHardNegatives: 'trust_remote_code' (#2445) Fixes #2444 * Feat/searchmap preview (#2420) * Added meta information about SearchMap_Preview model to the model_dir * Added meta information about SearchMap_Preview model to the model_dir * updated revision name * Device loading and cuda cache cleaning step left out * removed task instructions since it's not necessary * changed sentence transformer loader to mteb default loader and passed instructions s model prompts * Included searchmap to the models overview page * Included searchmap to the models overview page * added meta data information about where model was adpated from * Update mteb/models/searchmap_models.py * fix lint * lint --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Add Background Gradients in Summary and Task Table (#2392) * Add Background Gradients in Summary and Task Table * Remove warnings and add light green cmap * Address comments * Separate styling function * address comments * added comments * add ops_moa_models (#2439) * add ops_moa_models * add custom implementations * Simplify custom implementation and format the code * support SentenceTransformers * add training datasets * Update mteb/models/ops_moa_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * update training_datasets --------- Co-authored-by: kunka.xgw <kunka.xgw@taobao.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * leaderboard fix (#2456) * ci: cache `~/.cache/huggingface` (#2464) ci: cache 
~/.cache/huggingface Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * [MIEB] "capability measured"-Abstask 1-1 matching refactor [3/3]: reimplement ImageCoDe (#2468) * reimplement ImageCoDe with ImageTextPairClassification * add missing stats file * Update tasks table * fix: Adds family of NeuML/pubmedbert-base-embedding models (#2443) * feat: added pubmedbert model2vec models * fix: attribute model_name * fix: fixed commit hash for pubmed_bert model2vec models * fix: changes requested in PR 2443 * fix: add nb_sbert model (#2339) * add_nb_sbert_model * Update nb_sbert.py added n_parameters and release_date * Update mteb/models/nb_sbert.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update nb_sbert.py fix: make lint * added nb_sbert to overview.py + ran make lint * Update nb_sbert.py Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12 --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.34 Automatically generated by python-semantic-release * fix test --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Mina Parham <36207068+mina-parham@users.noreply.github.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com> Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Shikhar Shiromani <rbk.shikhar@gmail.com> Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> Co-authored-by: 
Ruslan Bel'kov <ruslan.belckov@yandex.ru> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: sufen-f <sufenfong@gmail.com> Co-authored-by: sufen <sufenf@gmail.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> Co-authored-by: Samuel Yang <samuelyang150@gmail.com> Co-authored-by: Aradhye Agarwal <aradhyeagarwal@gmail.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: talshef <tsheffer@gmail.com> Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: garciasces <garciasces@madrid.es> Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> Co-authored-by: Wang Bo <bo.wang@jina.ai> Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in> Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> Co-authored-by: Eng. Omar Najar <79968243+omarnj-lab@users.noreply.github.com> Co-authored-by: Michael Dinzinger <39766249+michaeldinzinger@users.noreply.github.com> Co-authored-by: Jinhyuk Lee <lee.jnhk@gmail.com> Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> Co-authored-by: sergeyz-zh <49659999+sergeyz-zh@users.noreply.github.com> Co-authored-by: OnandOn <76710635+OnAnd0n@users.noreply.github.com> Co-authored-by: chenghao xiao <85804993+gowitheflow-1998@users.noreply.github.com> Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com> Co-authored-by: Uri K <37979288+katzurik@users.noreply.github.com> Co-authored-by: richinfo-ai <richinfoai@163.com> Co-authored-by: Adewole Babatunde <40810247+Free-tek@users.noreply.github.com> Co-authored-by: ahxgw <ahxgwOnePiece@gmail.com> Co-authored-by: kunka.xgw <kunka.xgw@taobao.com> Co-authored-by: Nadia Sheikh <144166074+nadshe@users.noreply.github.com> Co-authored-by: theatollersrud <thea.tollersrud@nb.no>
isaac-chung added a commit that referenced this pull request Apr 4, 2025
* misc: Add image classification descriptive stats implementation (#2045) * add ImageClassificationDescriptiveStatistics * add MNIST descriptive stats * use tuples instead * add label count and update docstrings * update MNIST example * Update tasks table * fix: Add column descriptions to leaderboard (#2039) * fix: Add column descriptions to leaderboard * removed existing overlap * fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041) * fix: Add BRIGHT Long Fixes #1978 * fix: Add BRIGHT(long) * fix bug in task results * updated bright * updated tests for TaskResults * 1.34.12 Automatically generated by python-semantic-release * misc: Add image clustering descriptive stats implementation (#2057) * add image clustering descirptive stats and run * finish off last one * remove script * fix: Update embed_dim for jina models (#2058) see embeddings-benchmark/results#117 * Update tasks table * 1.34.13 Automatically generated by python-semantic-release * Add giga embeddings (#1741) * add gigaembeddings * use jasper * fix name * create sentence_transformer instruct wrapper * apply instruction template * fix jasper * update meta * misc: Add ZS and multilabel image classification descriptive stats implementation (#2059) * add image clustering descirptive stats and run * finish off last one * remove script * add ImageMultilabelClassificationDescriptiveStatistics * add VOC2007 * add zeroshot and mnist example * Update tasks table * Rename MIEB task classes with duplicated names (#2061) fix class names * misc: Add VisualSTS descriptive stats (#2062) * add visualsts stats * add last dataset * Update tasks table * fix: Added gte models (#1539) * fix: Added gte models * fix: Add mixbai models (#1540) for #1515 * fix: Add climate fever v2 (#1873) * Updated ClimateFEVER dataset with new version * Adds Fill in the empty metadata. 
* Updates the date tuple * Update class name Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update domains Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task_subtypes * Update annotations_creators for the first version * Update date Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task subtypes * Update path * Update description --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> * Update tasks table * fix: Updating paper scripts (#1958) * change reference revisions to align with paper * Update author list * Added code for main results table * updated minor changes * added external as a "no_revision_available" case * revert unintended changes * format * 1.34.14 Automatically generated by python-semantic-release * Add datasets for a benchmark newly introduced for "Engineering" domain (#1911) * adding clustering tasks (built-bench-clustering S2S & P2P) * updated built-bench-clustering tasks * Updated BuiltBenchClustering tasks * Added "Engineering" as new domain to TaskMetadata.py * Updated tasks table in docs * Updated task metadata for BuiltBenchClustering S2S and P2P * updated metadata for clustering tasks * Add/update BuiltBench tasks - Add BuiltBenchRetrieval task - Add BuiltBenchReranking task - Update metadata for BuiltBenchClusterinP2P - Update metadata for BuiltBenchClusterinS2S * update BuiltBench benchmark * Update mteb/benchmarks/benchmarks.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/benchmarks/benchmarks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Fix formatting via ruff --------- Co-authored-by: Roman Solomatin 
<samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * misc: update model names to adjust for adding to results repo (#2074) * update model names to adjust for adding to results repo * update model meta script * misc: Add all image classification descriptive stats (#2073) * add most image classification descr stats * revert changes to encoder * add stats --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Update tasks table * ci: Rerun tests that fail due to networking issues. (#2029) * fix: rerun tests that fail - Networking * update tests to use tmp_path * set versions for dev dependencies * add pytest options to pyproject.toml * add rerun json.decoder.JSONDecodeError * remove JSONDecodeError from pyproject.toml * add huggingface_hub.errors.HfHubHTTPError * add huggingface_hub.errors.LocalEntryNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044 * FileNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029 * add doc to pytest rerun --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * fix: generate metadata (#2063) * fix: generate metadata * use logging not print for script * lint * add iso639 to dev pyproject * fix import * add memory_usage_mb * set version for iso639 Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.15 Automatically generated by python-semantic-release * fix: add missing `e5` training datasets (#2065) add missing training datasets * 1.34.16 Automatically generated by python-semantic-release * fix: Ensure voyage model uses different naming scheme (#2083) * fix: Added make command for running leaderboard 
locally * fix: Ensure voyage models doesn't re-use the name * 1.34.17 Automatically generated by python-semantic-release * fix: Freeze model/rank columns in leaderboard (#2044) * fix: freeze model/rank columns in leaderboard * freezing zero-shot column * update min gradio version to 5.16.0 in pyproject.toml --------- Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> * 1.34.18 Automatically generated by python-semantic-release * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086) Fixes #2064 * 1.34.19 Automatically generated by python-semantic-release * Remove duplicated string in docstring of TaskMetadata class (#2087) * Remove duplicated string in docstring of TaskMetadata class * Remove duplicated dataset field * fix: Smarter leaderboard caching with cachetools (#2085) * Added smarter caching to callbacks * Added cachetools as a dependency * Ran linting * Removed debugging print statement * Bumped Gradio version * Dependency fixes * Dependency fixes --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088) * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) Fixes #2064 * change MultilingualSentiment split from test to validation in CMTEB * 1.34.20 Automatically generated by python-semantic-release * merge gme models (#2089) * fix: Add back task filtering by modalities (#2080) * add back task filtering by modalities * add unit test * check if task modalities is a subset of model modalities and fix tests * add model_modalities_more_than_task_modalities case * 1.34.21 Automatically generated by python-semantic-release * Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092) * Added GTR Models to codebase * Linted gtr models file. 
* Added gtr-base/large/xl/xxl to sentence_transformers_models.py * Added memory_usage_mb and training_datasets * Reformatted training dataset names * Reformatted training dataset names * Reformatted training dataset names --------- Co-authored-by: sufen <sufenf@gmail.com> * misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095) * add Any2TextMutipleChoiceDescriptiveStatistics * run on all tasks * Update tasks table * fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101) Reported with references to paper + qoutes. * fix: Update links (#2098) * Fix link * Fix link * 1.34.22 Automatically generated by python-semantic-release * Add model inf-retriever-v1-1.5b (#2106) Add inf-retriever-v1-1.5b model * docs: Fix typos & refine text (#2102) * Update app.py * Fix typos * misc: Run Zeroshot Classification Descriptive Stats (#2105) * add most datasets * add birdsnap and imgnet1k * add scimmir and sun397 * add uck101 zs * Update tasks table * fix: add warning about task category conversion (#2108) add warning about task category conversion * 1.34.23 Automatically generated by python-semantic-release * fix: Add codesage-large-v2 (#2090) * Add codesage-large-v2 * Address comments * Add training dataset * Fix issues * Format code * Remove unnecessary wrapper * 1.34.24 Automatically generated by python-semantic-release * fix: add training data to BGE-m3-custom-fr (#2110) This ensure that is it correctly filtered as non-zero-shot * 1.34.25 Automatically generated by python-semantic-release * fix: Upgrade ruff to be gradio compatible (#2111) * fix: update ruff to be gradio compatible (>=0.9.3) * format * fix: upgrade ruff to latests (same as gradio compatible) * 1.34.26 Automatically generated by python-semantic-release * docs: Follow google docstring format (#2115) Fixes #2113 * Update leaderboard_refresh.yaml (#2121) * fix InstructSentenceTransformer Model name (#2125) fix params * fix voyage (#2127) * fix: update e5 instruct training data (#2129) 
update e5 training data * 1.34.27 Automatically generated by python-semantic-release * format * Update tasks table * fix: Add 2 new Static Sentence Transformer models (#2112) * Add 2 new Static Sentence Transformer models * Add Tatoeba Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.28 Automatically generated by python-semantic-release * add is_cross_encoder (#1869) * add is_cross_encoder * Update mteb/model_meta.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * change value --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Qodo embed 1 1.5 b (#2137) * feat: Add Qodo-Embed-1-1.5B model metadata * fix: Add Qodo models to overview imports * fix: Add adapted_from field to Qodo model metadata * Update mteb/models/qodo_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * relint --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * misc: merge summary retrieval into bitext mining (#2140) merge summary retrieval into bitext mining * test: fix dataset availability test (#2141) This simplified the test and also make it a lot simpler. It also removed about 100 test cases which where all to the same API call. 
* fix: Update NVIDIA-Embed training data (#2143) Added a few missing annotations for nvidia-embed * 1.34.29 Automatically generated by python-semantic-release * fix: Add annotations for Voyage exp (#2144) * fix: Update NVIDIA-Embed training data Added a few missing annotations for nvidia-embed * fix update annotationf for voyage exp * 1.34.30 Automatically generated by python-semantic-release * Fix tokens num in cde models (#2148) fix tokens * feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146) * feat: Add Qodo-Embed-1-7B model metadata and rename existing model * lint * fix revision * update license name --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> * 1.35.0 Automatically generated by python-semantic-release * misc: add Any2AnyRetrievalDescriptiveStatistics (#2139) add Any2AnyRetrievalDescriptiveStatistics * Update tasks table * Added zero-shot percentages and different filtering scheme (#2153) * Added zero-shot percentages and different filtering scheme * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Incorrect annotations for Mistral-based embedding models (#2157) Fixes #2155 * 1.35.1 Automatically generated by python-semantic-release * Update FaMTEBRetrieval.py (#2171) The URL pointed to the settings page instead of the main repo URL. Now it is fixed. 
* Update tasks table * fix: Add Training data annotations (#2173) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * 1.35.2 Automatically generated by python-semantic-release * feat: Add MIEB and MIEB-lite as benchmarks (#2035) * add mieb and mieb-lite to benchmarks * add CompositionalityEvaluation and DocumentUnderstanding types * add VisionCentric type * add missing comma * split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng * use aggregate task instead so we can name the subsets * shorten names * fix import * alternative strategy to avoid using get_task * follow other aggregate tasks and skip metadata test * run LB without errors when selecting MIEB(-lite) * add back the capability as taks type * typo * extend description * split into mieb(eng) and mieb(multilingual) * remove unneeded files * remove aggtask additions for test * edit descriptions based on screenshots * shorten * rename to Compositionality and include ImageCoDeT2IMultiChoice * re-tag missing VisionCentric tasks * re-tag rparis and roxford as retrieval and include fixes * re-tag voc2007 as image cls * make lint * correct num task types in descriptions * add one model to models_to_annotate * add mieb reference models * update task types * relabel to multilingual retrieval task type to align with paper * fix reference and bibtex * edit task list to match with final list * add back agg task to reproduce table column in paper * fix filtering and import * update tests * mieb lite add back missing tasks * fix metadata test * multi should have all 4 variants * fix task counts * lite has 10 task types * fix visualSTS-17 lang splits * Aggregate task can now use subsets & eval 
langs to filter TaskResults * fix test and mark VisualSTS17 as multilingual * fix tests * add agg task running script * add voyage meta * fix citations * capitalize * add coarse/fine labels --------- Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> * Update tasks table * 1.36.0 Automatically generated by python-semantic-release * fix: update training datasets and revision for jina models (#2179) * feat: update training datasets and revision for jina models * feat: update training datasets and revision for jina models * fix: Add more training data annotations (#2178) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * fix: add training data for Bilingual Embeddings fixes #2167 * 1.36.1 Automatically generated by python-semantic-release * Added training data annotation for e5-base-4k (#2186) * fix: Added training data annotations to MXBAI (#2185) * fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180) This also resolves the missing data in the leaderboard. 
Fixes #2172 * fix: Added training data annotation for MMLW models (#2188) * Added training data annotation for MMLW models * Added GIST annotations Kenneth missed * Added Stella en 400m training data' * 1.36.2 Automatically generated by python-semantic-release * fix: Added training data for sentence-croissant (#2189) * 1.36.3 Automatically generated by python-semantic-release * fix: update ru models annotation (#2181) * 1.36.4 Automatically generated by python-semantic-release * fix: Alphabetical ordering of tasks in dropdowns (#2191) * 1.36.5 Automatically generated by python-semantic-release * misc: Speed up qrel creation in any2anyretrieval (#2196) * use numpy vectorized operations instead of row-by-row * scores are int * use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199) * add base models for e5 (#2183) * add similar datasets (#2205) * add similar datasets * add nano * update is filled * Update mteb/abstasks/TaskMetadata.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add labse annotation (#2182) * add labse annotation * Update mteb/models/sentence_transformers_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Fixed leaderboard crash (#2221) * Fixed leaderboard crash * Fixed language selection error * Ran linting * 1.36.6 Automatically generated by python-semantic-release * fix: More training data annotations (#2220) * Added training data annotation for bge-gemma * Added missing annotations for Voyage models * Added training data for sts-multilingual-mpnet * Added all mteb datasets to STS-multilingual training data * 1.36.7 Automatically generated by python-semantic-release * Add LLM2CLIP (OpenAI variants) (#2222) * model loading and get_text_embeddings * add image_emb, fused_emb, and calc probs methods * add b16 model * add llm2clip_openai_l_14_224 
(not working yet) * got llm2clip_openai_l_14_224 working * make lint * add training sets and allow py files * Change `dataset on HF` test to use official api (#2213) * refactor dataset checking * increase timeout * increase timeout * remove timeout * Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197) * Add Any2AnyMC descriptive stats * Add descriptive stats function for ImageTextPC * add descriptive stats examples * linter * update multi choice descriptive stats * Update tasks table * fix: Add training data annotations to uderver-bloom models (#2210) * fix: Add training data annotations to uderver-bloom models fixes #2193 * fix: add mixedbread --------- Co-authored-by: Márton Kardos <power.up1163@gmail.com> * 1.36.8 Automatically generated by python-semantic-release * Add comment to `voyage-3-m-exp` model (#2229) * remove model size from voyage-3-m-exp model * Update mteb/models/voyage_models.py * Update mteb/models/voyage_models.py * docs: Update description of EURLex (#2231) * Automatically add similar tasks to training_tasks (#2228) * refactor dataset checking * increase timeout * increase timeout * remove timeout * start * automatically find datasets * update comment * fix aggregate task metadata * fixes * lint * rename * update fetch check * Remove overlapping legends from radar chart (#2195) * Remove overlapping legends from radar chart * ensure graph is not blocked * Overlapping legend issue of Radar Chart * misc: Run Any2AnyRetrieval descriptive stats (#2223) * run a few datasets * add a few more * run more tasks * add more datasets * remove pdb * remove newline * add more datasets * Update tasks table * misc: Add rest of the vision centric and compositionality descriptive stats (#2267) add the rest * Update tasks table * Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB * Update memory_usage_mb with correct 
calculated value * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * remove comments * added correct memory usage * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Apply linter fixes with ruff * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add Arabic_Triplet_Matryoshka_V2 to overview.py * Rename model file to ara_models.py and update imports --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Add WebFAQ Retrieval dataset (#2236) * Add WebFAQ Retrieval dataset Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Small change WebFAQRetrieval.py Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add remaining languages to WebFAQ Retrieval task Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add descriptive stats Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.9 Automatically generated by python-semantic-release * fix: Formatting issue in Performance Plot (#2237) * Formatting issue in Performance Plot * make lint * added function for better code readability * 1.36.10 Automatically generated by python-semantic-release * ci: run test_dataset_on_hf separately (#2201) * dont run test_dataset_on_hf in every pr * lint * Update call pytest test_datasets Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * not datasets for test * run dataset loading test 
for push or pull_request * apply feedback --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add gemini-embedding-exp-03-07 (#2279) * add gemini-embedding-exp-03-07 * remove space for lint * lint fix * update link (#2281) * fix: Run remaining MIEB desc stats (#2288) * run Vidore * GLDv2 * run the rest --------- Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> * Update tasks table * 1.36.11 Automatically generated by python-semantic-release * fix: Added Filter Modality (#2262) * Added Filter Modality * resolve suggestions * make lint * make sure test pass * make lint * added exclusive_modality_filter and unit tests * Integrate tests on overview.py * Update tests/test_overview.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * added task related to image modality * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * update overview..py * make lint * update documentation --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * 1.36.12 Automatically generated by python-semantic-release * fix: Add `ModelMeta` license & custom validations (#2293) * license validation * move licenses * update imports --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.13 Automatically generated by python-semantic-release * ci: Add pre-commit hook (#2194) * make dev life nicer - pre-commit hooks * add pre-commit to install * update precommit * update ruff pre-commit * lint * lint --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * Update tasks table * fix: bug in voyage implementation (#2304) * fix: Fix bug in voyage implementation "passage" is not a valid input for the voyage API. Remapped to "document". 
* Update mteb/models/voyage_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.14 Automatically generated by python-semantic-release * fix: Update voyage name to include Org. (#2322) * 1.36.15 Automatically generated by python-semantic-release * Added VDR Model (#2290) * Added VDR Model * change custom wrapper to SentenceTransformer Wrapper * remove kwargs and add TODO for Image Modality * Update mteb/models/vdr_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Resolve conflicting dependencies (#2323) These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * 1.36.16 Automatically generated by python-semantic-release * fix: remove SyntaxWarnings in py312 (#2325) * fix: Resolve conflicting dependencies These errors where discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * fix: Remove syntax warnings occuring in python 3.12 ``` Python 3.12.0 (main, Oct 2 2023, 20:56:14) [Clang 16.0.3 ] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import mteb # no syntax warnings >>> ``` * 1.36.17 Automatically generated by python-semantic-release * fix: add annotation models for stella zh (#2277) * fix: add annotation models for stella zh Additionally fixed a few annotation errors * format * Update mteb/models/stella_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.18 Automatically generated by python-semantic-release * fix: Add ModelMeta rubert-mini-frida, BERTA (#2330) * Add rubert-mini-frida model meta * Add BERTA model meta * docs: fix typos * 1.36.19 Automatically generated by python-semantic-release * fix: Add WebFAQ bitext mining tasks (#2326) * Add WebFAQ bitext mining tasks Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Lower number of language pairs in WebFAQBitextMining Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.20 Automatically generated by python-semantic-release * fix: Add `trust_remote_code` to MIRACLRetrieval * fix: Add `trust_remote_code` to MIRACLRetrieval (#2344) * 1.36.21 Automatically generated by python-semantic-release * fix: Correctly pass trust remote code to Miracl * fix: Ensure MIRACL pass trust_remote_code (#2346) * fix: Add `trust_remote_code` to MIRACLRetrieval * fix: Correctly pass trust remote code to Miracl * fix * 1.36.22 Automatically generated by python-semantic-release * add-Data Korean Clustering dataset (KLUE-modified) (#2283) * add PatentFnBClustering.py * do make lint and revise * rollback Makefile * Update mteb/tasks/Clustering/kor/PatentFnBClustering.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * klue_mrc_domain * make lint * klue_modified_clustering_dataset --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Rename dunzhang and Jasper models to NovaResearch (#2373) * Rename dunzhang and Jasper 
models to NovaResearch * rename model in tests * correct reference link * correct MIEB dataset stats (#2374) * correct stats * update Any2AnyMultiChoice qrels stats compute logic * final correction * Update tasks table * Correct -1 to No information in Zero shot (#2381) * fix leaderboard (#2385) * fix: Reduce logging and Warnings (#2349) * Reduce logging and Warnings * make lint * format license to lowercase * Address all comments * Update mteb/leaderboard/app.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.23 Automatically generated by python-semantic-release * fix: b1ade (#2386) * fix: added b1ade_models.py (#2340) * added b1ade_models.py * changing based on requested * Update mteb/models/b1ade_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: missing import and formatting --------- Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com> * 1.36.24 Automatically generated by python-semantic-release * fix: pin gradio dependency to ensure leaderboards works (#2387) * 1.36.25 Automatically generated by python-semantic-release * fix: Ensure BrightRetrieval is valid to run (#2334) * fix: Ensure BrightRetrieval is valid to run Not sure this is the best way to fix this. Let me know if you can find a better fix. 
fixes #2327 * fix: convert brightretrieval to two tasks * fix collecting error * Update tasks table * 1.36.26 Automatically generated by python-semantic-release * Pass task name to all evaluators (#2389) * pass task name to all tasks * add test * fix loader * fix: renaming Zeroshot -> ZeroShot (#2395) * fix: renaming Zeroshot -> ZeroShot Adresses #2078 * rename 1 * rename 2 * format * fixed error * 1.36.27 Automatically generated by python-semantic-release * fix: Update AmazonPolarityClassification license (#2402) Update AmazonPolarityClassification.py * fix b1ade name (#2403) * 1.36.28 Automatically generated by python-semantic-release * Minor style changes (#2396) * fix: renaming Zeroshot -> ZeroShot Adresses #2078 * fix: minor style changes Adresses #2078 * rename 1 * rename 2 * format * fixed error --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Added new dataset and tasks - ClusTREC-covid , clustering of thematic covid related scientific papers (#2302) * Clustrec covid new dataset and task * fix * fix * fix * fix * fix * descriptive stats * change all mentions of clustrec-covidp2p to clustrec-covid * change ' to " * Update tasks table * fix: Major updates to docs + make mieb dep optional (#2397) * fix: renaming Zeroshot -> ZeroShot Adresses #2078 * fix: minor style changes Adresses #2078 * fix: Major updates to documentation This PR does the following: - This introduced other modalities more clearly in the documentation as well as make it easier to transition to a full on documentation site later. - added minor code updates due to discovered inconsistencies in docs and code. 
- Added the MMTEB citation where applicable - makes the docs ready to move torchvision to an optional dependency * Moved VISTA example * rename 1 * rename 2 * format * fixed error * fix: make torchvision optional (#2399) * fix: make torchvision optional * format * add docs * minor fix * remove transform from Any2TextMultipleChoiceEvaluator --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * move Running SentenceTransformer model with prompts to usage --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.29 Automatically generated by python-semantic-release * remove Arabic_Triplet_Matryoshka_V2.py (#2405) * Min torchvision>0.2.1 (#2410) matching torch>1.0.0 * fix: Add validation to model_name in `ModelMeta` (#2404) * add test for name validation * upd docs * upd cohere name * fix tests * fix name for average_word_embeddings_komninos * fix name for average_word_embeddings_komninos * fix reranker test * fix reranker test * 1.36.30 Automatically generated by python-semantic-release * [MIEB] "capability measured"-Abstask 1-1 matching refactor [1/3]: reimplement CV-Bench (#2414) * refactor CV-Bench * reimplement CV Bench * remove abstask/evaluator/tests for Any2TextMultipleChoice * rerun descriptive stats * Update tasks table * fix: Add option to remove benchmark from leaderboard (#2417) fix: Add option to remove leaderboard from leaderboard fixes #2413 This only removed the benchmark from the leaderboard but keep it in MTEB. 
* 1.36.31 Automatically generated by python-semantic-release * fix: Add VDR Multilingual Dataset (#2408) * Added VDR Multilingual Dataset * address comments * make lint * Formated Dataset for retrieval * Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * make lint * corrected date * fix dataset building * move to image folder --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * 1.36.32 Automatically generated by python-semantic-release * HOTFIX: pin setuptools (#2423) * pin setuptools * pin setuptools * pin setuptools in makefile * try ci * fix ci * remove speed from installs * add __init__.py Clustering > kor folder, And edit __init__.py in Clustering folder (#2422) * add PatentFnBClustering.py * do make lint and revise * rollback Makefile * Update mteb/tasks/Clustering/kor/PatentFnBClustering.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * klue_mrc_domain * make lint * klue_modified_clustering_dataset * clustering & kor folder add __init.py * clustering & kor folder add __init__.py * task.py roll-back * correct text_creation to sample_creation & delete form in MetaData * correct task_subtype in TaskMetaData * delete space * edit metadata * edit task_subtypes --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update tasks table * Update speed dependencies with new setuptools release (#2429) * add richinfoai models (#2427) * add richinfoai models add richinfoai models * format codes by linter format codes by linter * Added Memory Usage column on leaderboard (#2428) * docs: typos; Standardize spacing; Chronological order (#2436) * Fix typos; add chrono order * Fix spacing * fix: Add model specific dependencies in pyproject.toml (#2424) * Add 
model specific dependencies in pyproject.toml * Update documentation * 1.36.33 Automatically generated by python-semantic-release * [MIEB] "capability measured"-Abstask 1-1 matching refactor [2/3]: reimplement r-Oxford and r-Paris (#2442) * MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats * modify benchmark list * fix citation * Update tasks table * Error while evaluating MIRACLRetrievalHardNegatives: 'trust_remote_code' (#2445) Fixes #2444 * Feat/searchmap preview (#2420) * Added meta information about SearchMap_Preview model to the model_dir * Added meta information about SearchMap_Preview model to the model_dir * updated revision name * Device loading and cuda cache cleaning step left out * removed task instructions since it's not necessary * changed sentence transformer loader to mteb default loader and passed instructions s model prompts * Included searchmap to the models overview page * Included searchmap to the models overview page * added meta data information about where model was adpated from * Update mteb/models/searchmap_models.py * fix lint * lint --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Add Background Gradients in Summary and Task Table (#2392) * Add Background Gradients in Summary and Task Table * Remove warnings and add light green cmap * Address comments * Separate styling function * address comments * added comments * add ops_moa_models (#2439) * add ops_moa_models * add custom implementations * Simplify custom implementation and format the code * support SentenceTransformers * add training datasets * Update mteb/models/ops_moa_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * update training_datasets --------- Co-authored-by: kunka.xgw <kunka.xgw@taobao.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * leaderboard fix (#2456) * ci: cache `~/.cache/huggingface` (#2464) ci: cache 
~/.cache/huggingface Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * [MIEB] "capability measured"-Abstask 1-1 matching refactor [3/3]: reimplement ImageCoDe (#2468) * reimplement ImageCoDe with ImageTextPairClassification * add missing stats file * Update tasks table * fix: Adds family of NeuML/pubmedbert-base-embedding models (#2443) * feat: added pubmedbert model2vec models * fix: attribute model_name * fix: fixed commit hash for pubmed_bert model2vec models * fix: changes requested in PR 2443 * fix: add nb_sbert model (#2339) * add_nb_sbert_model * Update nb_sbert.py added n_parameters and release_date * Update mteb/models/nb_sbert.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update nb_sbert.py fix: make lint * added nb_sbert to overview.py + ran make lint * Update nb_sbert.py Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12 --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.34 Automatically generated by python-semantic-release * suppress logging warnings on leaderboard (#2406) * supress logging warnings * remove loggers * return blocks * rename function * fix gme models * add server name * update after merge * fix ruff * fix: E5 instruct now listed as sbert compatible (#2475) Fixes #1442 * 1.36.35 Automatically generated by python-semantic-release * [MIEB] rename VisionCentric to VisionCentricQA (#2479) rename VisionCentric to VisionCentricQA * ci: Run dataset loading only when pushing to main (#2480) Update dataset_loading.yml * fix table in tasks.md (#2483) * Update tasks table * fix: add prompt to NanoDBPedia (#2486) * 1.36.36 Automatically generated by python-semantic-release * Fix Task Lang Table (#2487) * Fix Task Lang Table * added tasks.md * fix * fix: Ignore datasets not available in tests (#2484) * add back MockAudioEncoder --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> Co-authored-by: github-actions[bot] 
<github-actions[bot]@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Mina Parham <36207068+mina-parham@users.noreply.github.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com> Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Shikhar Shiromani <rbk.shikhar@gmail.com> Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> Co-authored-by: Ruslan Bel'kov <ruslan.belckov@yandex.ru> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: sufen-f <sufenfong@gmail.com> Co-authored-by: sufen <sufenf@gmail.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> Co-authored-by: Samuel Yang <samuelyang150@gmail.com> Co-authored-by: Aradhye Agarwal <aradhyeagarwal@gmail.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: talshef <tsheffer@gmail.com> Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: garciasces <garciasces@madrid.es> Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> Co-authored-by: Wang Bo <bo.wang@jina.ai> Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in> Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> Co-authored-by: Eng. 
Omar Najar <79968243+omarnj-lab@users.noreply.github.com> Co-authored-by: Michael Dinzinger <39766249+michaeldinzinger@users.noreply.github.com> Co-authored-by: Jinhyuk Lee <lee.jnhk@gmail.com> Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> Co-authored-by: sergeyz-zh <49659999+sergeyz-zh@users.noreply.github.com> Co-authored-by: OnandOn <76710635+OnAnd0n@users.noreply.github.com> Co-authored-by: chenghao xiao <85804993+gowitheflow-1998@users.noreply.github.com> Co-authored-by: Shreyas Subramanian <shreyas.f117@gmail.com> Co-authored-by: Uri K <37979288+katzurik@users.noreply.github.com> Co-authored-by: richinfo-ai <richinfoai@163.com> Co-authored-by: Adewole Babatunde <40810247+Free-tek@users.noreply.github.com> Co-authored-by: ahxgw <ahxgwOnePiece@gmail.com> Co-authored-by: kunka.xgw <kunka.xgw@taobao.com> Co-authored-by: Nadia Sheikh <144166074+nadshe@users.noreply.github.com> Co-authored-by: theatollersrud <thea.tollersrud@nb.no> Co-authored-by: hongst <76415500+seongtaehong@users.noreply.github.com>
isaac-chung
added a commit
that referenced
this pull request
Feb 16, 2026
* Started the following: - define audio encoder interface - implement abstask and evaluator for clustering * Minor changes and linted files. #2093 * Minor changes and linted files. #2093 * Minor changes and linted files. #2093 * Refs #2068: Initial Implementation of audio-text retrieval abstask and evaluator * Added MockAudioClustering task + MockAudioEncoder for testcase Created test_maeb_datasets.py to test AbsTask and Evaluator for clustering * MockAudioClustering + MockAudioEncoder (#2093) * Added wav2vec model wrapper * Added subTask with small sample of dataset for testing * Added four w2v variants * Update wav2vec_models.py * Added wav2vec (5), wavlm (7), and whisper (5) models * Added revisions from HF to wav2vec models, added silhouette score, DBSCAN and agglomerative algorithms into clustering evaluator, added algorithm selector into VoiceGender * Update mteb/models/wavlm_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * setting up colab * added a2a * PCA + hidden layer + shuffling * New task: emotion clustering * Added qwen2 model * Added Wav2Vec model, voice clustering task, VoxCeleb dataset subset (#2175) * Added wav2vec model wrapper * Added four w2v variants * Update wav2vec_models.py * Removed run.py test script * Added subTask with small sample of dataset for testing * Removed test portion of VoiceGender.py task * add commit hash and bibtex * make lint * update models * fix circular import * make VoiceGender discoverable in get_tasks * add a2a as category for clustering * specify latest commit hash * revert linting changes * Based on feedback for model: updated w2v2 revisions and added torchaudio to .toml file * Added Bibtex for dataset, set data to be test instead of training, shortened task_subtype * Changed task from Voice Gender Clustering to Gender Clustering. 
* Fixed mock audio clustering tests * Added dataset metadata * Linted * Passed revision into the w2v2 loader * passed lint check * Linted * Update VoiceGender.py --------- Co-authored-by: Ali Sartaz Khan <alisartazkhan@gmail.com> Co-authored-by: Ali Sartaz Khan <71156712+alisartazkhan@users.noreply.github.com> Co-authored-by: mn <mn@Ms-MacBook-Pro.local> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Revert "Maeb - added voice clustering task, wav2vec model and VoxCeleb dataset (subset)" (#2202) * Revert "Revert "Maeb - added voice clustering task, wav2vec model and VoxCeleb dataset (subset)"" (#2203) Revert "Revert "Maeb - added voice clustering task, wav2vec model and VoxCele…" This reverts commit ee10191f1d4f10f705a595062cc04d749f9a2dc3. * Revert "Revert "Revert "Maeb - added voice clustering task, wav2vec model and VoxCeleb dataset (subset)""" (#2207) Revert "Revert "Revert "Maeb - added voice clustering task, wav2vec model and…" This reverts commit f1449c07f8f6c8a084daab572515f3110cd67027. 
* Add Audio (Multi Label) Classification Abstask, Baseline Audio model, FSD50k Dataset and Task (#2082) * init audio * some encoder related changes * some more abs task defs Co-authored-by: rahulschand <rahulsc@stanford.edu> * evaluators and classification * remove rahul changes to generate first PR * make lint * add dataset/tasks skeleton * readd changes lost in rebase * add fsd50k * add task categories for audio * slight updates to fsd50k * make lint * wav2vec2 model * add fsd50k metadata * rename folder * add metric * add torchaudio in req * reigster wav2vec2 models * fixes * add audio in valid task types * mock interface changes * make lint * rm audio clustering * wav2vec2 model revision update * rm comment * rm test.py * add revisions to all wav2vec2 models * rm empty abstask files * rm empty evaluator files * rm empty task files * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update mteb/models/wav2vec2_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * rm non-logReg evaluators for audio classification * lint * fn name changed to convert_audio_from_numpy * rm mock tests for audio kNN classification * rm evaluators for audio kNN classification * fix imports * fix audio kNN; make lint * rm AbsTaskAudioClassification.py for later PR * remove commented code; reset changes to ClassificationEvaluator.py * fix mock tasks for multilabel classification * make lint * inherit Wrapper class * add all languages supported by wav2vec2 * make lint * add script info to all languages * make lint * recent changes * merge wav2vec2 + add updated logic for auto padding for fqd50k type datasets * make lint remove uwanted files * remove debug lines * remove esc50 refs * fix mock tasks for multilabel * fix mock tasks for multilabel * Revert "Merge branch 'maeb' into maeb" bad direct commit made to upstream maeb branch https://github.com/embeddings-benchmark/mteb/commit/4f23fdf27e3263188fe89031d186b293a279153e 
This reverts commit 4f23fdf27e3263188fe89031d186b293a279153e, reversing changes made to 130247730a07ad1fd2d06a878b7c9bb15af07af7. * fix model imports * fqd50k cleaning * update fsd50k * change dataset * eval subsets correctly * make lint and remove debug statements * clean print statements * make lint * update fsd2019 dataset * remove init in AbsTaskAudioMultilabelClassification.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * add class parameters in AbsTaskAudioMultilabelClassification * inherit from multilingualtask for FSD2019Kaggle * make lint * update mock_tasks; make lint * remove train_split from fn parameters * define fsd2019k to be multilingual * inherit from MultilingualTask in fsd2019K * fix tests * inherit correct multingial task class * remove MockAudioMultilabelClassificationLogRegTask * rm other instances of MockAudioMultilabelClassificationLogRegTask --------- Co-authored-by: rahulschand <rahulsc@stanford.edu> Co-authored-by: Silky Singh <silky1708@gmail.com> Co-authored-by: Silky Singh <54901747+silky1708@users.noreply.github.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add ESC50 and zero-shot classification (#2133) * init audio * some encoder related changes * some more abs task defs Co-authored-by: rahulschand <rahulsc@stanford.edu> * evaluators and classification * remove rahul changes to generate first PR * make lint * init audio * some encoder related changes * some more abs task defs Co-authored-by: rahulschand <rahulsc@stanford.edu> * evaluators and classification * remove rahul changes to generate first PR * make lint * add dataset/tasks skeleton * readd changes lost in rebase * add fsd50k * add task categories for audio * slight updates to fsd50k * make lint * wav2vec2 model * add fsd50k metadata * rename folder * add metric * add torchaudio in req * reigster wav2vec2 models * fixes * add audio in valid task types * mock interface changes * my 0 shot * 
make lint * rm audio clustering * wav2vec2 model revision update * rm comment * rm test.py * add revisions to all wav2vec2 models * rm empty abstask files * rm empty evaluator files * rm empty task files * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update mteb/models/wav2vec2_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * rm non-logReg evaluators for audio classification * lint * fn name changed to convert_audio_from_numpy * rm mock tests for audio kNN classification * rm evaluators for audio kNN classification * fix imports * fix audio kNN; make lint * rm AbsTaskAudioClassification.py for later PR * added zero-shot loading model and dataset checked * remove commented code; reset changes to ClassificationEvaluator.py * fix mock tasks for multilabel classification * make lint * inherit Wrapper class * add all languages supported by wav2vec2 * make lint * add script info to all languages * make lint * before cleaning comments * ESC and clap model. Tested 81 percent zero-shot numbers * fixed label names for ESC50-multilabel and removed comments * recent changes * merge wav2vec2 + add updated logic for auto padding for fqd50k type datasets * make lint remove uwanted files * remove debug lines * remove esc50 refs * changes for debugging * lint changes and maeb main branch merge * fix mock tasks for multilabel * fix mock tasks for multilabel * Revert "Merge branch 'maeb' into maeb" bad direct commit made to upstream maeb branch https://github.com/embeddings-benchmark/mteb/commit/4f23fdf27e3263188fe89031d186b293a279153e This reverts commit 4f23fdf27e3263188fe89031d186b293a279153e, reversing changes made to 130247730a07ad1fd2d06a878b7c9bb15af07af7. 
* fix model imports * fqd50k cleaning * fixed error in Image zero shot classfification * update fsd50k * change dataset * eval subsets correctly * make lint and remove debug statements * clean print statements * make lint * update fsd2019 dataset * remove init in AbsTaskAudioMultilabelClassification.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * add class parameters in AbsTaskAudioMultilabelClassification * inherit from multilingualtask for FSD2019Kaggle * make lint * update mock_tasks; make lint * remove train_split from fn parameters * define fsd2019k to be multilingual * inherit from MultilingualTask in fsd2019K * fix tests * inherit correct multingial task class * remove MockAudioMultilabelClassificationLogRegTask * rm other instances of MockAudioMultilabelClassificationLogRegTask * removed unncessary files * removed unncrssary files * removed uncrssary files part 3 * deleted esc50 from multi label classification * fixed errors * fixed lintng, added precision and recall. 
Removed extra comments * fixed double loading of model * filled in missing meta-data * fixed linting --------- Co-authored-by: Animesh Jha <jha.animesh01@gmail.com> Co-authored-by: rahulschand <rahulsc@stanford.edu> Co-authored-by: Silky Singh <silky1708@gmail.com> Co-authored-by: Silky Singh <54901747+silky1708@users.noreply.github.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add unfused clap model for zero-shot (#2269) * added unfused model * fixed lint * Add new and complete version of FSD50K multi-label audio classification task (#2285) * Added fsd50k dataset on huggingface * added correct hf version of fsd50k dataset * added correct hf version of fsd50k dataset * removed extra imports * removed unecessary load_data fn * added large, music and speech clap models (#2284) * added large, music and speech clap models * fixed public_training_data and removed training_datasets split * added latest revision * lowercase mit license * fixed issue related to training_datasets * fixed lint * add AbsTaskAudioClassification, ESC50 & GunshotTriangulation datasets (#2287) * add AbsTaskAudioClassification, and ESC50, GunshotTriangulation datasets * rm CrossFold abstask for classification * replace class parameter to is_cross_validation * new function eval_subset_cross_validation in AbsTaskAudioClassification * sync maeb branch with mteb:maeb * Add NSynth dataset (#2306) * add AbsTaskAudioClassification, and ESC50, GunshotTriangulation datasets * rm CrossFold abstask for classification * replace class parameter to is_cross_validation * new function eval_subset_cross_validation in AbsTaskAudioClassification * sync maeb branch with mteb:maeb * add NSynth dataset * update TaskMetadata * import correct Nsynth * add domain music * rm print stmt; update metadata; make pr * update precise dataset size * Add urbansound8k for zero-shot (#2292) * added urbansound8k and tested * lint change * changed dataset type 
and label * removed left out comments * fixed lint * Add Emotion classification Ravdess dataset (#2320) * added Ravdess dataset * fixed lint * fixed PR comments, added description, changed name * [MAEB] main merge (#2341) * misc: Add image classification descriptive stats implementation (#2045) * add ImageClassificationDescriptiveStatistics * add MNIST descriptive stats * use tuples instead * add label count and update docstrings * update MNIST example * Update tasks table * fix: Add column descriptions to leaderboard (#2039) * fix: Add column descriptions to leaderboard * removed existing overlap * fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041) * fix: Add BRIGHT Long Fixes #1978 * fix: Add BRIGHT(long) * fix bug in task results * updated bright * updated tests for TaskResults * 1.34.12 Automatically generated by python-semantic-release * misc: Add image clustering descriptive stats implementation (#2057) * add image clustering descirptive stats and run * finish off last one * remove script * fix: Update embed_dim for jina models (#2058) see https://github.com/embeddings-benchmark/results/pull/117 * Update tasks table * 1.34.13 Automatically generated by python-semantic-release * Add giga embeddings (#1741) * add gigaembeddings * use jasper * fix name * create sentence_transformer instruct wrapper * apply instruction template * fix jasper * update meta * misc: Add ZS and multilabel image classification descriptive stats implementation (#2059) * add image clustering descirptive stats and run * finish off last one * remove script * add ImageMultilabelClassificationDescriptiveStatistics * add VOC2007 * add zeroshot and mnist example * Update tasks table * Rename MIEB task classes with duplicated names (#2061) fix class names * misc: Add VisualSTS descriptive stats (#2062) * add visualsts stats * add last dataset * Update tasks table * fix: Added gte models (#1539) * fix: Added gte models * fix: Add mixbai models (#1540) for #1515 * fix: 
Add climate fever v2 (#1873) * Updated ClimateFEVER dataset with new version * Adds Fill in the empty metadata. * Updates the date tuple * Update class name Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update domains Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task_subtypes * Update annotations_creators for the first version * Update date Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task subtypes * Update path * Update description --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> * Update tasks table * fix: Updating paper scripts (#1958) * change reference revisions to align with paper * Update author list * Added code for main results table * updated minor changes * added external as a "no_revision_available" case * revert unintended changes * format * 1.34.14 Automatically generated by python-semantic-release * Add datasets for a benchmark newly introduced for "Engineering" domain (#1911) * adding clustering tasks (built-bench-clustering S2S & P2P) * updated built-bench-clustering tasks * Updated BuiltBenchClustering tasks * Added "Engineering" as new domain to TaskMetadata.py * Updated tasks table in docs * Updated task metadata for BuiltBenchClustering S2S and P2P * updated metadata for clustering tasks * Add/update BuiltBench tasks - Add BuiltBenchRetrieval task - Add BuiltBenchReranking task - Update metadata for BuiltBenchClusterinP2P - Update metadata for BuiltBenchClusterinS2S * update BuiltBench benchmark * Update mteb/benchmarks/benchmarks.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/benchmarks/benchmarks.py Co-authored-by: Isaac 
Chung <chungisaac1217@gmail.com> * Fix formatting via ruff --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * misc: update model names to adjust for adding to results repo (#2074) * update model names to adjust for adding to results repo * update model meta script * misc: Add all image classification descriptive stats (#2073) * add most image classification descr stats * revert changes to encoder * add stats --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Update tasks table * ci: Rerun tests that fail due to networking issues. (#2029) * fix: rerun tests that fail - Networking * update tests to use tmp_path * set versions for dev dependencies * add pytest options to pyproject.toml * add rerun json.decoder.JSONDecodeError * remove JSONDecodeError from pyproject.toml * add huggingface_hub.errors.HfHubHTTPError * add huggingface_hub.errors.LocalEntryNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044 * FileNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029 * add doc to pytest rerun --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * fix: generate metadata (#2063) * fix: generate metadata * use logging not print for script * lint * add iso639 to dev pyproject * fix import * add memory_usage_mb * set version for iso639 Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.15 Automatically generated by python-semantic-release * fix: add missing `e5` training datasets (#2065) add missing training datasets * 1.34.16 Automatically generated by python-semantic-release * fix: Ensure 
voyage model uses different naming scheme (#2083) * fix: Added make command for running leaderboard locally * fix: Ensure voyage models doesn't re-use the name * 1.34.17 Automatically generated by python-semantic-release * fix: Freeze model/rank columns in leaderboard (#2044) * fix: freeze model/rank columns in leaderboard * freezing zero-shot column * update min gradio version to 5.16.0 in pyproject.toml --------- Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> * 1.34.18 Automatically generated by python-semantic-release * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086) Fixes #2064 * 1.34.19 Automatically generated by python-semantic-release * Remove duplicated string in docstring of TaskMetadata class (#2087) * Remove duplicated string in docstring of TaskMetadata class * Remove duplicated dataset field * fix: Smarter leaderboard caching with cachetools (#2085) * Added smarter caching to callbacks * Added cachetools as a dependency * Ran linting * Removed debugging print statement * Bumped Gradio version * Dependency fixes * Dependency fixes --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088) * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) Fixes #2064 * change MultilingualSentiment split from test to validation in CMTEB * 1.34.20 Automatically generated by python-semantic-release * merge gme models (#2089) * fix: Add back task filtering by modalities (#2080) * add back task filtering by modalities * add unit test * check if task modalities is a subset of model modalities and fix tests * add model_modalities_more_than_task_modalities case * 1.34.21 Automatically generated by python-semantic-release * Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092) * Added GTR Models to codebase * Linted gtr models file. 
* Added gtr-base/large/xl/xxl to sentence_transformers_models.py * Added memory_usage_mb and training_datasets * Reformatted training dataset names * Reformatted training dataset names * Reformatted training dataset names --------- Co-authored-by: sufen <sufenf@gmail.com> * misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095) * add Any2TextMutipleChoiceDescriptiveStatistics * run on all tasks * Update tasks table * fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101) Reported with references to paper + qoutes. * fix: Update links (#2098) * Fix link * Fix link * 1.34.22 Automatically generated by python-semantic-release * Add model inf-retriever-v1-1.5b (#2106) Add inf-retriever-v1-1.5b model * docs: Fix typos & refine text (#2102) * Update app.py * Fix typos * misc: Run Zeroshot Classification Descriptive Stats (#2105) * add most datasets * add birdsnap and imgnet1k * add scimmir and sun397 * add uck101 zs * Update tasks table * fix: add warning about task category conversion (#2108) add warning about task category conversion * 1.34.23 Automatically generated by python-semantic-release * fix: Add codesage-large-v2 (#2090) * Add codesage-large-v2 * Address comments * Add training dataset * Fix issues * Format code * Remove unnecessary wrapper * 1.34.24 Automatically generated by python-semantic-release * fix: add training data to BGE-m3-custom-fr (#2110) This ensure that is it correctly filtered as non-zero-shot * 1.34.25 Automatically generated by python-semantic-release * fix: Upgrade ruff to be gradio compatible (#2111) * fix: update ruff to be gradio compatible (>=0.9.3) * format * fix: upgrade ruff to latests (same as gradio compatible) * 1.34.26 Automatically generated by python-semantic-release * docs: Follow google docstring format (#2115) Fixes #2113 * Update leaderboard_refresh.yaml (#2121) * fix InstructSentenceTransformer Model name (#2125) fix params * fix voyage (#2127) * fix: update e5 instruct training data (#2129) 
update e5 training data * 1.34.27 Automatically generated by python-semantic-release * format * Update tasks table * fix: Add 2 new Static Sentence Transformer models (#2112) * Add 2 new Static Sentence Transformer models * Add Tatoeba Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.28 Automatically generated by python-semantic-release * add is_cross_encoder (#1869) * add is_cross_encoder * Update mteb/model_meta.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * change value --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Qodo embed 1 1.5 b (#2137) * feat: Add Qodo-Embed-1-1.5B model metadata * fix: Add Qodo models to overview imports * fix: Add adapted_from field to Qodo model metadata * Update mteb/models/qodo_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * relint --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * misc: merge summary retrieval into bitext mining (#2140) merge summary retrieval into bitext mining * test: fix dataset availability test (#2141) This simplified the test and also make it a lot simpler. It also removed about 100 test cases which where all to the same API call. 
* fix: Update NVIDIA-Embed training data (#2143) Added a few missing annotations for nvidia-embed * 1.34.29 Automatically generated by python-semantic-release * fix: Add annotations for Voyage exp (#2144) * fix: Update NVIDIA-Embed training data Added a few missing annotations for nvidia-embed * fix update annotationf for voyage exp * 1.34.30 Automatically generated by python-semantic-release * Fix tokens num in cde models (#2148) fix tokens * feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146) * feat: Add Qodo-Embed-1-7B model metadata and rename existing model * lint * fix revision * update license name --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> * 1.35.0 Automatically generated by python-semantic-release * misc: add Any2AnyRetrievalDescriptiveStatistics (#2139) add Any2AnyRetrievalDescriptiveStatistics * Update tasks table * Added zero-shot percentages and different filtering scheme (#2153) * Added zero-shot percentages and different filtering scheme * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Incorrect annotations for Mistral-based embedding models (#2157) Fixes #2155 * 1.35.1 Automatically generated by python-semantic-release * Update FaMTEBRetrieval.py (#2171) The URL pointed to the settings page instead of the main repo URL. Now it is fixed. 
* Update tasks table * fix: Add Training data annotations (#2173) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * 1.35.2 Automatically generated by python-semantic-release * feat: Add MIEB and MIEB-lite as benchmarks (#2035) * add mieb and mieb-lite to benchmarks * add CompositionalityEvaluation and DocumentUnderstanding types * add VisionCentric type * add missing comma * split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng * use aggregate task instead so we can name the subsets * shorten names * fix import * alternative strategy to avoid using get_task * follow other aggregate tasks and skip metadata test * run LB without errors when selecting MIEB(-lite) * add back the capability as taks type * typo * extend description * split into mieb(eng) and mieb(multilingual) * remove unneeded files * remove aggtask additions for test * edit descriptions based on screenshots * shorten * rename to Compositionality and include ImageCoDeT2IMultiChoice * re-tag missing VisionCentric tasks * re-tag rparis and roxford as retrieval and include fixes * re-tag voc2007 as image cls * make lint * correct num task types in descriptions * add one model to models_to_annotate * add mieb reference models * update task types * relabel to multilingual retrieval task type to align with paper * fix reference and bibtex * edit task list to match with final list * add back agg task to reproduce table column in paper * fix filtering and import * update tests * mieb lite add back missing tasks * fix metadata test * multi should have all 4 variants * fix task counts * lite has 10 task types * fix visualSTS-17 lang splits * Aggregate task can now use subsets & eval 
langs to filter TaskResults * fix test and mark VisualSTS17 as multilingual * fix tests * add agg task running script * add voyage meta * fix citations * capitalize * add coarse/fine labels --------- Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> * Update tasks table * 1.36.0 Automatically generated by python-semantic-release * fix: update training datasets and revision for jina models (#2179) * feat: update training datasets and revision for jina models * feat: update training datasets and revision for jina models * fix: Add more training data annotations (#2178) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * fix: add training data for Bilingual Embeddings fixes #2167 * 1.36.1 Automatically generated by python-semantic-release * Added training data annotation for e5-base-4k (#2186) * fix: Added training data annotations to MXBAI (#2185) * fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180) This also resolves the missing data in the leaderboard. 
Fixes #2172 * fix: Added training data annotation for MMLW models (#2188) * Added training data annotation for MMLW models * Added GIST annotations Kenneth missed * Added Stella en 400m training data' * 1.36.2 Automatically generated by python-semantic-release * fix: Added training data for sentence-croissant (#2189) * 1.36.3 Automatically generated by python-semantic-release * fix: update ru models annotation (#2181) * 1.36.4 Automatically generated by python-semantic-release * fix: Alphabetical ordering of tasks in dropdowns (#2191) * 1.36.5 Automatically generated by python-semantic-release * misc: Speed up qrel creation in any2anyretrieval (#2196) * use numpy vectorized operations instead of row-by-row * scores are int * use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199) * add base models for e5 (#2183) * add similar datasets (#2205) * add similar datasets * add nano * update is filled * Update mteb/abstasks/TaskMetadata.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add labse annotation (#2182) * add labse annotation * Update mteb/models/sentence_transformers_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Fixed leaderboard crash (#2221) * Fixed leaderboard crash * Fixed language selection error * Ran linting * 1.36.6 Automatically generated by python-semantic-release * fix: More training data annotations (#2220) * Added training data annotation for bge-gemma * Added missing annotations for Voyage models * Added training data for sts-multilingual-mpnet * Added all mteb datasets to STS-multilingual training data * 1.36.7 Automatically generated by python-semantic-release * Add LLM2CLIP (OpenAI variants) (#2222) * model loading and get_text_embeddings * add image_emb, fused_emb, and calc probs methods * add b16 model * add llm2clip_openai_l_14_224 
(not working yet) * got llm2clip_openai_l_14_224 working * make lint * add training sets and allow py files * Change `dataset on HF` test to use official api (#2213) * refactor dataset checking * increase timeout * increase timeout * remove timeout * Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197) * Add Any2AnyMC descriptive stats * Add descriptive stats function for ImageTextPC * add descriptive stats examples * linter * update multi choice descriptive stats * Update tasks table * fix: Add training data annotations to uderver-bloom models (#2210) * fix: Add training data annotations to uderver-bloom models fixes #2193 * fix: add mixedbread --------- Co-authored-by: Márton Kardos <power.up1163@gmail.com> * 1.36.8 Automatically generated by python-semantic-release * Add comment to `voyage-3-m-exp` model (#2229) * remove model size from voyage-3-m-exp model * Update mteb/models/voyage_models.py * Update mteb/models/voyage_models.py * docs: Update description of EURLex (#2231) * Automatically add similar tasks to training_tasks (#2228) * refactor dataset checking * increase timeout * increase timeout * remove timeout * start * automatically find datasets * update comment * fix aggregate task metadata * fixes * lint * rename * update fetch check * Remove overlapping legends from radar chart (#2195) * Remove overlapping legends from radar chart * ensure graph is not blocked * Overlapping legend issue of Radar Chart * misc: Run Any2AnyRetrieval descriptive stats (#2223) * run a few datasets * add a few more * run more tasks * add more datasets * remove pdb * remove newline * add more datasets * Update tasks table * misc: Add rest of the vision centric and compositionality descriptive stats (#2267) add the rest * Update tasks table * Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB * Update memory_usage_mb with correct 
calculated value * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * remove comments * added correct memory usage * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Apply linter fixes with ruff * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add Arabic_Triplet_Matryoshka_V2 to overview.py * Rename model file to ara_models.py and update imports --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Add WebFAQ Retrieval dataset (#2236) * Add WebFAQ Retrieval dataset Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Small change WebFAQRetrieval.py Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add remaining languages to WebFAQ Retrieval task Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add descriptive stats Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.9 Automatically generated by python-semantic-release * fix: Formatting issue in Performance Plot (#2237) * Formatting issue in Performance Plot * make lint * added function for better code readability * 1.36.10 Automatically generated by python-semantic-release * ci: run test_dataset_on_hf separately (#2201) * dont run test_dataset_on_hf in every pr * lint * Update call pytest test_datasets Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * not datasets for test * run dataset loading test 
for push or pull_request * apply feedback --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add gemini-embedding-exp-03-07 (#2279) * add gemini-embedding-exp-03-07 * remove space for lint * lint fix * update link (#2281) * fix: Run remaining MIEB desc stats (#2288) * run Vidore * GLDv2 * run the rest --------- Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> * Update tasks table * 1.36.11 Automatically generated by python-semantic-release * fix: Added Filter Modality (#2262) * Added Filter Modality * resolve suggestions * make lint * make sure test pass * make lint * added exclusive_modality_filter and unit tests * Integrate tests on overview.py * Update tests/test_overview.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * added task related to image modality * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update mteb/abstasks/AbsTask.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * update overview..py * make lint * update documentation --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * 1.36.12 Automatically generated by python-semantic-release * fix: Add `ModelMeta` license & custom validations (#2293) * license validation * move licenses * update imports --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.13 Automatically generated by python-semantic-release * ci: Add pre-commit hook (#2194) * make dev life nicer - pre-commit hooks * add pre-commit to install * update precommit * update ruff pre-commit * lint * lint --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * Update tasks table * fix: bug in voyage implementation (#2304) * fix: Fix bug in voyage implementation "passage" is not a valid input for the voyage API. Remapped to "document". 
* Update mteb/models/voyage_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.36.14 Automatically generated by python-semantic-release * fix: Update voyage name to include Org. (#2322) * 1.36.15 Automatically generated by python-semantic-release * Added VDR Model (#2290) * Added VDR Model * change custom wrapper to SentenceTransformer Wrapper * remove kwargs and add TODO for Image Modality * Update mteb/models/vdr_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Resolve conflicting dependencies (#2323) These errors were discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * 1.36.16 Automatically generated by python-semantic-release * fix: remove SyntaxWarnings in py312 (#2325) * fix: Resolve conflicting dependencies These errors were discovered when trying to install the package using `uv`. We have a problem with salesforce-lavis, which is not compatible with the current set of dependencies. * fix: Remove syntax warnings occurring in python 3.12 ``` Python 3.12.0 (main, Oct 2 2023, 20:56:14) [Clang 16.0.3 ] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import mteb # no syntax warnings >>> ``` * 1.36.17 Automatically generated by python-semantic-release * fix: add annotation models for stella zh (#2277) * fix: add annotation models for stella zh Additionally fixed a few annotation errors * format * Update mteb/models/stella_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.36.18 Automatically generated by python-semantic-release * fix: Add ModelMeta rubert-mini-frida, BERTA (#2330) * Add rubert-mini-frida model meta * Add BERTA model meta * docs: fix typos * 1.36.19 Automatically generated by python-semantic-release * fix: Add WebFAQ bitext mining tasks (#2326) * Add WebFAQ bitext mining tasks Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Lower number of language pairs in WebFAQBitextMining Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.20 Automatically generated by python-semantic-release * make lint * fix validation for license * fix remaining validation errors --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Mina Parham <36207068+mina-parham@users.noreply.github.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> Co-authored-by: Mehrzad Shahin-Moghadam <42153677+mehrzadshm@users.noreply.github.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com> Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Shikhar Shiromani <rbk.shikhar@gmail.com> 
Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> Co-authored-by: Ruslan Bel'kov <ruslan.belckov@yandex.ru> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: sufen-f <sufenfong@gmail.com> Co-authored-by: sufen <sufenf@gmail.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> Co-authored-by: Samuel Yang <samuelyang150@gmail.com> Co-authored-by: Aradhye Agarwal <aradhyeagarwal@gmail.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: talshef <tsheffer@gmail.com> Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: garciasces <garciasces@madrid.es> Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> Co-authored-by: Wang Bo <bo.wang@jina.ai> Co-authored-by: Munot Ayush Sunil <munotayush6@kgpian.iitkgp.ac.in> Co-authored-by: Yaya Sy <58347382+yaya-sy@users.noreply.github.com> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> Co-authored-by: Eng. 
Omar Najar <79968243+omarnj-lab@users.noreply.github.com> Co-authored-by: Michael Dinzinger <39766249+michaeldinzinger@users.noreply.github.com> Co-authored-by: Jinhyuk Lee <lee.jnhk@gmail.com> Co-authored-by: Isaac Chung <isaac@hn496lf4f9.lan> Co-authored-by: sergeyz-zh <49659999+sergeyz-zh@users.noreply.github.com> * adding GTZAN Genre dataset (#2307) * add GTZANGenre dataset * reupload gtzan dataset * update TaskMetadata from mteb:maeb * make pr * add task subtype * update date for gtzan dataset * update ruff to 0.9.7; make lint * Adding Beijing Opera dataset (#2356) * update TaskMetadata from mteb:maeb * make pr * add task subtype * update ruff to 0.9.7; make lint * add voxlingua107-top10 dataset * add beijing opera dataset * update TaskMetadata from mteb:maeb * make pr * update ruff to 0.9.7; make lint * update TaskMetadata from mteb:maeb * update TaskMetadata * add Mridingham datasets * rm comment * Adding Libricount dataset (#2361) * update taskmetadata * sync mteb:maeb * make lint * Adding Crema-D Dataset for emotion classification [HEAR] (#2368) * update TaskMetadata from mteb:maeb * make pr * add task subtype * update ruff to 0.9.7; make lint * add voxlingua107-top10 dataset * update TaskMetadata * adding CREMA-D dataset * rm deleted files * Adding FSDD dataset (Free Spoken Digit Dataset) (#2371) update Taskmetadata * Add VoxCelebSA, SpokenQAforIC, VehicleSoundClustering from Dynamic-SUPERB (#2379) * add datasets from dynamic-superb * make lint * remove label mapping * apply lint * change eval_split train -> test * fix FSD-50K Task Metadata, Label handling and add stratified subsampling (#2369) fix FSD-50K task * Add music clustering dataset (#2232) * Adds music-genre dataset * Updates revision * Fixes issues * Changes category * Removes a2t from task category * Update the revision * Fixes based on the feedback about converting rate * Remove librosa Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: Isaac Chung 
<chungisaac1217@gmail.com> * [MAEB] merge main -> maeb (#2471) * misc: Add image classification descriptive stats implementation (#2045) * add ImageClassificationDescriptiveStatistics * add MNIST descriptive stats * use tuples instead * add label count and update docstrings * update MNIST example * Update tasks table * fix: Add column descriptions to leaderboard (#2039) * fix: Add column descriptions to leaderboard * removed existing overlap * fix: Add BRIGHT (long) and fix bug in TaskResult.filter_and_validate() (#2041) * fix: Add BRIGHT Long Fixes #1978 * fix: Add BRIGHT(long) * fix bug in task results * updated bright * updated tests for TaskResults * 1.34.12 Automatically generated by python-semantic-release * misc: Add image clustering descriptive stats implementation (#2057) * add image clustering descirptive stats and run * finish off last one * remove script * fix: Update embed_dim for jina models (#2058) see https://github.com/embeddings-benchmark/results/pull/117 * Update tasks table * 1.34.13 Automatically generated by python-semantic-release * Add giga embeddings (#1741) * add gigaembeddings * use jasper * fix name * create sentence_transformer instruct wrapper * apply instruction template * fix jasper * update meta * misc: Add ZS and multilabel image classification descriptive stats implementation (#2059) * add image clustering descirptive stats and run * finish off last one * remove script * add ImageMultilabelClassificationDescriptiveStatistics * add VOC2007 * add zeroshot and mnist example * Update tasks table * Rename MIEB task classes with duplicated names (#2061) fix class names * misc: Add VisualSTS descriptive stats (#2062) * add visualsts stats * add last dataset * Update tasks table * fix: Added gte models (#1539) * fix: Added gte models * fix: Add mixbai models (#1540) for #1515 * fix: Add climate fever v2 (#1873) * Updated ClimateFEVER dataset with new version * Adds Fill in the empty metadata. 
* Updates the date tuple * Update class name Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update domains Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task_subtypes * Update annotations_creators for the first version * Update date Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update task subtypes * Update path * Update description --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Mina Parham <minaparham@Keatext.local> * Update tasks table * fix: Updating paper scripts (#1958) * change reference revisions to align with paper * Update author list * Added code for main results table * updated minor changes * added external as a "no_revision_available" case * revert unintended changes * format * 1.34.14 Automatically generated by python-semantic-release * Add datasets for a benchmark newly introduced for "Engineering" domain (#1911) * adding clustering tasks (built-bench-clustering S2S & P2P) * updated built-bench-clustering tasks * Updated BuiltBenchClustering tasks * Added "Engineering" as new domain to TaskMetadata.py * Updated tasks table in docs * Updated task metadata for BuiltBenchClustering S2S and P2P * updated metadata for clustering tasks * Add/update BuiltBench tasks - Add BuiltBenchRetrieval task - Add BuiltBenchReranking task - Update metadata for BuiltBenchClusterinP2P - Update metadata for BuiltBenchClusterinS2S * update BuiltBench benchmark * Update mteb/benchmarks/benchmarks.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/benchmarks/benchmarks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Fix formatting via ruff --------- Co-authored-by: Roman Solomatin 
<samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update tasks table * misc: update model names to adjust for adding to results repo (#2074) * update model names to adjust for adding to results repo * update model meta script * misc: Add all image classification descriptive stats (#2073) * add most image classification descr stats * revert changes to encoder * add stats --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> * Update tasks table * ci: Rerun tests that fail due to networking issues. (#2029) * fix: rerun tests that fail - Networking * update tests to use tmp_path * set versions for dev dependencies * add pytest options to pyproject.toml * add rerun json.decoder.JSONDecodeError * remove JSONDecodeError from pyproject.toml * add huggingface_hub.errors.HfHubHTTPError * add huggingface_hub.errors.LocalEntryNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13298535701/job/37139767443?pr=2044 * FileNotFoundError https://github.com/embeddings-benchmark/mteb/actions/runs/13302915091/job/37147507251?pr=2029 * add doc to pytest rerun --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> * fix: generate metadata (#2063) * fix: generate metadata * use logging not print for script * lint * add iso639 to dev pyproject * fix import * add memory_usage_mb * set version for iso639 Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.15 Automatically generated by python-semantic-release * fix: add missing `e5` training datasets (#2065) add missing training datasets * 1.34.16 Automatically generated by python-semantic-release * fix: Ensure voyage model uses different naming scheme (#2083) * fix: Added make command for running leaderboard 
locally * fix: Ensure voyage models doesn't re-use the name * 1.34.17 Automatically generated by python-semantic-release * fix: Freeze model/rank columns in leaderboard (#2044) * fix: freeze model/rank columns in leaderboard * freezing zero-shot column * update min gradio version to 5.16.0 in pyproject.toml --------- Co-authored-by: Shikhar Shiromani <sshiromani@sshiromani-mlt.client.nvidia.com> * 1.34.18 Automatically generated by python-semantic-release * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) (#2086) Fixes #2064 * 1.34.19 Automatically generated by python-semantic-release * Remove duplicated string in docstring of TaskMetadata class (#2087) * Remove duplicated string in docstring of TaskMetadata class * Remove duplicated dataset field * fix: Smarter leaderboard caching with cachetools (#2085) * Added smarter caching to callbacks * Added cachetools as a dependency * Ran linting * Removed debugging print statement * Bumped Gradio version * Dependency fixes * Dependency fixes --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Missing fixes for #2086 - change MultilingualSentiment split from test to validation in CMTEB (#2088) * fix: Fixed previous incorrect specification of splits for CMTEB ( MTEB(cmn, v1) ) Fixes #2064 * change MultilingualSentiment split from test to validation in CMTEB * 1.34.20 Automatically generated by python-semantic-release * merge gme models (#2089) * fix: Add back task filtering by modalities (#2080) * add back task filtering by modalities * add unit test * check if task modalities is a subset of model modalities and fix tests * add model_modalities_more_than_task_modalities case * 1.34.21 Automatically generated by python-semantic-release * Added gtr-t5-base/large/xl/xxl metadata to mteb (#2092) * Added GTR Models to codebase * Linted gtr models file. 
* Added gtr-base/large/xl/xxl to sentence_transformers_models.py * Added memory_usage_mb and training_datasets * Reformatted training dataset names * Reformatted training dataset names * Reformatted training dataset names --------- Co-authored-by: sufen <sufenf@gmail.com> * misc: Add Any2TextMutipleChoice Descriptive Statistics (#2095) * add Any2TextMutipleChoiceDescriptiveStatistics * run on all tasks * Update tasks table * fix: Updated model annotations for GTE, e5, gritlm, and SFR models (#2101) Reported with references to paper + qoutes. * fix: Update links (#2098) * Fix link * Fix link * 1.34.22 Automatically generated by python-semantic-release * Add model inf-retriever-v1-1.5b (#2106) Add inf-retriever-v1-1.5b model * docs: Fix typos & refine text (#2102) * Update app.py * Fix typos * misc: Run Zeroshot Classification Descriptive Stats (#2105) * add most datasets * add birdsnap and imgnet1k * add scimmir and sun397 * add uck101 zs * Update tasks table * fix: add warning about task category conversion (#2108) add warning about task category conversion * 1.34.23 Automatically generated by python-semantic-release * fix: Add codesage-large-v2 (#2090) * Add codesage-large-v2 * Address comments * Add training dataset * Fix issues * Format code * Remove unnecessary wrapper * 1.34.24 Automatically generated by python-semantic-release * fix: add training data to BGE-m3-custom-fr (#2110) This ensure that is it correctly filtered as non-zero-shot * 1.34.25 Automatically generated by python-semantic-release * fix: Upgrade ruff to be gradio compatible (#2111) * fix: update ruff to be gradio compatible (>=0.9.3) * format * fix: upgrade ruff to latests (same as gradio compatible) * 1.34.26 Automatically generated by python-semantic-release * docs: Follow google docstring format (#2115) Fixes #2113 * Update leaderboard_refresh.yaml (#2121) * fix InstructSentenceTransformer Model name (#2125) fix params * fix voyage (#2127) * fix: update e5 instruct training data (#2129) 
update e5 training data * 1.34.27 Automatically generated by python-semantic-release * format * Update tasks table * fix: Add 2 new Static Sentence Transformer models (#2112) * Add 2 new Static Sentence Transformer models * Add Tatoeba Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * 1.34.28 Automatically generated by python-semantic-release * add is_cross_encoder (#1869) * add is_cross_encoder * Update mteb/model_meta.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * change value --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Qodo embed 1 1.5 b (#2137) * feat: Add Qodo-Embed-1-1.5B model metadata * fix: Add Qodo models to overview imports * fix: Add adapted_from field to Qodo model metadata * Update mteb/models/qodo_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * relint --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * misc: merge summary retrieval into bitext mining (#2140) merge summary retrieval into bitext mining * test: fix dataset availability test (#2141) This simplified the test considerably. It also removed about 100 test cases, which were all making the same API call. 
* fix: Update NVIDIA-Embed training data (#2143) Added a few missing annotations for nvidia-embed * 1.34.29 Automatically generated by python-semantic-release * fix: Add annotations for Voyage exp (#2144) * fix: Update NVIDIA-Embed training data Added a few missing annotations for nvidia-embed * fix update annotationf for voyage exp * 1.34.30 Automatically generated by python-semantic-release * Fix tokens num in cde models (#2148) fix tokens * feat: Add Qodo-Embed-1-7B model metadata and rename existing model (#2146) * feat: Add Qodo-Embed-1-7B model metadata and rename existing model * lint * fix revision * update license name --------- Co-authored-by: Tal Sheffer <tal.s@codium.ai> * 1.35.0 Automatically generated by python-semantic-release * misc: add Any2AnyRetrievalDescriptiveStatistics (#2139) add Any2AnyRetrievalDescriptiveStatistics * Update tasks table * Added zero-shot percentages and different filtering scheme (#2153) * Added zero-shot percentages and different filtering scheme * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Incorrect annotations for Mistral-based embedding models (#2157) Fixes #2155 * 1.35.1 Automatically generated by python-semantic-release * Update FaMTEBRetrieval.py (#2171) The URL pointed to the settings page instead of the main repo URL. Now it is fixed. 
* Update tasks table * fix: Add Training data annotations (#2173) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * 1.35.2 Automatically generated by python-semantic-release * feat: Add MIEB and MIEB-lite as benchmarks (#2035) * add mieb and mieb-lite to benchmarks * add CompositionalityEvaluation and DocumentUnderstanding types * add VisionCentric type * add missing comma * split STS17MultilingualVisualSTS and STSBenchmarkMultilingualSTS to eng and non-eng * use aggregate task instead so we can name the subsets * shorten names * fix import * alternative strategy to avoid using get_task * follow other aggregate tasks and skip metadata test * run LB without errors when selecting MIEB(-lite) * add back the capability as taks type * typo * extend description * split into mieb(eng) and mieb(multilingual) * remove unneeded files * remove aggtask additions for test * edit descriptions based on screenshots * shorten * rename to Compositionality and include ImageCoDeT2IMultiChoice * re-tag missing VisionCentric tasks * re-tag rparis and roxford as retrieval and include fixes * re-tag voc2007 as image cls * make lint * correct num task types in descriptions * add one model to models_to_annotate * add mieb reference models * update task types * relabel to multilingual retrieval task type to align with paper * fix reference and bibtex * edit task list to match with final list * add back agg task to reproduce table column in paper * fix filtering and import * update tests * mieb lite add back missing tasks * fix metadata test * multi should have all 4 variants * fix task counts * lite has 10 task types * fix visualSTS-17 lang splits * Aggregate task can now use subsets & eval 
langs to filter TaskResults * fix test and mark VisualSTS17 as multilingual * fix tests * add agg task running script * add voyage meta * fix citations * capitalize * add coarse/fine labels --------- Co-authored-by: gowitheflow-1998 <jsbs54@durham.ac.uk> * Update tasks table * 1.36.0 Automatically generated by python-semantic-release * fix: update training datasets and revision for jina models (#2179) * feat: update training datasets and revision for jina models * feat: update training datasets and revision for jina models * fix: Add more training data annotations (#2178) * redo to voyage to only training data * Add training data annotation for Kalm embeddings #2168 * Add correct training data annotations to Stella #2164 * removed fiqa PL as it does not exist * remove ArxivClusteringS2S.v2 as it does not exist * Add training data annotation for GIST embedding #2166 * fix max tokens for kalm models #2162 * remove eli 5 * fix: add training data for Bilingual Embeddings fixes #2167 * 1.36.1 Automatically generated by python-semantic-release * Added training data annotation for e5-base-4k (#2186) * fix: Added training data annotations to MXBAI (#2185) * fix: Update MTEB(Scandinavian) to use new DanFEVER (#2180) This also resolves the missing data in the leaderboard. 
Fixes #2172 * fix: Added training data annotation for MMLW models (#2188) * Added training data annotation for MMLW models * Added GIST annotations Kenneth missed * Added Stella en 400m training data' * 1.36.2 Automatically generated by python-semantic-release * fix: Added training data for sentence-croissant (#2189) * 1.36.3 Automatically generated by python-semantic-release * fix: update ru models annotation (#2181) * 1.36.4 Automatically generated by python-semantic-release * fix: Alphabetical ordering of tasks in dropdowns (#2191) * 1.36.5 Automatically generated by python-semantic-release * misc: Speed up qrel creation in any2anyretrieval (#2196) * use numpy vectorized operations instead of row-by-row * scores are int * use 'mteb.MTEB' instead of 'MTEB' for custom model (#2199) * add base models for e5 (#2183) * add similar datasets (#2205) * add similar datasets * add nano * update is filled * Update mteb/abstasks/TaskMetadata.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * add labse annotation (#2182) * add labse annotation * Update mteb/models/sentence_transformers_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: Fixed leaderboard crash (#2221) * Fixed leaderboard crash * Fixed language selection error * Ran linting * 1.36.6 Automatically generated by python-semantic-release * fix: More training data annotations (#2220) * Added training data annotation for bge-gemma * Added missing annotations for Voyage models * Added training data for sts-multilingual-mpnet * Added all mteb datasets to STS-multilingual training data * 1.36.7 Automatically generated by python-semantic-release * Add LLM2CLIP (OpenAI variants) (#2222) * model loading and get_text_embeddings * add image_emb, fused_emb, and calc probs methods * add b16 model * add llm2clip_openai_l_14_224 
(not working yet) * got llm2clip_openai_l_14_224 working * make lint * add training sets and allow py files * Change `dataset on HF` test to use official api (#2213) * refactor dataset checking * increase timeout * increase timeout * remove timeout * Descriptive stats functions for Any2AnyMC and ImageTextPC (#2197) * Add Any2AnyMC descriptive stats * Add descriptive stats function for ImageTextPC * add descriptive stats examples * linter * update multi choice descriptive stats * Update tasks table * fix: Add training data annotations to uderver-bloom models (#2210) * fix: Add training data annotations to uderver-bloom models fixes #2193 * fix: add mixedbread --------- Co-authored-by: Márton Kardos <power.up1163@gmail.com> * 1.36.8 Automatically generated by python-semantic-release * Add comment to `voyage-3-m-exp` model (#2229) * remove model size from voyage-3-m-exp model * Update mteb/models/voyage_models.py * Update mteb/models/voyage_models.py * docs: Update description of EURLex (#2231) * Automatically add similar tasks to training_tasks (#2228) * refactor dataset checking * increase timeout * increase timeout * remove timeout * start * automatically find datasets * update comment * fix aggregate task metadata * fixes * lint * rename * update fetch check * Remove overlapping legends from radar chart (#2195) * Remove overlapping legends from radar chart * ensure graph is not blocked * Overlapping legend issue of Radar Chart * misc: Run Any2AnyRetrieval descriptive stats (#2223) * run a few datasets * add a few more * run more tasks * add more datasets * remove pdb * remove newline * add more datasets * Update tasks table * misc: Add rest of the vision centric and compositionality descriptive stats (#2267) add the rest * Update tasks table * Fix `calculate_memory_usage_mb` in adding_a_model.md (#2271) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB (#2270) * Add Arabic-Triplet-Matryoshka-V2 model metadata to MTEB * Update memory_usage_mb with correct 
calculated value * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * remove comments * added correct memory usage * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Apply linter fixes with ruff * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/Arabic_Triplet_Matryoshka_V2.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Add Arabic_Triplet_Matryoshka_V2 to overview.py * Rename model file to ara_models.py and update imports --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: Add WebFAQ Retrieval dataset (#2236) * Add WebFAQ Retrieval dataset Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Small change WebFAQRetrieval.py Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add remaining languages to WebFAQ Retrieval task Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Add descriptive stats Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> --------- Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de> * Update tasks table * 1.36.9 Automatically generated by python-semantic-release * fix: Formatting issue in Performance Plot (#2237) * Formatting issue in Performance Plot * make lint * added function for better code readability * 1.36.10 Automatically generated by python-semantic-release * ci: run test_dataset_on_hf separately (#2201) * dont run test_dataset_on_hf in every pr * lint * Update call pytest test_datasets Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.co…
for #1515
Checklist
make test.
make lint.