embeddings-benchmark · KennethEnevoldsen · May 19, 2025 · May 18, 2025 · May 19, 2025
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -1,46 +1,7 @@
 
-<!-- If you are submitting a dataset or a model for the model registry please use the corresponding checklists below otherwise feel free to remove them. -->
 
-<!-- add additional description, question etc. related to the new dataset -->
+### Checklist
+<!-- please do not delete this checklist -->
 
-
-### Code Quality
-<!-- Please do not delete this -->
-- [ ] **Code Formatted**: Format the code using `make lint` to maintain consistent style.
-
-### Documentation
-<!-- Please do not delete this -->
-- [ ] **Updated Documentation**: Add or update documentation to reflect the changes introduced in this PR.
-
-### Testing
-<!-- Please do not delete this -->
-- [ ] **New Tests Added**: Write tests to cover new functionality. Validate with `make test-with-coverage`.
-- [ ] **Tests Passed**: Run tests locally using `make test` or `make test-with-coverage` to ensure no existing functionality is broken.
-
-
-### Adding datasets checklist
-<!-- see also https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_dataset.md -->
-
-**Reason for dataset addition**: ... <!-- Add reason for adding dataset here. E.g. it covers task/language/domain previously not covered -->
-
-- [ ] I have run the following models on the task (adding the results to the pr). These can be run using the `mteb -m {model_name} -t {task_name}` command.
-  - [ ] `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`
-  - [ ] `intfloat/multilingual-e5-small`
-- [ ] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
-- [ ] If the dataset is too big (e.g. >2048 examples), considering using `self.stratified_subsampling() under dataset_transform()`
-- [ ] I have filled out the metadata object in the dataset file (find documentation on it [here](https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_dataset.md#2-creating-the-metadata-object)).
-- [ ] Run tests locally to make sure nothing is broken using `make test`.
-- [ ] Run the formatter to format the code using `make lint`.
-
-
-### Adding a model checklist
-<!--
-When adding a model to the model registry
-see also https://github.com/embeddings-benchmark/mteb/blob/main/docs/reproducible_workflow.md
--->
-
- - [ ] I have filled out the ModelMeta object to the extent possible
- - [ ] I have ensured that my model can be loaded using
-   - [ ] `mteb.get_model(model_name, revision)` and
-   - [ ] `mteb.get_model_meta(model_name, revision)`
- - [ ] I have tested the implementation works on a representative set of tasks.
+- [ ] I did not add a dataset, or if I did, I added the [dataset checklist](https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_dataset.md#submit-a-pr) to the PR and completed it.
+- [ ] I did not add a model, or if I did, I added the [model checklist](https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md#submitting-your-model-as-a-pr) to the PR and completed it.
diff --git a/docs/adding_a_dataset.md b/docs/adding_a_dataset.md
@@ -236,8 +236,9 @@ Once you are finished create a PR to the [MTEB](https://github.com/embeddings-be
 The PR will be reviewed by one of the organizers or contributors who might ask you to change things. Once the PR is approved the dataset will be added into the main repository.
 
 
-Before you commit here is a checklist you should consider completing before submitting:
+Before submitting, complete the following checklist:
 
+- [ ] I have outlined why this dataset is filling an existing gap in `mteb`
 - [ ] I have tested that the dataset runs with the `mteb` package.
 
 An easy way to test it is using:
@@ -257,5 +258,3 @@ evaluation = MTEB(tasks=[YourNewTask()])
   - [ ] `intfloat/multilingual-e5-small`
 - [ ] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
 - [ ] I have considered the size of the dataset and reduced it if it is too big (2048 examples is typically large enough for most tasks)
-- [ ] Run tests locally to make sure nothing is broken using `make test`.
-- [ ] Run the formatter to format the code using `make lint`.
diff --git a/docs/adding_a_model.md b/docs/adding_a_model.md
@@ -1,112 +1,64 @@
-## Adding a Model to the MTEB Leaderboard
+## Adding a model to the Leaderboard
 
 The MTEB Leaderboard is available [here](https://huggingface.co/spaces/mteb/leaderboard). To submit to it:
 
-1. **Add meta information about your model to [model dir](../mteb/models/)**. See the docstring of ModelMeta for meta data details.
-   ```python
-   from mteb.model_meta import ModelMeta
-
-   bge_m3 = ModelMeta(
-       name="model_name",
-       languages=["model_languages"], # in format eng-Latn
-       open_weights=True,
-       revision="5617a9f61b028005a4858fdac845db406aefb181",
-       release_date="2024-06-28",
-       n_parameters=568_000_000,
-       memory_usage_mb=2167,
-       embed_dim=4096,
-       license="mit",
-       max_tokens=8194,
-       reference="https://huggingface.co/BAAI/bge-m3",
-       similarity_fn_name="cosine",
-       framework=["Sentence Transformers", "PyTorch"],
-       use_instructions=False,
-       public_training_code=None,
-       public_training_data="https://huggingface.co/datasets/cfli/bge-full-data",
-       training_datasets={"your_dataset": ["train"]},
-   )
-   ```
-   To calculate `memory_usage_mb` you can run `model_meta.calculate_memory_usage_mb()`. By default, the model will run using the [`sentence_transformers_loader`](../mteb/models/sentence_transformer_wrapper.py) loader function. If you need to use a custom implementation, you can specify the `loader` parameter in the `ModelMeta` class. For example:
-   ```python
-   from mteb.models.wrapper import Wrapper
-   from mteb.encoder_interface import PromptType
-   import numpy as np
-
-   class CustomWrapper(Wrapper):
-       def __init__(self, model_name, model_revision):
-           super().__init__(model_name, model_revision)
-           # your custom implementation here
-
-       def encode(
-            self,
-            sentences: list[str],
-            *,
-            task_name: str,
-            prompt_type: PromptType | None = None,
-            **kwargs
-       ) -> np.ndarray:
-           # your custom implementation here
-           return np.zeros((len(sentences), self.embed_dim))
-   ```
-   Then you can specify the `loader` parameter in the `ModelMeta` class:
-   ```python
-   your_model = ModelMeta(
-       loader=partial(
-            CustomWrapper,
-            model_name="model_name",
-            model_revision="5617a9f61b028005a4858fdac845db406aefb181"
-       ),
-       ...
-   )
-   ```
-2. **Run the desired model on MTEB:**
-
-Either use the Python API:
+1. Add the [model meta](https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md#adding-a-model-implementation) to `mteb`
+2. [Evaluate](https://github.com/embeddings-benchmark/mteb/blob/main/docs/usage/usage.md#evaluating-a-model) the desired model using `mteb` on the [desired benchmarks](https://github.com/embeddings-benchmark/mteb/blob/main/docs/usage/usage.md#selecting-a-benchmark)
+3. Push the results to the [results repository](https://github.com/embeddings-benchmark/results) via a PR. Once merged, they will appear on the leaderboard after a day.
 
-```python
-import mteb
-
-# load a model from the hub (or for a custom implementation see https://github.com/embeddings-benchmark/mteb/blob/main/docs/reproducible_workflow.md)
-model = mteb.get_model("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
 
-tasks = mteb.get_tasks(...) # get specific tasks
-# or
-tasks = mteb.get_benchmark("MTEB(eng, classic)") # or use a specific benchmark
-
-evaluation = mteb.MTEB(tasks=tasks)
-evaluation.run(model, output_folder="results")
-```
+## Adding a model implementation
 
-Or using the command line interface:
+Adding a model implementation to `mteb` is quite straightforward.
+Typically it only requires that you fill in metadata about the model and add it to the [model directory](../mteb/models/):
 
-```bash
-mteb run -m {model_name} -t {task_names}
+```python
+from mteb.model_meta import ModelMeta
+
+my_model = ModelMeta(
+    name="model_name",
+    languages=["eng-Latn"], # follows ISO 639-3 and BCP-47
+    open_weights=True,
+    revision="5617a9f61b028005a4858fdac845db406aefb181",
+    release_date="2025-01-01",
+    n_parameters=568_000_000,
+    memory_usage_mb=2167,
+    embed_dim=4096,
+    license="mit",
+    max_tokens=8194,
+    reference="https://huggingface.co/user-or-org/model-name",
+    similarity_fn_name="cosine",
+    framework=["Sentence Transformers", "PyTorch"],
+    use_instructions=False,
+    public_training_code="https://github.com/user-or-org/my-training-code",
+    public_training_data="https://huggingface.co/datasets/user-or-org/full-dataset",
+    training_datasets={"MSMARCO": ["train"]}, # if you trained on the MSMARCO training set
+)
 ```
 
-These will save the results in a folder called `results/{model_name}/{model_revision}`.
-
-2. **Push Results to the Leaderboard**
-
-To add results to the public leaderboard you can push your results to the [results repository](https://github.com/embeddings-benchmark/results) via a PR. Once merged they will appear on the leaderboard after a day.
+This works for all [Sentence Transformers](https://sbert.net) compatible models. Once filled out, you can add your model to `mteb` by
+submitting a PR.
 
-3. **Wait for a refresh the leaderboard**
 
-**Notes:**
+### Calculating the memory usage
 
-##### Using Prompts with Sentence Transformers
+To calculate `memory_usage_mb`, run:
 
-If your model uses Sentence Transformers and requires different prompts for encoding the queries and corpus, you can take advantage of the `prompts` [parameter](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer).
-
-Internally, `mteb` uses `query` for encoding the queries and `passage` as the prompt names for encoding the corpus. This is aligned with the default names used by Sentence Transformers.
+```py
+model_meta = mteb.get_model_meta("model_name")
+model_meta.calculate_memory_usage_mb()
+```
 
-###### Adding the prompts in the model configuration (Preferred)
+### Adding instruction models
 
-You can directly add the prompts when saving and uploading your model to the Hub. For an example, refer to this [configuration file](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5/blob/3b5a16eaf17e47bd997da998988dce5877a57092/config_sentence_transformers.json). These prompts can then be specified in the ModelMeta object.
+Some models, such as the [E5 models](https://huggingface.co/intfloat/multilingual-e5-large-instruct), use instructions or prompts.
+You can directly add the prompts when saving and uploading your model to the Hub. Refer to this [configuration file as an example](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5/blob/3b5a16eaf17e47bd997da998988dce5877a57092/config_sentence_transformers.json). 
 
+However, you can also add these directly to the model configuration:
 
 ```python
 model = ModelMeta(
-    loader=partial(  # type: ignore
+    loader=partial(
         sentence_transformers_loader,
         model_name="intfloat/multilingual-e5-small",
         revision="fd1525a9fd15316a2d503bf26ab031a61d056e98",
@@ -117,34 +69,77 @@ model = ModelMeta(
     ),
 )
 ```
-If you are unable to directly add the prompts in the model configuration, you can instantiate the model using the `sentence_transformers_loader` and pass `prompts` as an argument. For more details, see the `mteb/models/bge_models.py` file.
 
-##### Adding instruction models
+### Using a custom implementation
 
-Models that use instructions can use the [`InstructSentenceTransformerWrapper`](../mteb/models/instruct_wrapper.py). For example:
+If you need to use a custom implementation, you can specify the `loader` parameter in the `ModelMeta` class. For example:
 ```python
-model = ModelMeta(
+from mteb.models.wrapper import Wrapper
+from mteb.encoder_interface import PromptType
+import numpy as np
+
+class CustomWrapper(Wrapper):
+    def __init__(self, model_name, model_revision):
+        super().__init__(model_name, model_revision)
+        # your custom implementation here
+
+    def encode(
+        self,
+        sentences: list[str],
+        *,
+        task_name: str,
+        prompt_type: PromptType | None = None,
+        **kwargs
+    ) -> np.ndarray:
+        # your custom implementation here
+        return np.zeros((len(sentences), self.embed_dim))
+```
+
+Then you can specify the `loader` parameter in the `ModelMeta` class:
+
+```python
+your_model = ModelMeta(
     loader=partial(
-        InstructSentenceTransformerWrapper,
-        model="nvidia/NV-Embed-v1",
-        revision="7604d305b621f14095a1aa23d351674c2859553a",
-        instruction_template="Instruct: {instruction}\nQuery: ",
+        CustomWrapper,
+        model_name="model_name",
+        model_revision="5617a9f61b028005a4858fdac845db406aefb181"
     ),
-   ...
+    ...
 )
 ```
 
-##### Adding model dependencies in pyproject.toml
-If your are adding a model that requires additional dependencies, you can add them to the `pyproject.toml` file and instead of checking whether dependencies are installed or not make use of `requires_package` from [requires_package.py](../mteb/requires_packages.py). For example:
+
+### Adding model dependencies
+If you are adding a model that requires additional dependencies, you can add them to the `pyproject.toml` file, under optional dependencies:
+
+```toml
+voyageai = ["voyageai>=1.0.0,<2.0.0"]
+```
+
+This ensures that the implementation does not break if a package is updated.
+
+As it is an optional dependency, you can't import it at the top level of the module; instead, import it inside the wrapper scope:
 
 In the [voyage_models.py](../mteb/models/voyage_models.py) file, we have added the following code:
 ```python
 from mteb.requires_package import requires_package
-requires_package(self, "voyageai", model_name, "pip install 'mteb[voyageai]'")
-```
-and also updated [pyproject.toml]((../pyproject.toml)) file with the following code:
-```python
-voyageai = ["voyageai>=1.0.0,<2.0.0"]
+
+class VoyageWrapper(Wrapper):
+    def __init__(...) -> None:
+        requires_package(self, "voyageai", model_name, "pip install 'mteb[voyageai]'")
+        import voyageai
+        ...
 ```
-so that it will check whether voyageai is installed or not. If not, then it will give an error message to install voyageai. This has done so as to give clear installation warnings. 
-If you want to give suggestion instead of warning, you can use `suggest_package` from [requires_package.py](../mteb/requires_packages.py).
+Here you will also see that we use `requires_package` to ensure friendly error messages when package installations are required.
+If you want to give a suggestion instead of a warning, you can use [`suggest_package`](../mteb/requires_packages.py).
+
+### Submitting your model as a PR
+
+When submitting your model as a PR, please copy and paste the following checklist into the pull request message:
+
+- [ ] I have filled out the ModelMeta object to the extent possible
+- [ ] I have ensured that my model can be loaded using
+  - [ ] `mteb.get_model(model_name, revision)` and
+  - [ ] `mteb.get_model_meta(model_name, revision)`
+- [ ] I have tested the implementation works on a representative set of tasks.
+- [ ] The model is public, i.e. it is available either as an API or the weights are publicly available to download
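
The "Adding model dependencies" section in this diff describes checking for an optional package and importing it inside the wrapper rather than at module top level. The sketch below illustrates that pattern in isolation. It is not `mteb`'s `requires_package` implementation: `require_optional` and `LazyJsonWrapper` are hypothetical names, and the standard-library `json` module stands in for a real optional dependency such as `voyageai` so that the example runs anywhere.

```python
import importlib.util


def require_optional(package: str, model_name: str, install_hint: str) -> None:
    """Hypothetical helper: raise a readable ImportError if `package` is missing."""
    if importlib.util.find_spec(package) is None:
        raise ImportError(
            f"{model_name} requires the `{package}` package. "
            f"Install it with: {install_hint}"
        )


class LazyJsonWrapper:
    """Toy wrapper: the dependency is imported inside __init__, not at module level."""

    def __init__(self, model_name: str) -> None:
        # Check first, so a missing package produces an actionable message.
        require_optional("json", model_name, "pip install 'example[json]'")
        import json  # deferred import; only reached once the check has passed

        self._codec = json
        self.model_name = model_name
```

Because the import happens only when the wrapper is instantiated, importing the module that defines the wrapper never fails for users who have not installed the optional extra.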