47 changes: 4 additions & 43 deletions .github/pull_request_template.md
@@ -1,46 +1,7 @@

<!-- If you are submitting a dataset or a model for the model registry please use the corresponding checklists below otherwise feel free to remove them. -->

<!-- add additional description, question etc. related to the new dataset -->
### Checklist
<!-- please do not delete this checklist -->


### Code Quality
<!-- Please do not delete this -->
- [ ] **Code Formatted**: Format the code using `make lint` to maintain consistent style.

### Documentation
<!-- Please do not delete this -->
- [ ] **Updated Documentation**: Add or update documentation to reflect the changes introduced in this PR.

### Testing
<!-- Please do not delete this -->
- [ ] **New Tests Added**: Write tests to cover new functionality. Validate with `make test-with-coverage`.
- [ ] **Tests Passed**: Run tests locally using `make test` or `make test-with-coverage` to ensure no existing functionality is broken.


### Adding datasets checklist
<!-- see also https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_dataset.md -->

**Reason for dataset addition**: ... <!-- Add reason for adding dataset here. E.g. it covers task/language/domain previously not covered -->

- [ ] I have run the following models on the task (adding the results to the pr). These can be run using the `mteb -m {model_name} -t {task_name}` command.
- [ ] `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`
- [ ] `intfloat/multilingual-e5-small`
- [ ] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
- [ ] If the dataset is too big (e.g. >2048 examples), consider using `self.stratified_subsampling()` under `dataset_transform()`
- [ ] I have filled out the metadata object in the dataset file (find documentation on it [here](https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_dataset.md#2-creating-the-metadata-object)).
- [ ] Run tests locally to make sure nothing is broken using `make test`.
- [ ] Run the formatter to format the code using `make lint`.


### Adding a model checklist
<!--
When adding a model to the model registry
see also https://github.com/embeddings-benchmark/mteb/blob/main/docs/reproducible_workflow.md
-->

- [ ] I have filled out the ModelMeta object to the extent possible
- [ ] I have ensured that my model can be loaded using
- [ ] `mteb.get_model(model_name, revision)` and
- [ ] `mteb.get_model_meta(model_name, revision)`
- [ ] I have tested the implementation works on a representative set of tasks.
- [ ] I did not add a dataset, or if I did, I added the [dataset checklist](https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_dataset.md#submit-a-pr) to the PR and completed it.
- [ ] I did not add a model, or if I did, I added the [model checklist](https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md#submitting-your-model-as-a-pr) to the PR and completed it.
5 changes: 2 additions & 3 deletions docs/adding_a_dataset.md
@@ -236,8 +236,9 @@ Once you are finished create a PR to the [MTEB](https://github.com/embeddings-be
The PR will be reviewed by one of the organizers or contributors who might ask you to change things. Once the PR is approved the dataset will be added into the main repository.


Before you commit here is a checklist you should consider completing before submitting:
Before you commit, here is a checklist you should complete before submitting:

- [ ] I have outlined why this dataset is filling an existing gap in `mteb`
- [ ] I have tested that the dataset runs with the `mteb` package.

An easy way to test it is using:
@@ -257,5 +258,3 @@ evaluation = MTEB(tasks=[YourNewTask()])
- [ ] `intfloat/multilingual-e5-small`
- [ ] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
- [ ] I have considered the size of the dataset and reduced it if it is too big (2048 examples is typically large enough for most tasks)
- [ ] Run tests locally to make sure nothing is broken using `make test`.
- [ ] Run the formatter to format the code using `make lint`.
211 changes: 103 additions & 108 deletions docs/adding_a_model.md
@@ -1,112 +1,64 @@
## Adding a Model to the MTEB Leaderboard
## Adding a model to the Leaderboard

The MTEB Leaderboard is available [here](https://huggingface.co/spaces/mteb/leaderboard). To submit to it:

1. **Add meta information about your model to [model dir](../mteb/models/)**. See the docstring of ModelMeta for meta data details.
```python
from mteb.model_meta import ModelMeta

bge_m3 = ModelMeta(
    name="model_name",
    languages=["model_languages"], # in format eng-Latn
    open_weights=True,
    revision="5617a9f61b028005a4858fdac845db406aefb181",
    release_date="2024-06-28",
    n_parameters=568_000_000,
    memory_usage_mb=2167,
    embed_dim=4096,
    license="mit",
    max_tokens=8194,
    reference="https://huggingface.co/BAAI/bge-m3",
    similarity_fn_name="cosine",
    framework=["Sentence Transformers", "PyTorch"],
    use_instructions=False,
    public_training_code=None,
    public_training_data="https://huggingface.co/datasets/cfli/bge-full-data",
    training_datasets={"your_dataset": ["train"]},
)
```
To calculate `memory_usage_mb` you can run `model_meta.calculate_memory_usage_mb()`. By default, the model will run using the [`sentence_transformers_loader`](../mteb/models/sentence_transformer_wrapper.py) loader function. If you need to use a custom implementation, you can specify the `loader` parameter in the `ModelMeta` class. For example:
```python
from mteb.models.wrapper import Wrapper
from mteb.encoder_interface import PromptType
import numpy as np

class CustomWrapper(Wrapper):
    def __init__(self, model_name, model_revision):
        super().__init__(model_name, model_revision)
        # your custom implementation here

    def encode(
        self,
        sentences: list[str],
        *,
        task_name: str,
        prompt_type: PromptType | None = None,
        **kwargs
    ) -> np.ndarray:
        # your custom implementation here
        return np.zeros((len(sentences), self.embed_dim))
```
Then you can specify the `loader` parameter in the `ModelMeta` class:
```python
your_model = ModelMeta(
    loader=partial(
        CustomWrapper,
        model_name="model_name",
        model_revision="5617a9f61b028005a4858fdac845db406aefb181"
    ),
    ...
)
```
2. **Run the desired model on MTEB:**

Either use the Python API:
1. Add the [model meta](https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md#adding-a-model-implementation) to `mteb`
2. [Evaluate](https://github.com/embeddings-benchmark/mteb/blob/main/docs/usage/usage.md#evaluating-a-model) the desired model using `mteb` on the [desired benchmarks](https://github.com/embeddings-benchmark/mteb/blob/main/docs/usage/usage.md#selecting-a-benchmark)
3. Push the results to the [results repository](https://github.com/embeddings-benchmark/results) via a PR. Once merged they will appear on the leaderboard after a day.

```python
import mteb

# load a model from the hub (or for a custom implementation see https://github.com/embeddings-benchmark/mteb/blob/main/docs/reproducible_workflow.md)
model = mteb.get_model("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

tasks = mteb.get_tasks(...) # get specific tasks
# or
tasks = mteb.get_benchmark("MTEB(eng, classic)") # or use a specific benchmark

evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
```
## Adding a model implementation

Or using the command line interface:
Adding a model implementation to `mteb` is quite straightforward.
Typically it only requires that you fill in metadata about the model and add it to the [model directory](../mteb/models/):

```bash
mteb run -m {model_name} -t {task_names}
```python
from mteb.model_meta import ModelMeta

my_model = ModelMeta(
    name="model_name",
    languages=["eng-Latn"], # follows ISO 639-3 and BCP-47
    open_weights=True,
    revision="5617a9f61b028005a4858fdac845db406aefb181",
    release_date="2025-01-01",
    n_parameters=568_000_000,
    memory_usage_mb=2167,
    embed_dim=4096,
    license="mit",
    max_tokens=8194,
    reference="https://huggingface.co/user-or-org/model-name",
    similarity_fn_name="cosine",
    framework=["Sentence Transformers", "PyTorch"],
    use_instructions=False,
    public_training_code="https://github.com/user-or-org/my-training-code",
    public_training_data="https://huggingface.co/datasets/user-or-org/full-dataset",
    training_datasets={"MSMARCO": ["train"]}, # if you trained on the MSMARCO training set
)
```
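
As a quick sanity check of the `languages` field before opening a PR, a sketch like the following could flag malformed entries. It only checks the shape of a code such as `eng-Latn` (three lowercase letters, a hyphen, and a four-letter script code); it does not look anything up in the ISO registries. `is_language_code` is a hypothetical helper for illustration and is not part of `mteb`.

```python
import re

# Hypothetical helper (not part of mteb). It checks only the *shape* of a code
# such as "eng-Latn": a three-letter lowercase language code, a hyphen, and a
# four-letter script code with an initial capital.
_LANGUAGE_CODE = re.compile(r"[a-z]{3}-[A-Z][a-z]{3}")


def is_language_code(code: str) -> bool:
    """Return True if `code` has the form xxx-Xxxx."""
    return _LANGUAGE_CODE.fullmatch(code) is not None


languages = ["eng-Latn", "fra-Latn", "en"]
malformed = [code for code in languages if not is_language_code(code)]
print(malformed)
```

Anything listed in `malformed` (here the two-letter `en`) would need to be rewritten before the metadata is submitted.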

These will save the results in a folder called `results/{model_name}/{model_revision}`.

2. **Push Results to the Leaderboard**

To add results to the public leaderboard you can push your results to the [results repository](https://github.com/embeddings-benchmark/results) via a PR. Once merged they will appear on the leaderboard after a day.
This works for all [Sentence Transformers](https://sbert.net) compatible models. Once filled out, you can submit your model to `mteb` by
submitting a PR.

3. **Wait for the leaderboard to refresh**

**Notes:**
### Calculating the memory usage

##### Using Prompts with Sentence Transformers
To calculate `memory_usage_mb`, run:

If your model uses Sentence Transformers and requires different prompts for encoding the queries and corpus, you can take advantage of the `prompts` [parameter](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer).

Internally, `mteb` uses `query` for encoding the queries and `passage` as the prompt names for encoding the corpus. This is aligned with the default names used by Sentence Transformers.
```py
model_meta = mteb.get_model_meta("model_name")
model_meta.calculate_memory_usage_mb()
```
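
For intuition about the order of magnitude to expect, the weights alone give a rough estimate: parameter count times bytes per parameter. The sketch below is an assumption for illustration, not how `calculate_memory_usage_mb()` is implemented; `estimate_weights_mb` is a hypothetical helper, and it assumes float32 weights (4 bytes each) unless told otherwise.

```python
def estimate_weights_mb(n_parameters: int, bytes_per_parameter: int = 4) -> int:
    """Rough size of the model weights in MB (1 MB = 1024**2 bytes)."""
    return round(n_parameters * bytes_per_parameter / 1024**2)


# 568M parameters, as in the example metadata, stored as float32.
print(estimate_weights_mb(568_000_000))

# The same model stored in half precision (2 bytes per parameter).
print(estimate_weights_mb(568_000_000, bytes_per_parameter=2))
```

For the 568M-parameter example, the float32 estimate comes out at 2167 MB, which is in line with the `memory_usage_mb=2167` value used in the example metadata.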

###### Adding the prompts in the model configuration (Preferred)
### Adding instruction models

You can directly add the prompts when saving and uploading your model to the Hub. For an example, refer to this [configuration file](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5/blob/3b5a16eaf17e47bd997da998988dce5877a57092/config_sentence_transformers.json). These prompts can then be specified in the ModelMeta object.
Some models, such as the [E5 models](https://huggingface.co/intfloat/multilingual-e5-large-instruct), use instructions or prompts.
You can directly add the prompts when saving and uploading your model to the Hub. Refer to this [configuration file as an example](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5/blob/3b5a16eaf17e47bd997da998988dce5877a57092/config_sentence_transformers.json).

However, you can also add these directly to the model configuration:

```python
model = ModelMeta(
loader=partial( # type: ignore
loader=partial(
sentence_transformers_loader,
model_name="intfloat/multilingual-e5-small",
revision="fd1525a9fd15316a2d503bf26ab031a61d056e98",
@@ -117,34 +69,77 @@ model = ModelMeta(
),
)
```
If you are unable to directly add the prompts in the model configuration, you can instantiate the model using the `sentence_transformers_loader` and pass `prompts` as an argument. For more details, see the `mteb/models/bge_models.py` file.

##### Adding instruction models
### Using a custom implementation

Models that use instructions can use the [`InstructSentenceTransformerWrapper`](../mteb/models/instruct_wrapper.py). For example:
If you need to use a custom implementation, you can specify the `loader` parameter in the `ModelMeta` class. For example:
```python
model = ModelMeta(
from mteb.models.wrapper import Wrapper
from mteb.encoder_interface import PromptType
import numpy as np

class CustomWrapper(Wrapper):
    def __init__(self, model_name, model_revision):
        super().__init__(model_name, model_revision)
        # your custom implementation here

    def encode(
        self,
        sentences: list[str],
        *,
        task_name: str,
        prompt_type: PromptType | None = None,
        **kwargs
    ) -> np.ndarray:
        # your custom implementation here
        return np.zeros((len(sentences), self.embed_dim))
```

Then you can specify the `loader` parameter in the `ModelMeta` class:

```python
your_model = ModelMeta(
loader=partial(
InstructSentenceTransformerWrapper,
model="nvidia/NV-Embed-v1",
revision="7604d305b621f14095a1aa23d351674c2859553a",
instruction_template="Instruct: {instruction}\nQuery: ",
CustomWrapper,
model_name="model_name",
model_revision="5617a9f61b028005a4858fdac845db406aefb181"
),
...
...
)
```

##### Adding model dependencies in pyproject.toml
If you are adding a model that requires additional dependencies, you can add them to the `pyproject.toml` file and, instead of checking manually whether the dependencies are installed, make use of `requires_package` from [requires_package.py](../mteb/requires_packages.py). For example:

### Adding model dependencies
If you are adding a model that requires additional dependencies, you can add them to the `pyproject.toml` file, under optional dependencies:

```toml
voyageai = ["voyageai>=1.0.0,<2.0.0"]
```

This ensures that the implementation does not break if a package is updated.

As it is an optional dependency, you can't import it at the top level of the module; instead, import it inside the wrapper scope:

In the [voyage_models.py](../mteb/models/voyage_models.py) file, we have added the following code:
```python
from mteb.requires_package import requires_package
requires_package(self, "voyageai", model_name, "pip install 'mteb[voyageai]'")
```
and also updated the [pyproject.toml](../pyproject.toml) file with the following code:
```python
voyageai = ["voyageai>=1.0.0,<2.0.0"]

class VoyageWrapper(Wrapper):
    def __init__(...) -> None:
        requires_package(self, "voyageai", model_name, "pip install 'mteb[voyageai]'")
        import voyageai
        ...
```
This checks whether `voyageai` is installed and, if it is not, raises an error message explaining how to install it. This is done to give clear installation warnings.
If you want to give a suggestion instead of a warning, you can use `suggest_package` from [requires_package.py](../mteb/requires_packages.py).
Here you will also see that we use `requires_package` to ensure friendly error messages when package installations are required.
If you want to give a suggestion instead of a warning, you can use [`suggest_package`](../mteb/requires_packages.py).
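
To make the pattern concrete, here is a minimal, self-contained sketch of the general idea: check for the package when the wrapper is constructed, fail with an install hint if it is missing, and only then import it inside the wrapper scope. This is not the `mteb` implementation of `requires_package`; `require_optional_package` and `ExampleWrapper` are hypothetical names, and the standard-library `json` module stands in for a real optional dependency so that the example runs anywhere.

```python
import importlib.util


def require_optional_package(package_name: str, install_hint: str) -> None:
    """Raise a readable ImportError if `package_name` is not installed."""
    if importlib.util.find_spec(package_name) is None:
        raise ImportError(
            f"'{package_name}' is required for this model. "
            f"Install it with: {install_hint}"
        )


class ExampleWrapper:
    def __init__(self) -> None:
        # Check first, so a missing package produces a clear install hint.
        require_optional_package("json", "pip install json")
        # Import inside the wrapper scope rather than at module top level,
        # so importing this module never fails for users without the extra.
        import json  # stands in for the optional dependency

        self._backend = json


wrapper = ExampleWrapper()
```

Keeping the import inside `__init__` means that users who never instantiate the wrapper do not need the optional package installed at all.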

### Submitting your model as a PR

When submitting your model as a PR, please copy and paste the following checklist into the pull request message:

- [ ] I have filled out the ModelMeta object to the extent possible
- [ ] I have ensured that my model can be loaded using
- [ ] `mteb.get_model(model_name, revision)` and
- [ ] `mteb.get_model_meta(model_name, revision)`
- [ ] I have tested the implementation works on a representative set of tasks.
- [ ] The model is public, i.e. it is available either as an API or the weights are publicly available to download