diff --git a/README.md b/README.md
index daf715f029..59cc5da9e2 100644
--- a/README.md
+++ b/README.md
@@ -472,24 +472,24 @@ evaluation.run(model, ...)
 
 ## Documentation
 
-| Documentation | |
-| ------------------------------ | ---------------------- |
-| πŸ“‹ [Tasks] |Β Overview of available tasks |
-| πŸ“ [Benchmarks] | Overview of available benchmarks |
-| πŸ“ˆ [Leaderboard] | The interactive leaderboard of the benchmark |
-| πŸ€– [Adding a model] | Information related to how to submit a model to the leaderboard |
+| Documentation                  |                                                                                     |
+|--------------------------------|-------------------------------------------------------------------------------------|
+| πŸ“‹ [Tasks]                     | Overview of available tasks                                                         |
+| πŸ“ [Benchmarks]                | Overview of available benchmarks                                                    |
+| πŸ“ˆ [Leaderboard]               | The interactive leaderboard of the benchmark                                        |
+| πŸ€– [Adding a model]            | Information related to how to submit a model to MTEB and to the leaderboard         |
 | πŸ‘©β€πŸ”¬ [Reproducible workflows] | Information related to how to reproduce and create reproducible workflows with MTEB |
-| πŸ‘©β€πŸ’» [Adding a dataset] | How to add a new task/dataset to MTEB |Β 
-| πŸ‘©β€πŸ’» [Adding a leaderboard tab] | How to add a new leaderboard tab to MTEB |Β 
-| 🀝 [Contributing] | How to contribute to MTEB and set it up for development |
-| 🌐 [MMTEB] | An open-source effort to extend MTEB to cover a broad set of languages | Β 
+| πŸ‘©β€πŸ’» [Adding a dataset]       | How to add a new task/dataset to MTEB                                               |
+| πŸ‘©β€πŸ’» [Adding a benchmark]     | How to add a new benchmark to MTEB and to the leaderboard                           |
+| 🀝 [Contributing]              | How to contribute to MTEB and set it up for development                             |
+| 🌐 [MMTEB]                     | An open-source effort to extend MTEB to cover a broad set of languages              |
 
 [Tasks]: docs/tasks.md
 [Benchmarks]: docs/benchmarks.md
 [Contributing]: CONTRIBUTING.md
 [Adding a model]: docs/adding_a_model.md
 [Adding a dataset]: docs/adding_a_dataset.md
-[Adding a leaderboard tab]: docs/adding_a_leaderboard_tab.md
+[Adding a benchmark]: docs/adding_a_benchmark.md
 [Leaderboard]: https://huggingface.co/spaces/mteb/leaderboard
 [MMTEB]: docs/mmteb/readme.md
 [Reproducible workflows]: docs/reproducible_workflow.md
diff --git a/docs/adding_a_benchmark.md b/docs/adding_a_benchmark.md
new file mode 100644
index 0000000000..56a042fdb9
--- /dev/null
+++ b/docs/adding_a_benchmark.md
@@ -0,0 +1,7 @@
+## Adding a benchmark
+
+The MTEB Leaderboard is available [here](https://huggingface.co/spaces/mteb/leaderboard), and we encourage additions of new benchmarks. To add a new benchmark:
+
+1. Add your benchmark to [benchmarks.py](../mteb/benchmarks/benchmarks.py) as a `Benchmark` object, and select the MTEB tasks that will be in the benchmark. If some of the tasks do not exist in MTEB yet, follow the [adding a dataset](adding_a_dataset.md) instructions to add them.
+2. Open a PR at https://github.com/embeddings-benchmark/results with the results of models on your benchmark.
+3. Once the PRs are merged, your benchmark will be added to the leaderboard automatically after the next workflow trigger.
\ No newline at end of file
diff --git a/docs/adding_a_leaderboard_tab.md b/docs/adding_a_leaderboard_tab.md
deleted file mode 100644
index 260293ed5c..0000000000
--- a/docs/adding_a_leaderboard_tab.md
+++ /dev/null
@@ -1,15 +0,0 @@
-## Adding a new Leaderboard tab
-
-The MTEB Leaderboard is available [here](https://huggingface.co/spaces/mteb/leaderboard) and we love new leaderboard tabs. To add a new leaderboard tab:
-
-1. Open a PR in https://hf.co/datasets/mteb/results with:
-- All results added in existing model folders or new folders
-- Updated paths.json (see snippet results.py)
-- If adding any new models, their names added to results.py
-- If you have access to all models you are adding, you can also [add results via the metadata](https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md) for all of them / some of them
-2. Open a PR at https://huggingface.co/spaces/mteb/leaderboard modifying app.py to add your tab:
-- Add any new models & their specs to the global lists
-- Add your tab, credits etc to where the other tabs are defined
-- If you're adding new results to existing models, remove those models from `EXTERNAL_MODEL_RESULTS.json` such that they can be reloaded with the new results and are not cached.
-- You may also have to uncomment `, download_mode='force_redownload', verification_mode="no_checks")` where the datasets are loaded to experiment locally without caching of results
-- Test that it runs & works locally as you desire with python app.py, **please add screenshots to the PR**
diff --git a/docs/adding_a_model.md b/docs/adding_a_model.md
index 314c6e9c39..088199e264 100644
--- a/docs/adding_a_model.md
+++ b/docs/adding_a_model.md
@@ -2,7 +2,63 @@
 
 The MTEB Leaderboard is available [here](https://huggingface.co/spaces/mteb/leaderboard). To submit to it:
 
-1. **Run the desired model on MTEB:**
+1. **Add meta information about your model to the [model dir](../mteb/models/).**
+    ```python
+    from mteb.model_meta import ModelMeta
+
+    bge_m3 = ModelMeta(
+        name="model_name",
+        languages=["model_languages"],  # in the format eng-Latn
+        open_weights=True,
+        revision="5617a9f61b028005a4858fdac845db406aefb181",
+        release_date="2024-06-28",
+        n_parameters=568_000_000,
+        embed_dim=4096,
+        license="mit",
+        max_tokens=8194,
+        reference="https://huggingface.co/BAAI/bge-m3",
+        similarity_fn_name="cosine",
+        framework=["Sentence Transformers", "PyTorch"],
+        use_instructions=False,
+        public_training_code=None,
+        public_training_data="https://huggingface.co/datasets/cfli/bge-full-data",
+        training_datasets={"your_dataset": ["train"]},
+    )
+    ```
+    By default, the model will run using the [`sentence_transformers_loader`](../mteb/models/sentence_transformer_wrapper.py) loader function. If you need a custom implementation, you can specify the `loader` parameter of the `ModelMeta` class. For example:
+    ```python
+    import numpy as np
+
+    from mteb.encoder_interface import PromptType
+    from mteb.models.wrapper import Wrapper
+
+    class CustomWrapper(Wrapper):
+        def __init__(self, model_name, model_revision):
+            super().__init__(model_name, model_revision)
+            # your custom implementation here
+
+        def encode(
+            self,
+            sentences: list[str],
+            *,
+            task_name: str,
+            prompt_type: PromptType | None = None,
+            **kwargs,
+        ) -> np.ndarray:
+            # your custom implementation here
+            return np.zeros((len(sentences), self.embed_dim))
+    ```
+    Then specify the `loader` parameter in the `ModelMeta` class:
+    ```python
+    from functools import partial
+
+    your_model = ModelMeta(
+        loader=partial(
+            CustomWrapper,
+            model_name="model_name",
+            model_revision="5617a9f61b028005a4858fdac845db406aefb181",
+        ),
+        ...
+    )
+    ```
+2. **Run the desired model on MTEB:**
 
 Either use the Python API:
 
@@ -32,45 +88,35 @@ These will save the results in a folder called `results/{model_name}/{model_revi
 
 To add results to the public leaderboard you can push your results to the [results repository](https://github.com/embeddings-benchmark/results) via a PR. Once merged they will appear on the leaderboard after a day.
 
-
-3. (Optional) **Add results to the model card:**
-
-`mteb` implements a cli for adding results to the model card:
-
-```bash
-mteb create_meta --results_folder results/{model_name}/{model_revision} --output_path model_card.md
-```
-
-To add the content to the public model simply copy the content of the `model_card.md` file to the top of a `README.md` file of your model on the Hub. See [here](https://huggingface.co/Muennighoff/SGPT-5.8B-weightedmean-msmarco-specb-bitfit/blob/main/README.md) for an example.
-
-If the readme already exists:
-
-```bash
-mteb create_meta --results_folder results/{model_name}/{model_revision} --output_path model_card.md --from_existing your_existing_readme.md
-```
-
-Note that running the model on many tasks may lead to a huge readme front matter.
-
-4. **Wait for a refresh the leaderboard:**
-
-The leaderboard [automatically refreshes daily](https://github.com/embeddings-benchmark/leaderboard/commits/main/) so once submitted you only need to wait for the automatic refresh. You can find the workflows for the leaderboard refresh [here](https://github.com/embeddings-benchmark/leaderboard/tree/main/.github/workflows). If you experience issues with the leaderboard please create an [issue](https://github.com/embeddings-benchmark/mteb/issues).
+3. **Wait for the leaderboard to refresh.** The leaderboard refreshes daily, so once your results are merged you only need to wait for the next automatic refresh.
 
 **Notes:**
 
-- We remove models with scores that cannot be reproduced, so please ensure that your model is accessible and scores can be reproduced.
-- ##### Using Prompts with Sentence Transformers
+##### Using Prompts with Sentence Transformers
 
-  If your model uses Sentence Transformers and requires different prompts for encoding the queries and corpus, you can take advantage of the `prompts` [parameter](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer).
-
-  Internally, `mteb` uses the prompt named `query` for encoding the queries and `passage` as the prompt name for encoding the corpus. This is aligned with the default names used by Sentence Transformers.
+If your model uses Sentence Transformers and requires different prompts for encoding the queries and corpus, you can take advantage of the `prompts` [parameter](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer).
 
-  ###### Adding the prompts in the model configuration (Preferred)
+Internally, `mteb` uses `query` for encoding the queries and `passage` as the prompt names for encoding the corpus. This is aligned with the default names used by Sentence Transformers.
 
-  You can directly add the prompts when saving and uploading your model to the Hub. For an example, refer to this [configuration file](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5/blob/3b5a16eaf17e47bd997da998988dce5877a57092/config_sentence_transformers.json).
+###### Adding the prompts in the model configuration (Preferred)
 
-  ###### Instantiating the Model with Prompts
+You can directly add the prompts when saving and uploading your model to the Hub. For an example, refer to this [configuration file](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5/blob/3b5a16eaf17e47bd997da998988dce5877a57092/config_sentence_transformers.json). These prompts can then be specified in the `ModelMeta` object:
 
-  If you are unable to directly add the prompts in the model configuration, you can instantiate the model using the `sentence_transformers_loader` and pass `prompts` as an argument. For more details, see the `mteb/models/bge_models.py` file.
+```python
+from functools import partial
+
+from mteb.models.sentence_transformer_wrapper import sentence_transformers_loader
+
+model = ModelMeta(
+    loader=partial(  # type: ignore
+        sentence_transformers_loader,
+        model_name="intfloat/multilingual-e5-small",
+        revision="fd1525a9fd15316a2d503bf26ab031a61d056e98",
+        model_prompts={
+            "query": "query: ",
+            "passage": "passage: ",
+        },
+    ),
+)
+```
+
+If you are unable to directly add the prompts in the model configuration, you can instantiate the model using the `sentence_transformers_loader` and pass `prompts` as an argument. For more details, see the `mteb/models/bge_models.py` file.
 
 ##### Adding instruction models
 
@@ -85,4 +131,4 @@ model = ModelMeta(
     ),
     ...
 )
-```
\ No newline at end of file
+```
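
---

Reviewer note: the `encode` contract that the `CustomWrapper` snippet in this diff relies on can be checked in isolation. The sketch below is a hypothetical stand-in — `DummyEncoder` does not subclass the real `mteb.models.wrapper.Wrapper` and the zero vectors are placeholders; only the keyword-only signature and the `(n_sentences, embed_dim)` return shape mirror the example added in this PR.

```python
import numpy as np


class DummyEncoder:
    """Hypothetical stand-in for an mteb wrapper; not the real base class."""

    def __init__(self, embed_dim: int = 8):
        self.embed_dim = embed_dim

    def encode(
        self,
        sentences: list[str],
        *,
        task_name: str,
        prompt_type=None,
        **kwargs,
    ) -> np.ndarray:
        # A real model would embed the text; here we just return zeros
        # of the shape mteb expects: (n_sentences, embed_dim).
        return np.zeros((len(sentences), self.embed_dim))


encoder = DummyEncoder(embed_dim=8)
embeddings = encoder.encode(["hello", "world"], task_name="STS12")
print(embeddings.shape)  # (2, 8)
```

The keyword-only `task_name` and `prompt_type` arguments matter: mteb passes them by name, so a positional-only signature would fail at evaluation time.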