Merged
2 changes: 2 additions & 0 deletions README.md
@@ -78,6 +78,7 @@ The following links to the main sections in the usage documentation.
| **General** | |
| [Evaluating a Model](docs/usage/usage.md#evaluating-a-model) | How to evaluate a model |
| [Evaluating on different Modalities](docs/usage/usage.md#evaluating-on-different-modalities) | How to evaluate image and image-text tasks |
| [MIEB](docs/mieb/readme.md) | How to run the Massive Image Embedding Benchmark |
| **Selecting Tasks** | |
| [Selecting a benchmark](docs/usage/usage.md#selecting-a-benchmark) | How to select benchmarks |
| [Task selection](docs/usage/usage.md#task-selection) | How to select and filter tasks |
@@ -99,6 +100,7 @@ The following links to the main sections in the usage documentation.
| [Loading and working with Results](docs/usage/results.md) | How to load and work with the raw results from the leaderboard, including making result dataframes |



## Overview

| Overview | |
99 changes: 86 additions & 13 deletions docs/mieb/readme.md
@@ -1,13 +1,65 @@
**NOTE**: This collaboration has been finalized and the paper will be released soon. This document remains for reference.


# Welcome to MIEB! 👋

The Massive Image Embedding Benchmark (MIEB) is an image extension of [MTEB](https://arxiv.org/abs/2210.07316) to cover embedding tasks for image-text tasks.
The [Massive Image Embedding Benchmark (MIEB)](https://arxiv.org/abs/2504.10471) is an image extension of [MTEB](https://arxiv.org/abs/2210.07316), covering embedding evaluation for image and image-text tasks.

## 🌱 Background

MIEB intends to extend MTEB and MMTEB to cover image representation learning and image-text alignment tasks.
MIEB intends to extend MTEB and MMTEB to cover image representation learning and image-text alignment tasks. At the time of publication, MIEB offers 130 tasks across 8 task categories. Three benchmarks are offered:
1. `MIEB(Multilingual)`
2. `MIEB(eng)`
3. `MIEB(lite)`

## 🚀 Running MIEB

If you're already familiar with how MTEB works, you can run any benchmark, task, and model in the same way!


### Run MIEB in 2 lines via CLI
First, install the `mieb` dependencies:
```sh
pip install "mteb[image]"
```

Then, run the multilingual benchmark with a selected model, e.g. CLIP:
```sh
mteb run -b "MIEB(Multilingual)" -m openai/clip-vit-base-patch16
```

### Run MIEB in Python

Similarly, running the benchmark in Python involves 3 main steps: select the tasks, load the model, and run the evaluation.

1. Select the whole benchmark
```python
import mteb

tasks = mteb.get_benchmark("MIEB(Multilingual)")
```

Alternatively, select a single task:
```python
tasks = mteb.get_tasks(tasks=["CIFAR10ZeroShot"])
```

Or select tasks by categories:
```python
tasks = mteb.get_tasks(task_types=["Compositionality"])
```

2. Load a Model:

```python
model_name = "laion/CLIP-ViT-L-14-laion2B-s32B-b82K"
model = mteb.get_model(model_name=model_name)
```

3. Run the Evaluation:

```python
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model)
```


## 🪴 Contributing to MIEB

@@ -19,15 +71,18 @@ There are a few ways for anyone to contribute to MIEB:
2. Add a model. This could mean either: a) The model wrapper, e.g. `OpenCLIPWrapper`, already exists, and the effort is solely in adding a filled out `ModelMeta` object, and/or b) Add a new model wrapper.
3. Add a new task type. This means that the existing task types do not cover this new task. An accompanying evaluator should also be implemented.

Let's go through an example.

<details>
<summary> Contribution Example (click to unfold) </summary>

## Example
### Example

Here is an example implementing a zero-shot image classification from scratch. Let's say we wish to implement CIFAR10 as a task and evaluate an OpenCLIP model on it.

To solve this task, we need to encode the `images`, encode the `class label candidates with prompts` (e.g. "this is a dog pic", "this is a cat pic"), compute the similarity between them, and then take the argmax to get the class prediction for each image. We begin by implementing a model wrapper.
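The scoring logic just described can be sketched with plain NumPy (a simplified illustration of the idea, not the actual mteb implementation): L2-normalize both sets of embeddings so dot products become cosine similarities, then argmax over the label axis.

```python
import numpy as np

def zero_shot_predict(image_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Predict a class index for each image.

    image_emb: (n_images, dim) image embeddings
    text_emb:  (n_classes, dim) embeddings of the prompted class labels
    """
    # L2-normalize so the dot product equals cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sims = image_emb @ text_emb.T  # (n_images, n_classes)
    return sims.argmax(axis=1)     # predicted class per image
```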

### Model Wrapper
#### Model Wrapper
See the [`ImageEncoder` class](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/encoder_interface.py) for more details. The model class implements `get_text_embeddings`, `get_image_embeddings`, and `calculate_probs` methods.
As an example, [`OpenCLIPWrapper`](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/models/openclip_models.py) is first implemented, with metadata defined below.
```python
@@ -36,7 +91,7 @@ class OpenCLIPWrapper:
```
See also [adding a model](adding_a_model.md) for reference.
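In outline, a wrapper only needs the three methods mentioned above. The sketch below is a hypothetical stand-in (random "embeddings" instead of a real vision and text tower) that shows the expected shapes and the softmax in `calculate_probs`; it is not the actual `OpenCLIPWrapper`.

```python
import numpy as np

class DummyImageTextWrapper:
    """Hypothetical wrapper illustrating the ImageEncoder-style interface."""

    def __init__(self, dim: int = 512, seed: int = 0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)

    def get_text_embeddings(self, texts: list[str]) -> np.ndarray:
        # A real wrapper would tokenize and run the text tower here.
        return self.rng.normal(size=(len(texts), self.dim))

    def get_image_embeddings(self, images: list) -> np.ndarray:
        # A real wrapper would preprocess and run the vision tower here.
        return self.rng.normal(size=(len(images), self.dim))

    def calculate_probs(self, text_emb: np.ndarray, image_emb: np.ndarray) -> np.ndarray:
        # Cosine similarities turned into per-image probabilities via softmax.
        text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
        image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
        logits = image_emb @ text_emb.T
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        return exp / exp.sum(axis=1, keepdims=True)
```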

### X Evaluator
#### X Evaluator
With the model, [ZeroShotClassificationEvaluator](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/evaluation/evaluators/Image/ZeroShotClassificationEvaluator.py) is implemented here. This defines how the model is used to perform zero-shot classification and return results on the desired metrics.
```python
class ZeroShotClassificationEvaluator(Evaluator):
@@ -47,15 +102,15 @@ class ZeroShotClassificationEvaluator(Evaluator):
...
```

### AbsTask X
#### AbsTask X
With the evaluator, [AbsTaskZeroShotClassification](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/abstasks/Image/AbsTaskZeroShotClassification.py) is defined; it operates on the dataset, calls the defined evaluator, and returns results.
```python
class AbsTaskZeroShotClassification(AbsTask):
...
```


### Dataset class
#### Dataset class
With all these, we can then define the dataset. [CIFAR10](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/tasks/Image/ZeroShotClassification/eng/CIFAR.py) is implemented by subclassing `AbsTaskZeroShotClassification` and overriding the `get_candidate_labels` method, which returns `["a photo of {label_name}"]` to be used in the evaluator.
```python
class CIFAR10ZeroShotClassification(AbsTaskZeroShotClassification):
@@ -66,7 +121,7 @@ class CIFAR10ZeroShotClassification(AbsTaskZeroShotClassification):
```
See also [adding a dataset](adding_a_dataset.md) for reference.
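The `get_candidate_labels` override boils down to formatting each class name into the prompt template. A minimal sketch (written as a free function for illustration; the label names here are examples, not the CIFAR10 metadata):

```python
def get_candidate_labels(label_names: list[str]) -> list[str]:
    # One prompted caption per class, compared against each image embedding.
    return [f"a photo of {name}" for name in label_names]
```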

### Putting them all together
#### Putting them all together
With all these, we can then put everything together:
```python
import mteb
@@ -79,4 +134,22 @@ evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model)
```

By default, results will be saved under `results/laion__CLIP-ViT-L-14-laion2B-s32B-b82K/REVISION/CIFAR10ZeroShot.json`. Metrics can sometimes differ slightly from those reported in the original paper; this may be due to resolution or layout differences in the images of the re-created dataset.

</details>

## Citing

When using `mieb`, we recommend you use the following citation:

```bibtex
@article{xiao2025mieb,
  author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
  title = {MIEB: Massive Image Embedding Benchmark},
  publisher = {arXiv},
  journal = {arXiv preprint arXiv:2504.10471},
  year = {2025},
  url = {https://arxiv.org/abs/2504.10471},
  doi = {10.48550/ARXIV.2504.10471},
}
```
48 changes: 24 additions & 24 deletions mteb/benchmarks/benchmarks.py
@@ -1558,14 +1558,14 @@
document understanding, visual STS, and CV-centric tasks.""",
reference="",
contacts=["gowitheflow-1998", "isaac-chung"],
citation="""@misc{xiao2025miebmassiveimageembedding,
title={MIEB: Massive Image Embedding Benchmark},
author={Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
year={2025},
eprint={2504.10471},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.10471},
citation="""@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
title = {MIEB: Massive Image Embedding Benchmark},
publisher = {arXiv},
journal={arXiv preprint arXiv:2504.10471},
year = {2025},
url = {https://arxiv.org/abs/2504.10471},
doi = {10.48550/ARXIV.2504.10471},
}""",
)

@@ -1589,14 +1589,14 @@
datasets + the multilingual parts of VisualSTS-b and VisualSTS-16.""",
reference="",
contacts=["gowitheflow-1998", "isaac-chung"],
citation="""@misc{xiao2025miebmassiveimageembedding,
title={MIEB: Massive Image Embedding Benchmark},
author={Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
year={2025},
eprint={2504.10471},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.10471},
citation="""@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
title = {MIEB: Massive Image Embedding Benchmark},
publisher = {arXiv},
journal={arXiv preprint arXiv:2504.10471},
year = {2025},
url = {https://arxiv.org/abs/2504.10471},
doi = {10.48550/ARXIV.2504.10471},
}""",
)

@@ -1669,14 +1669,14 @@
relative rank of models.""",
reference="",
contacts=["gowitheflow-1998", "isaac-chung"],
citation="""@misc{xiao2025miebmassiveimageembedding,
title={MIEB: Massive Image Embedding Benchmark},
author={Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
year={2025},
eprint={2504.10471},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.10471},
citation="""@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
title = {MIEB: Massive Image Embedding Benchmark},
publisher = {arXiv},
journal={arXiv preprint arXiv:2504.10471},
year = {2025},
url = {https://arxiv.org/abs/2504.10471},
doi = {10.48550/ARXIV.2504.10471},
}""",
)
