Merged
2 changes: 2 additions & 0 deletions README.md
@@ -78,6 +78,7 @@ The following links to the main sections in the usage documentation.
| **General** | |
| [Evaluating a Model](docs/usage/usage.md#evaluating-a-model) | How to evaluate a model |
| [Evaluating on different Modalities](docs/usage/usage.md#evaluating-on-different-modalities) | How to evaluate image and image-text tasks |
| [MIEB](docs/mieb/readme.md) | How to run the Massive Image Embedding Benchmark |
| **Selecting Tasks** | |
| [Selecting a benchmark](docs/usage/usage.md#selecting-a-benchmark) | How to select benchmarks |
| [Task selection](docs/usage/usage.md#task-selection) | How to select and filter tasks |
@@ -99,6 +100,7 @@ The following links to the main sections in the usage documentation.
| [Loading and working with Results](docs/usage/results.md) | How to load and work with the raw results from the leaderboard, including making result dataframes |



## Overview

| Overview | |
99 changes: 86 additions & 13 deletions docs/mieb/readme.md
@@ -1,13 +1,65 @@
**NOTE**: This collaboration has been finalized and the paper will be released soon. This document remains for reference.


# Welcome to MIEB! 👋

The Massive Image Embedding Benchmark (MIEB) is an image extension of [MTEB](https://arxiv.org/abs/2210.07316) to cover embedding tasks for image-text tasks.
The [Massive Image Embedding Benchmark (MIEB)](https://arxiv.org/abs/2504.10471) is an image extension of [MTEB](https://arxiv.org/abs/2210.07316), covering embedding evaluation for image and image-text tasks.

## 🌱 Background

MIEB intends to extend MTEB and MMTEB to cover image representation learning and image-text alignment tasks.
MIEB intends to extend MTEB and MMTEB to cover image representation learning and image-text alignment tasks. At the time of publication, MIEB offers 130 tasks across 8 task categories. Three benchmarks are offered:
1. `MIEB(Multilingual)`
2. `MIEB(eng)`
3. `MIEB(lite)`

## 🚀 Running MIEB

If you're already familiar with how MTEB works, you can run any benchmark, task, and model in the same way!


### Run MIEB in 2 lines via CLI
First, install the `mieb` dependencies:
```sh
pip install "mteb[image]"
```

Then, run the multilingual benchmark with a selected model, e.g. CLIP:
```sh
mteb run -b "MIEB(Multilingual)" -m openai/clip-vit-base-patch16
```

### Run MIEB in Python

Similarly, running the benchmark in Python involves 3 main steps: select the tasks, load the model, and run the evaluation.

1. Select the whole benchmark
```python
import mteb

tasks = mteb.get_benchmark("MIEB(Multilingual)")
```

Alternatively, select a single task:
```python
tasks = mteb.get_tasks(tasks=["CIFAR10ZeroShot"])
```

Or select tasks by categories:
```python
tasks = mteb.get_tasks(task_types=["Compositionality"])
```

2. Load a Model:

```python
model_name = "laion/CLIP-ViT-L-14-laion2B-s32B-b82K"
model = mteb.get_model(model_name=model_name)
```

3. Run the Evaluation:

```python
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model)
```


## 🪴 Contributing to MIEB

@@ -19,15 +71,18 @@ There are a few ways for anyone to contribute to MIEB:
2. Add a model. This could mean either: a) The model wrapper, e.g. `OpenCLIPWrapper`, already exists, and the effort is solely in adding a filled out `ModelMeta` object, and/or b) Add a new model wrapper.
3. Add a new task type. This means that the existing task types do not cover this new task. An accompanying evaluator should also be implemented.

Let's go through an example.

<details>
<summary> Contribution Example (click to unfold) </summary>

## Example
### Example

Here is an example implementing a zero-shot image classification from scratch. Let's say we wish to implement CIFAR10 as a task and evaluate an OpenCLIP model on it.

To solve this task, we need to encode the `images`, encode the `class label candidates with prompts` (e.g. "this is a dog pic", "this is a cat pic"), compute the similarity between them, and then take the argmax to get the class prediction for each image. We begin by implementing a model wrapper.
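The scoring logic just described can be sketched with plain NumPy (a simplified illustration of the idea, not the actual mteb implementation): L2-normalize both sets of embeddings so dot products become cosine similarities, then argmax over the label axis.

```python
import numpy as np

def zero_shot_predict(image_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Predict a class index for each image.

    image_emb: (n_images, dim) image embeddings
    text_emb:  (n_classes, dim) embeddings of the prompted class labels
    """
    # L2-normalize so the dot product equals cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sims = image_emb @ text_emb.T  # (n_images, n_classes)
    return sims.argmax(axis=1)     # predicted class per image
```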

### Model Wrapper
#### Model Wrapper
See the [`ImageEncoder` class](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/encoder_interface.py) for more details. The model class implements `get_text_embeddings`, `get_image_embeddings`, and `calculate_probs` methods.
As an example, [`OpenCLIPWrapper`](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/models/openclip_models.py) is first implemented, with metadata defined below.
```python
@@ -36,7 +91,7 @@ class OpenCLIPWrapper:
```
See also [adding a model](adding_a_model.md) for reference.
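In outline, a wrapper only needs the three methods mentioned above. The sketch below is a hypothetical stand-in (random "embeddings" instead of a real vision and text tower) that shows the expected shapes and the softmax in `calculate_probs`; it is not the actual `OpenCLIPWrapper`.

```python
import numpy as np

class DummyImageTextWrapper:
    """Hypothetical wrapper illustrating the ImageEncoder-style interface."""

    def __init__(self, dim: int = 512, seed: int = 0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)

    def get_text_embeddings(self, texts: list[str]) -> np.ndarray:
        # A real wrapper would tokenize and run the text tower here.
        return self.rng.normal(size=(len(texts), self.dim))

    def get_image_embeddings(self, images: list) -> np.ndarray:
        # A real wrapper would preprocess and run the vision tower here.
        return self.rng.normal(size=(len(images), self.dim))

    def calculate_probs(self, text_emb: np.ndarray, image_emb: np.ndarray) -> np.ndarray:
        # Cosine similarities turned into per-image probabilities via softmax.
        text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
        image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
        logits = image_emb @ text_emb.T
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        return exp / exp.sum(axis=1, keepdims=True)
```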

### X Evaluator
#### X Evaluator
With the model, [ZeroShotClassificationEvaluator](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/evaluation/evaluators/Image/ZeroShotClassificationEvaluator.py) is implemented here. This defines how the model is used to perform zero-shot classification and return results on the desired metrics.
```python
class ZeroShotClassificationEvaluator(Evaluator):
@@ -47,15 +102,15 @@ class ZeroShotClassificationEvaluator(Evaluator):
...
```

### AbsTask X
#### AbsTask X
With the evaluator, [AbsTaskZeroShotClassification](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/abstasks/Image/AbsTaskZeroShotClassification.py) is defined; it operates on the dataset, calls the defined evaluator, and returns results.
```python
class AbsTaskZeroShotClassification(AbsTask):
...
```


### Dataset class
#### Dataset class
With all these, we can then define the dataset. [CIFAR10](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/tasks/Image/ZeroShotClassification/eng/CIFAR.py) is implemented by subclassing `AbsTaskZeroShotClassification` and overriding the `get_candidate_labels` method, which returns `["a photo of {label_name}"]` to be used in the evaluator.
```python
class CIFAR10ZeroShotClassification(AbsTaskZeroShotClassification):
@@ -66,7 +121,7 @@ class CIFAR10ZeroShotClassification(AbsTaskZeroShotClassification):
```
See also [adding a dataset](adding_a_dataset.md) for reference.
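The `get_candidate_labels` override boils down to formatting each class name into the prompt template. A minimal sketch (written as a free function for illustration; the label names here are examples, not the CIFAR10 metadata):

```python
def get_candidate_labels(label_names: list[str]) -> list[str]:
    # One prompted caption per class, compared against each image embedding.
    return [f"a photo of {name}" for name in label_names]
```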

### Putting them all together
#### Putting them all together
With all these, we can then put everything together:
```python
import mteb
@@ -79,4 +134,22 @@ evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model)
```

By default, results will be saved under `results/laion__CLIP-ViT-L-14-laion2B-s32B-b82K/REVISION/CIFAR10ZeroShot.json`. Metrics can sometimes differ slightly from those reported in the original paper; this may be due to resolution or layout differences in the images of the re-created dataset.

</details>

## Citing

When using `mieb`, we recommend you use the following citation:

```bibtex
@article{xiao2025mieb,
  author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
  title = {MIEB: Massive Image Embedding Benchmark},
  publisher = {arXiv},
  journal = {arXiv preprint arXiv:2504.10471},
  year = {2025},
  url = {https://arxiv.org/abs/2504.10471},
  doi = {10.48550/ARXIV.2504.10471},
}
```
48 changes: 24 additions & 24 deletions mteb/benchmarks/benchmarks.py
@@ -1558,14 +1558,14 @@
document understanding, visual STS, and CV-centric tasks.""",
reference="",
contacts=["gowitheflow-1998", "isaac-chung"],
citation="""@misc{xiao2025miebmassiveimageembedding,
title={MIEB: Massive Image Embedding Benchmark},
author={Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
year={2025},
eprint={2504.10471},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.10471},
citation="""@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
title = {MIEB: Massive Image Embedding Benchmark},
publisher = {arXiv},
journal={arXiv preprint arXiv:2504.10471},
year = {2025},
url = {https://arxiv.org/abs/2504.10471},
doi = {10.48550/ARXIV.2504.10471},
}""",
)

@@ -1589,14 +1589,14 @@
datasets + the multilingual parts of VisualSTS-b and VisualSTS-16.""",
reference="",
contacts=["gowitheflow-1998", "isaac-chung"],
citation="""@misc{xiao2025miebmassiveimageembedding,
title={MIEB: Massive Image Embedding Benchmark},
author={Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
year={2025},
eprint={2504.10471},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.10471},
citation="""@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
title = {MIEB: Massive Image Embedding Benchmark},
publisher = {arXiv},
journal={arXiv preprint arXiv:2504.10471},
year = {2025},
url = {https://arxiv.org/abs/2504.10471},
doi = {10.48550/ARXIV.2504.10471},
}""",
)

@@ -1669,14 +1669,14 @@
relative rank of models.""",
reference="",
contacts=["gowitheflow-1998", "isaac-chung"],
citation="""@misc{xiao2025miebmassiveimageembedding,
title={MIEB: Massive Image Embedding Benchmark},
author={Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
year={2025},
eprint={2504.10471},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.10471},
citation="""@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
title = {MIEB: Massive Image Embedding Benchmark},
publisher = {arXiv},
journal={arXiv preprint arXiv:2504.10471},
year = {2025},
url = {https://arxiv.org/abs/2504.10471},
doi = {10.48550/ARXIV.2504.10471},
}""",
)
