From b281769f1f1bcca3f81fc6c4193ca96c81341926 Mon Sep 17 00:00:00 2001 From: Isaac Chung Date: Mon, 21 Apr 2025 19:14:20 +0300 Subject: [PATCH 1/7] add citation section --- docs/mieb/readme.md | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/docs/mieb/readme.md b/docs/mieb/readme.md index af23c8573e..8998dccff3 100644 --- a/docs/mieb/readme.md +++ b/docs/mieb/readme.md @@ -3,7 +3,7 @@ # Welcome to MIEB! 👋 -The Massive Image Embedding Benchmark (MIEB) is an image extension of [MTEB](https://arxiv.org/abs/2210.07316) to cover embedding tasks for image-text tasks. +The [Massive Image Embedding Benchmark (MIEB)](https://arxiv.org/abs/2504.10471) is an image extension of [MTEB](https://arxiv.org/abs/2210.07316) to cover embedding tasks for image-text tasks. ## 🌱 Background @@ -79,4 +79,20 @@ evaluation = mteb.MTEB(tasks=tasks) results = evaluation.run(model) ``` -By default, results will be under `results/laion__CLIP-ViT-L-14-laion2B-s32B-b82K/REVISION/CIFAR10ZeroShot.json`. Sometimes metrics can be a bit different than what the original paper claimed. This might be due to the resolution/layout difference of images in the remake of the dataset. +By default, results will be under `results/laion__CLIP-ViT-L-14-laion2B-s32B-b82K/REVISION/CIFAR10ZeroShot.json`. Sometimes metrics can be a bit different than what the original paper claimed. This might be due to the resolution/layout difference of images in the remake of the dataset. 
+ +## Citing + +When using `mieb`, we recommend you use the following citation: + +```bibtex +@article{xiao2025mieb, + author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff}, + title = {MIEB: Massive Image Embedding Benchmark}, + publisher = {arXiv}, + journal={arXiv preprint arXiv:2504.10471}, + year = {2025} + url = {https://arxiv.org/abs/2504.10471}, + doi = {10.48550/ARXIV.2504.10471}, +} +``` From a287d81e64b4260c6c6177bde9092c064d02d60a Mon Sep 17 00:00:00 2001 From: Isaac Chung Date: Mon, 21 Apr 2025 19:26:30 +0300 Subject: [PATCH 2/7] reference mieb readme in main readme.md --- README.md | 2 ++ docs/mieb/readme.md | 71 ++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 66 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index 4ec8485602..ba8ea36ac8 100644 --- a/README.md +++ b/README.md @@ -78,6 +78,7 @@ The following links to the main sections in the usage documentation. | **General** | | | [Evaluating a Model](docs/usage/usage.md#evaluating-a-model) | How to evaluate a model | | [Evaluating on different Modalities](docs/usage/usage.md#evaluating-on-different-modalities) | How to evaluate image and image-text tasks | +| [MIEB](docs/mieb/readme.md) | How to run the Massive Image Embedding Benchmark | | **Selecting Tasks** | | | [Selecting a benchmark](docs/usage/usage.md#selecting-a-benchmark) | How to select benchmarks | | [Task selection](docs/usage/usage.md#task-selection) | How to select and filter tasks | @@ -99,6 +100,7 @@ The following links to the main sections in the usage documentation. 
 | [Loading and working with Results](docs/usage/results.md) | How to load and working with the raw results from the leaderboard, including making result dataframes |
+
 
 ## Overview
 
 | Overview | |
diff --git a/docs/mieb/readme.md b/docs/mieb/readme.md
index 8998dccff3..5781feb70c 100644
--- a/docs/mieb/readme.md
+++ b/docs/mieb/readme.md
@@ -9,6 +9,58 @@ The [Massive Image Embedding Benchmark (MIEB)](https://arxiv.org/abs/2504.10471)
 
 MIEB intends to extend MTEB and MMTEB to cover image representation learning and image-text alignment tasks.
 
+## Running MIEB
+
+If you’re already familiar with how MTEB works, then run any task and model the same way!
+
+
+### 🛠️Run MIEB in 2 lines via CLI
+First, install the `mieb` dependencies:
+```sh
+pip install mteb[image]
+```
+
+Then, run the multilingual benchmark with a selected model, e.g. CLIP:
+```sh
+mteb run -b "MIEB(Multilingual)" -m openai/clip-vit-base-patch16
+```
+
+### 🧪 Run MIEB in Python
+
+Similarly, running the benchmark can be done in Python in 3 main steps: 1) select the tasks, 2) load the model, and 3) run the evaluation.
+
+1. Select the whole benchmark
+```python
+import mteb
+
+tasks = mteb.get_benchmark("MIEB(Multilingual)")
+```
+
+Alternatively, select a single task:
+```python
+tasks = mteb.get_tasks(tasks=["CIFAR10ZeroShot"])
+```
+
+Or select tasks by categories:
+```python
+tasks = mteb.get_tasks(task_types=["Compositionality"])
+```
+
+2. Load a Model:
+
+```python
+model_name = "laion/CLIP-ViT-L-14-laion2B-s32B-b82K"
+model = mteb.get_model(model_name=model_name)
+```
+
+3. Run the Evaluation:
+
+```python
+evaluation = mteb.MTEB(tasks=tasks)
+results = evaluation.run(model)
+```
+
+
 ## 🪴 Contributing to MIEB
 
 The FIRST step is to _always_ create an issue in the MTEB repo (this one), and add the `mieb` label. PRs without issues will not be accepted.
@@ -19,15 +71,18 @@ There are a few ways for anyone to contribute to MIEB:
 2. Add a model. This could mean either: a) The model wrapper, e.g. 
`OpenCLIPWrapper`, already exists, and the effort is solely in adding a filled out `ModelMeta` object, and/or b) Add a new model wrapper.
 3. Add a new task type. This means that the existing task types do not cover this new task. An accompanying evaluator should also be implemented.
 
-Let's go through an example.
+Let's go through an example.
 
-## Example
+<details>
+  <summary> Contribution Example (click to unfold) </summary>
+
+### Example
 Here is an example implementing a zero-shot image classification task from scratch. Let's say we wish to implement CIFAR10 as a task and evaluate an OpenCLIP model on it. To solve this task, we need to encode the `images`, encode the `class label candidates with prompts` (e.g. "this is a dog pic", "this is a cat pic"), and compare them by calculating similarity, and then argmax out the class prediction for each image.
 
 We begin by implementing a model wrapper.
 
-### Model Wrapper
+#### Model Wrapper
 See the [`ImageEncoder` class](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/encoder_interface.py) for more details. The model class implements `get_text_embeddings`, `get_image_embeddings`, and `calculate_probs` methods. As an example, [`OpenCLIPWrapper`](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/models/openclip_models.py) is first implemented, with metadata defined below.
 ```python
@@ -36,7 +91,7 @@ class OpenCLIPWrapper:
     ...
 ```
 See also [adding a model](adding_a_model.md) for reference.
 
-### X Evaluator
+#### X Evaluator
 With the model, [ZeroShotClassificationEvaluator](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/evaluation/evaluators/Image/ZeroShotClassificationEvaluator.py) is implemented here. This defines how the model is used to do zero-shot classification and get back results on desired metrics.
 ```python
@@ -47,7 +102,7 @@ class ZeroShotClassificationEvaluator(Evaluator):
     ...
 ```
 
-### AbsTask X
+#### AbsTask X
 With the evaluator, [AbsTaskZeroShotClassification](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/abstasks/Image/AbsTaskZeroShotClassification.py) is defined, operating on the dataset, calling the defined Evaluator, and returning results.
 ```python
@@ -55,7 +110,7 @@ class AbsTaskZeroShotClassification(AbsTask):
     ...
 ```
 
-### Dataset class
+#### Dataset class
 With all these, we can then define the dataset. [CIFAR10](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/tasks/Image/ZeroShotClassification/eng/CIFAR.py) is implemented like this, subclassing `AbsTaskZeroShotClassification`, and overriding the `get_candidate_labels` function, which gives `["a photo of {label_name}"]` to be used in the evaluator.
 ```python
@@ -66,7 +121,7 @@ class CIFAR10ZeroShotClassification(AbsTaskZeroShotClassification):
     ...
 ```
 See also [adding a dataset](adding_a_dataset.md) for reference.
 
-### Putting them all together
+#### Putting them all together
 With all these, we can then:
 ```python
 import mteb
@@ -81,6 +136,8 @@ results = evaluation.run(model)
 ```
 
 By default, results will be under `results/laion__CLIP-ViT-L-14-laion2B-s32B-b82K/REVISION/CIFAR10ZeroShot.json`. Sometimes metrics can be a bit different than what the original paper claimed. This might be due to the resolution/layout difference of images in the remake of the dataset.
+</details>
+ ## Citing When using `mieb`, we recommend you use the following citation: From 7c060105f032a09d6c840504fdaf857b433c0c13 Mon Sep 17 00:00:00 2001 From: Isaac Chung Date: Mon, 21 Apr 2025 19:28:35 +0300 Subject: [PATCH 3/7] remove note --- docs/mieb/readme.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/docs/mieb/readme.md b/docs/mieb/readme.md index 5781feb70c..db5598e7fc 100644 --- a/docs/mieb/readme.md +++ b/docs/mieb/readme.md @@ -1,6 +1,3 @@ -**NOTE**: This collaboration have been finalized and the paper is soon to be released. This document remains for documentation. - - # Welcome to MIEB! 👋 The [Massive Image Embedding Benchmark (MIEB)](https://arxiv.org/abs/2504.10471) is an image extension of [MTEB](https://arxiv.org/abs/2210.07316) to cover embedding tasks for image-text tasks. From 7383c9d8f96b1337a406e1b53169ac4968765ce7 Mon Sep 17 00:00:00 2001 From: Isaac Chung Date: Mon, 21 Apr 2025 19:33:00 +0300 Subject: [PATCH 4/7] update benchmark citations --- mteb/benchmarks/benchmarks.py | 48 +++++++++++++++++------------------ 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/mteb/benchmarks/benchmarks.py b/mteb/benchmarks/benchmarks.py index fd11c6d828..11348ab3d5 100644 --- a/mteb/benchmarks/benchmarks.py +++ b/mteb/benchmarks/benchmarks.py @@ -1558,14 +1558,14 @@ document undestanding, visual STS, and CV-centric tasks.""", reference="", contacts=["gowitheflow-1998", "isaac-chung"], - citation="""@misc{xiao2025miebmassiveimageembedding, - title={MIEB: Massive Image Embedding Benchmark}, - author={Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff}, - year={2025}, - eprint={2504.10471}, - archivePrefix={arXiv}, - primaryClass={cs.CV}, - url={https://arxiv.org/abs/2504.10471}, + citation="""@article{xiao2025mieb, + author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang 
and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff}, + title = {MIEB: Massive Image Embedding Benchmark}, + publisher = {arXiv}, + journal={arXiv preprint arXiv:2504.10471}, + year = {2025} + url = {https://arxiv.org/abs/2504.10471}, + doi = {10.48550/ARXIV.2504.10471}, }""", ) @@ -1589,14 +1589,14 @@ datasets + the multilingual parts of VisualSTS-b and VisualSTS-16.""", reference="", contacts=["gowitheflow-1998", "isaac-chung"], - citation="""@misc{xiao2025miebmassiveimageembedding, - title={MIEB: Massive Image Embedding Benchmark}, - author={Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff}, - year={2025}, - eprint={2504.10471}, - archivePrefix={arXiv}, - primaryClass={cs.CV}, - url={https://arxiv.org/abs/2504.10471}, + citation="""@article{xiao2025mieb, + author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff}, + title = {MIEB: Massive Image Embedding Benchmark}, + publisher = {arXiv}, + journal={arXiv preprint arXiv:2504.10471}, + year = {2025} + url = {https://arxiv.org/abs/2504.10471}, + doi = {10.48550/ARXIV.2504.10471}, }""", ) @@ -1669,14 +1669,14 @@ relative rank of models.""", reference="", contacts=["gowitheflow-1998", "isaac-chung"], - citation="""@misc{xiao2025miebmassiveimageembedding, - title={MIEB: Massive Image Embedding Benchmark}, - author={Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff}, - year={2025}, - eprint={2504.10471}, - archivePrefix={arXiv}, - primaryClass={cs.CV}, - url={https://arxiv.org/abs/2504.10471}, + citation="""@article{xiao2025mieb, + author = {Chenghao Xiao 
and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff}, + title = {MIEB: Massive Image Embedding Benchmark}, + publisher = {arXiv}, + journal={arXiv preprint arXiv:2504.10471}, + year = {2025} + url = {https://arxiv.org/abs/2504.10471}, + doi = {10.48550/ARXIV.2504.10471}, }""", ) From bda9b5b2442a2b6fc48df76049643330356bf487 Mon Sep 17 00:00:00 2001 From: Isaac Chung Date: Mon, 21 Apr 2025 19:33:24 +0300 Subject: [PATCH 5/7] add more to mieb readme --- docs/mieb/readme.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/docs/mieb/readme.md b/docs/mieb/readme.md index db5598e7fc..aafc4cc6a7 100644 --- a/docs/mieb/readme.md +++ b/docs/mieb/readme.md @@ -4,11 +4,14 @@ The [Massive Image Embedding Benchmark (MIEB)](https://arxiv.org/abs/2504.10471) ## 🌱 Background -MIEB intends to extend MTEB and MMTEB to cover image representation learning and image-text alignment tasks. +MIEB intends to extend MTEB and MMTEB to cover image representation learning and image-text alignment tasks. At the time of publishing, MIEB offers 130 tasks over 8 task categories. 3 benchmarks are offered: +1. `MIEB(Multilingual)` +2. `MIEB(eng)` +3. `MIEB(lite)` ## Running MIEB -If you’re already familiar with how MTEB works, then run any task and model the same way! +If you’re already familiar with how MTEB works, then run any benchmark, task, and model the same way! 
 ### 🛠️Run MIEB in 2 lines via CLI

From 79db0b3a9fe4315177a1349a842b94314df4a744 Mon Sep 17 00:00:00 2001
From: Isaac Chung
Date: Mon, 21 Apr 2025 19:35:07 +0300
Subject: [PATCH 6/7] consistent emojis

---
 docs/mieb/readme.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/mieb/readme.md b/docs/mieb/readme.md
index aafc4cc6a7..e9850b2afb 100644
--- a/docs/mieb/readme.md
+++ b/docs/mieb/readme.md
@@ -9,12 +9,12 @@ MIEB intends to extend MTEB and MMTEB to cover image representation learning and
 2. `MIEB(eng)`
 3. `MIEB(lite)`
 
-## Running MIEB
+## 🚀 Running MIEB
 
 If you’re already familiar with how MTEB works, then run any benchmark, task, and model the same way!
 
 
-### 🛠️Run MIEB in 2 lines via CLI
+### Run MIEB in 2 lines via CLI
 First, install the `mieb` dependencies:
 ```sh
 pip install mteb[image]
 ```
@@ -25,7 +25,7 @@
 mteb run -b "MIEB(Multilingual)" -m openai/clip-vit-base-patch16
 ```
 
-### 🧪 Run MIEB in Python
+### Run MIEB in Python
 
 Similarly, running the benchmark can be done in Python in 3 main steps: 1) select the tasks, 2) load the model, and 3) run the evaluation.
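The readme sections touched above describe zero-shot classification as encoding the images, encoding the candidate label prompts, comparing them by similarity, and taking the argmax per image. That core scoring step can be sketched without any mteb machinery — the helper names below are illustrative, not mteb's API:

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_embeddings, label_embeddings):
    """Return, for each image embedding, the index of the most similar label prompt."""
    predictions = []
    for image in image_embeddings:
        scores = [cosine_similarity(image, label) for label in label_embeddings]
        predictions.append(scores.index(max(scores)))
    return predictions
```

In the benchmark itself this scoring is handled by the `ZeroShotClassificationEvaluator` walked through in PATCH 2/7; the sketch only illustrates the similarity-then-argmax idea.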
From 0b037ecc19aa2dac174d0404238a6c889974c6ea Mon Sep 17 00:00:00 2001 From: Isaac Chung Date: Mon, 21 Apr 2025 19:36:39 +0300 Subject: [PATCH 7/7] fix citation syntax --- docs/mieb/readme.md | 2 +- mteb/benchmarks/benchmarks.py | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/mieb/readme.md b/docs/mieb/readme.md index e9850b2afb..52b7ab15ff 100644 --- a/docs/mieb/readme.md +++ b/docs/mieb/readme.md @@ -148,7 +148,7 @@ When using `mieb`, we recommend you use the following citation: title = {MIEB: Massive Image Embedding Benchmark}, publisher = {arXiv}, journal={arXiv preprint arXiv:2504.10471}, - year = {2025} + year = {2025}, url = {https://arxiv.org/abs/2504.10471}, doi = {10.48550/ARXIV.2504.10471}, } diff --git a/mteb/benchmarks/benchmarks.py b/mteb/benchmarks/benchmarks.py index 11348ab3d5..e1d7f29bdb 100644 --- a/mteb/benchmarks/benchmarks.py +++ b/mteb/benchmarks/benchmarks.py @@ -1563,7 +1563,7 @@ title = {MIEB: Massive Image Embedding Benchmark}, publisher = {arXiv}, journal={arXiv preprint arXiv:2504.10471}, - year = {2025} + year = {2025}, url = {https://arxiv.org/abs/2504.10471}, doi = {10.48550/ARXIV.2504.10471}, }""", @@ -1594,7 +1594,7 @@ title = {MIEB: Massive Image Embedding Benchmark}, publisher = {arXiv}, journal={arXiv preprint arXiv:2504.10471}, - year = {2025} + year = {2025}, url = {https://arxiv.org/abs/2504.10471}, doi = {10.48550/ARXIV.2504.10471}, }""", @@ -1674,7 +1674,7 @@ title = {MIEB: Massive Image Embedding Benchmark}, publisher = {arXiv}, journal={arXiv preprint arXiv:2504.10471}, - year = {2025} + year = {2025}, url = {https://arxiv.org/abs/2504.10471}, doi = {10.48550/ARXIV.2504.10471}, }""",
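The model wrapper described in PATCH 2/7 exposes a `calculate_probs` method alongside `get_text_embeddings` and `get_image_embeddings`. For CLIP-style models this step is typically a softmax over each image's label similarities; the sketch below assumes that behaviour and is not mteb's exact implementation:

```python
import math


def calculate_probs(similarities):
    """Map one image's label-similarity scores to class probabilities via softmax.

    Assumed CLIP-style behaviour for illustration; see mteb's model wrappers
    for the actual implementation.
    """
    peak = max(similarities)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted class is then simply the argmax of these probabilities — what the readme's walkthrough calls "argmax out the class prediction for each image".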