Merged
159 commits
5e7b27d
mieb ZeroshotClassification
gowitheflow-1998 Jul 5, 2024
b6c3d48
mieb docs
gowitheflow-1998 Jul 5, 2024
3e35fdb
mieb implementation demo
gowitheflow-1998 Jul 5, 2024
4d70961
model meta; abstask column names; linear probe clf
gowitheflow-1998 Jul 11, 2024
102a91e
Merge branch 'mieb' of https://github.com/embeddings-benchmark/mteb i…
gowitheflow-1998 Jul 11, 2024
f5e504a
model meta; abstask column names; linear probe clf
gowitheflow-1998 Jul 11, 2024
46f3d91
fix: update naming as candidate_labels
isaac-chung Jul 12, 2024
f8035ec
Update README.md
gowitheflow-1998 Jul 14, 2024
96870af
Update README.md
gowitheflow-1998 Jul 14, 2024
5c2df6b
i2tretrieval
gowitheflow-1998 Jul 15, 2024
976acc5
test load data ignore i2tretrieval
gowitheflow-1998 Jul 15, 2024
ddc4b6e
[MIEB] Add image clustering (#1088)
isaac-chung Jul 15, 2024
9e50f22
remove unused & fix typos
gowitheflow-1998 Jul 15, 2024
68fa26a
T2I Retrieval
gowitheflow-1998 Jul 16, 2024
1f62493
Any2AnyRetrieval
gowitheflow-1998 Jul 17, 2024
bc4dfdf
Merge branch 'main' into mieb
isaac-chung Jul 19, 2024
f2aba4f
fix tests from merge
isaac-chung Jul 19, 2024
b8561b8
[MIEB] Add image text pair classification and tests (#1099)
isaac-chung Jul 19, 2024
3f888fa
[MIEB] Add image classification and zero shot classification tasks (#…
isaac-chung Jul 20, 2024
15721ff
[MIEB] Add CIFAR clustering (#1104)
isaac-chung Jul 21, 2024
4d99e90
[MIEB] Add more image classification and zero shot classification dat…
isaac-chung Jul 21, 2024
6b49181
correct eurosat zero shot labels
isaac-chung Jul 21, 2024
b827a19
add abstask for image multilable and voc2007
isaac-chung Jul 21, 2024
630d4a5
make lint
isaac-chung Jul 21, 2024
935f621
[MIEB] Add more image classification and zero shot datasets (#1105)
isaac-chung Jul 22, 2024
77b0e35
correct SUN397 zero shot captions
isaac-chung Jul 22, 2024
a8841b2
add baai bge vista
gowitheflow-1998 Jul 24, 2024
38acf7c
add e5-v
gowitheflow-1998 Jul 25, 2024
da7c8ba
linting
gowitheflow-1998 Jul 25, 2024
fd64f75
memory issues for image linear probe & zeroshot
gowitheflow-1998 Jul 26, 2024
2ad3a07
kknn linear probe arguments
gowitheflow-1998 Jul 26, 2024
1201441
del comments
gowitheflow-1998 Jul 26, 2024
c0f0021
Add some classification and ZeroShot classification tasks (#1107)
imenelydiaker Jul 30, 2024
6970a4c
fix dependency & clip mock test
gowitheflow-1998 Jul 30, 2024
e94d6f3
[MIEB] Add jina clip (#1120)
isaac-chung Jul 31, 2024
c129ae2
[MIEB] Update `mieb` with the `main` branch and some fixes (#1126)
isaac-chung Jul 31, 2024
4d50084
image memoery issues for all retrieval Abstasks
gowitheflow-1998 Jul 31, 2024
33047fb
Add CLEVR and SciMMIR Image-Text Understanding tasks (#1127)
imenelydiaker Jul 31, 2024
da470bd
add fashion200k & fashionIQ test passed
gowitheflow-1998 Aug 4, 2024
c59fd95
clip text max seq truncation
gowitheflow-1998 Aug 7, 2024
8613945
add WebQA, NIGHTS, OVEN
gowitheflow-1998 Aug 7, 2024
f8aaf6d
any2any retrieval chunk encoding
gowitheflow-1998 Aug 7, 2024
6979912
add nomic vision model; any2any topk bug
gowitheflow-1998 Aug 10, 2024
621af0c
add cv recall
gowitheflow-1998 Aug 10, 2024
2631eaa
add InfoSeek; VisualNews
gowitheflow-1998 Aug 10, 2024
494b563
[MIEB] Add Stanford Cars i2i Retrieval (#1147)
isaac-chung Aug 12, 2024
b58009c
[MIEB] Add CUB200 i2i retrieval (#1154)
isaac-chung Aug 13, 2024
fc2fcb9
consolidate i2t and t2i to any2any
isaac-chung Aug 13, 2024
229c392
remove abstask and evaluators
isaac-chung Aug 13, 2024
f10299c
remove references from test
isaac-chung Aug 13, 2024
f78e61a
tu-add berlin sketch retrieval
gowitheflow-1998 Aug 16, 2024
df047e0
XM3600; XFlickr30kCO; mutilingual
gowitheflow-1998 Aug 17, 2024
17f1dd1
wit multilingual retrieval t2i
gowitheflow-1998 Aug 19, 2024
c5a1c07
correct multilingual t2i meta
gowitheflow-1998 Aug 19, 2024
99cc566
meta
gowitheflow-1998 Aug 19, 2024
848dea6
add dinov2 model; 4 sizes
gowitheflow-1998 Aug 26, 2024
8a274e2
cls evaluator channel bug fix
gowitheflow-1998 Aug 26, 2024
65da41c
add ALIGN model
gowitheflow-1998 Aug 26, 2024
3e50e69
add FORBI2IRetrieval
isaac-chung Aug 27, 2024
5038ae6
forb & tuberlin new revision
gowitheflow-1998 Sep 1, 2024
330f78a
disable tokenization parallelism
gowitheflow-1998 Sep 1, 2024
77effde
add hateful meme retrieval i2tt2i
gowitheflow-1998 Sep 1, 2024
928b6f9
add memotion retrieval t2ii2t
gowitheflow-1998 Sep 2, 2024
a7c6e7c
add SciMMIR Retrieval i2tt2i
gowitheflow-1998 Sep 4, 2024
6ef5377
ruff update
gowitheflow-1998 Sep 7, 2024
b5c8657
Merge branch 'main' into mieb; resolved conflicts; circular imports
gowitheflow-1998 Sep 8, 2024
e696f1a
Visual STS Abstask&evaluator
gowitheflow-1998 Sep 12, 2024
4efb410
add visual STS17
gowitheflow-1998 Sep 13, 2024
2aa93be
add visual STS 12-16
gowitheflow-1998 Sep 15, 2024
15a511b
[mieb] Add blip and blip2 models, and ImageNetDog15Clustering task (#…
Jamie-Stirling Sep 20, 2024
99e631f
[mieb] add 3 compositionality evaluation tasks (#1229)
gowitheflow-1998 Sep 22, 2024
d5bfece
add SOPI2IRetrieval dataset/task (#1232)
Jamie-Stirling Sep 23, 2024
a7883b5
Image text pair cls (#1233)
gowitheflow-1998 Sep 26, 2024
305afba
Add RP2kI2IRetrieval and METI2IRetrieval (#1239)
Jamie-Stirling Sep 29, 2024
f1fe91f
[MIEB] Adding DataComp CLIP models (#1283)
isaac-chung Oct 8, 2024
b0bc4e2
[mieb] Any2TextMultipleChoice Abstask&Evaluator & four tasks in CV-be…
gowitheflow-1998 Oct 11, 2024
1b70f6d
[mieb] adding 10 tasks (#1290)
gowitheflow-1998 Oct 15, 2024
6e7dd3d
[mieb] Adding MOCOv3 models (#1293)
isaac-chung Oct 18, 2024
053b5be
[mieb] Add more Any2AnyRetrieval datasets (#1285)
Jamie-Stirling Oct 20, 2024
a3ec14d
[mieb] Add any2any multiple choice evaluator and abstask (and one tas…
Jamie-Stirling Oct 20, 2024
b73a133
[mieb] Fix FORB dataset (#1306)
isaac-chung Oct 22, 2024
a6f306f
[mieb] run tasks fix (#1302)
gowitheflow-1998 Oct 22, 2024
22751ca
[mieb] split RParisI2IRetrieval and ROxfordI2IRetrieval into easy, me…
Jamie-Stirling Oct 22, 2024
8065568
[mieb] run tasks small fix (#1310)
gowitheflow-1998 Oct 22, 2024
2011aa1
[mieb] Add VLM2vec (#1323)
isaac-chung Oct 25, 2024
93260cb
feat: Merge main into MIEB (#1329)
KennethEnevoldsen Oct 27, 2024
6979b2a
[mieb] Add OpenCLIP models (#1335)
isaac-chung Oct 28, 2024
45ffa44
[mieb] new version with downsampled train split to 32 per class (#1327)
isaac-chung Oct 28, 2024
8054607
[mieb] Fix Jina CLIP (#1349)
isaac-chung Oct 28, 2024
874c1bc
fix: Add clevr license (#1356)
KennethEnevoldsen Oct 29, 2024
cf8ea1f
Add BLINK as multi-choice tasks (#1348)
Jamie-Stirling Oct 29, 2024
6652e56
[mieb] add Eva CLIP models (#1369)
isaac-chung Oct 31, 2024
9b178e6
[mieb] add siglip, cohere multimodal & some fixes for final run (#1357)
gowitheflow-1998 Oct 31, 2024
4b0facc
[mieb] fixes for final run (#1374)
gowitheflow-1998 Nov 1, 2024
a449b24
Update run_vista.md
gowitheflow-1998 Nov 1, 2024
3a18fbd
[mieb] Fix torch no grad (#1378)
Muennighoff Nov 4, 2024
1ef93e4
[mieb] Fix vlm2vec (#1380)
isaac-chung Nov 5, 2024
34094ea
[mieb] Remove null entries from corpus of ROxford, RParis (#1371)
Jamie-Stirling Nov 5, 2024
2b56317
[mieb] fixes (#1390)
Muennighoff Nov 5, 2024
2862323
[MIEB] Remove non-existent method for blip (#1394)
imenelydiaker Nov 5, 2024
8a8b8b7
[mieb] fix ALIGN; update Winoground revision id; update run script (#…
gowitheflow-1998 Nov 6, 2024
01b7f28
[mieb] Fix open clip for cv bench count (#1397)
isaac-chung Nov 7, 2024
cdb92c6
[mieb] Update subtasks of BLINKIT2TMultiChoice and BLINKIT2IMultiChoi…
Jamie-Stirling Nov 7, 2024
a06227e
[mieb] Fix EVA CLIP for CV Bench (#1414)
isaac-chung Nov 10, 2024
f757892
[mieb] Add calculate probs for vlm2vec (#1418)
isaac-chung Nov 10, 2024
f60465a
[mieb] Fix siglip bug & add retrieval datasets (#1424)
gowitheflow-1998 Nov 10, 2024
f0dd6f6
[mieb] use Logistic Regression classifier for AbsTaskImageMultilabelC…
isaac-chung Nov 10, 2024
66176a0
[mieb] mieb scripts (siglip rerun & linear probing ablation & params …
gowitheflow-1998 Nov 10, 2024
7e0779a
[MIEB] Change Flickr30k to test split (#1449)
Jamie-Stirling Nov 15, 2024
1429cce
[mieb] Fix VLM2vec dtype (#1462)
isaac-chung Nov 18, 2024
2fc19e7
[mieb] run script for missing results (#1472)
gowitheflow-1998 Nov 18, 2024
fab0b82
[mieb] Fix Moco model on CIFAR10Clustering (#1487)
isaac-chung Nov 22, 2024
67a035d
[mieb] Fix Flickr30k I2T and T2I (#1505)
isaac-chung Nov 27, 2024
ff34ff6
[MIEB] add missing siglip models (#1533)
SaitejaUtpala Nov 30, 2024
dc35ce3
fix typo (#1535)
SaitejaUtpala Nov 30, 2024
c77b923
[mieb] Fix numbers of CIRR, Fashion200k, FashionIQ, Flickr30k, MSCOCO…
izhx Dec 4, 2024
db5315f
Discussing a standard for ImageEncoders
KennethEnevoldsen Dec 4, 2024
d45fbb2
Add Voyage's multimodal embedding (#1555)
gowitheflow-1998 Dec 5, 2024
5f0b9c0
[mieb] update script for final re-run (#1576)
gowitheflow-1998 Dec 10, 2024
d2bb0ac
fix: no longer using same query text for all of BLINKIT2TMultiChoice …
Jamie-Stirling Dec 10, 2024
0ae63dc
[MIEB] Make multimodal models compatible to `task_name` and `prompt_t…
izhx Dec 14, 2024
1b3dfb5
Merge branch 'mieb' of https://github.com/embeddings-benchmark/mteb i…
KennethEnevoldsen Dec 16, 2024
074e5d4
fix image encoder (#1596)
KennethEnevoldsen Dec 16, 2024
74cb6e6
[mieb] voyage-v: add exponential backoff and other error handling (#1…
gowitheflow-1998 Dec 17, 2024
6740207
[MIEB] Fix `get_fused_emebddings` (#1612)
izhx Dec 22, 2024
24c3709
[MIEB] Add new multimodal retrieval tasks (#1611)
izhx Dec 24, 2024
ff74380
[MIEB] Switch to ViDoRe BEIR version (#1607)
izhx Dec 25, 2024
c14c006
Extend MIEB test coverage (#1629)
isaac-chung Dec 25, 2024
10ba68d
[mieb] Task filtering by modality supported by models (#1633)
isaac-chung Dec 26, 2024
f994dc6
[MIEB] Fix VISTA model (#1638)
izhx Dec 29, 2024
e83acbb
Warn (#1639)
Muennighoff Dec 29, 2024
ee26ebe
[mieb] model task modalities matching logic (#1640)
gowitheflow-1998 Dec 29, 2024
96e6106
[mieb] Use mock abstask classes (#1648)
isaac-chung Jan 1, 2025
075873f
[MIEB] Add code for GME models (#1635)
izhx Jan 1, 2025
07befcb
fix: add version check e5-v in mieb (#1723)
isaac-chung Jan 8, 2025
38fe14e
fix: change comparison to bigger than (#1743)
isaac-chung Jan 9, 2025
b021b9b
docs: Rework MIEB docs (#1802)
isaac-chung Jan 15, 2025
5194033
[mieb] Remove results-mieb folder (#1815)
isaac-chung Jan 16, 2025
1a6f743
[mieb] fixing lrap computation for multi-label classification (#1834)
gowitheflow-1998 Jan 19, 2025
668d3da
[mieb] Merge from main (#1853)
isaac-chung Jan 23, 2025
b9fe9f0
[mieb] Fill in align model meta (#1863)
isaac-chung Jan 24, 2025
6ca11d2
[mieb] Fill in clip and open clip model meta (#1876)
isaac-chung Jan 26, 2025
edaf0d6
[mieb] Fill in blip model meta (#1874)
isaac-chung Jan 26, 2025
d113086
[mieb] Fill in cohere_v and dinov2 model meta (#1880)
isaac-chung Jan 27, 2025
b347f2e
[mieb] Fill in e5v and eva clip model meta (#1885)
isaac-chung Jan 28, 2025
e8b2256
[mieb] Fill out gme v and jina clip model meta (#1887)
isaac-chung Jan 28, 2025
a783b04
[mieb] Fill in mocov3 and nomic vision model meta (#1890)
isaac-chung Jan 28, 2025
0c5f0fb
[mieb] Fill in siglip model meta (#1894)
isaac-chung Jan 28, 2025
d551bf9
[mieb] Fill in vista vlm2vec voyage v model meta (#1903)
isaac-chung Jan 29, 2025
788f8c4
[mieb] merge from main once more (#1942)
isaac-chung Feb 3, 2025
bd5fc30
Merge remote-tracking branch 'origin/main' into mieb
isaac-chung Feb 3, 2025
0ef9ce1
fix merge conflict error in task metadata
isaac-chung Feb 3, 2025
a9a53df
remove old file
isaac-chung Feb 3, 2025
0808155
add mieb to readme
isaac-chung Feb 3, 2025
b042e79
add comments to mieb task categories
isaac-chung Feb 3, 2025
c00f079
remove commented out code
isaac-chung Feb 3, 2025
6eb46a4
use logger.info in abstasks
isaac-chung Feb 3, 2025
78e9d6e
add blip2 dependency to pyproject
isaac-chung Feb 3, 2025
87ea21c
remove test code
isaac-chung Feb 3, 2025
2 changes: 2 additions & 0 deletions README.md
@@ -483,6 +483,7 @@ evaluation.run(model, ...)
| 👩‍💻 [Adding a benchmark] | How to add a new benchmark to MTEB and to the leaderboard |
| 🤝 [Contributing] | How to contribute to MTEB and set it up for development |
| 🌐 [MMTEB] | An open-source effort to extend MTEB to cover a broad set of languages |
| 🖼️ [MIEB] | Extension of MTEB to image embeddings |

[Tasks]: docs/tasks.md
[Benchmarks]: docs/benchmarks.md
@@ -492,6 +493,7 @@ evaluation.run(model, ...)
[Adding a benchmark]: docs/adding_a_benchmark.md
[Leaderboard]: https://huggingface.co/spaces/mteb/leaderboard
[MMTEB]: docs/mmteb/readme.md
[MIEB]: docs/mieb.md
[Reproducible workflows]: docs/reproducible_workflow.md

## Citing
116 changes: 116 additions & 0 deletions docs/mieb.md
@@ -0,0 +1,116 @@
# Welcome to MIEB! 👋

The Massive Image Embedding Benchmark (MIEB) is an extension of [MTEB](https://arxiv.org/abs/2210.07316) that covers embedding tasks involving images and image-text pairs.

## 🌱 Background

MIEB intends to extend MTEB and MMTEB to cover image representation learning and image-text alignment tasks.

## 🪴 Contributing to MIEB

The FIRST step is to _always_ create an issue in the MTEB repo (this one), and add the `mieb` label. PRs without issues will not be accepted.

There are a few ways for anyone to contribute to MIEB:

1. Add a dataset as an instance of an existing task type. This means that the `AbsTask` already exists, e.g. `AbsTaskImageClassification`, and the effort is solely in adding an instance of it.
2. Add a model. This could mean either or both of: a) the model wrapper, e.g. `OpenCLIPWrapper`, already exists, and the effort is solely in adding a filled-out `ModelMeta` object; b) adding a new model wrapper.
3. Add a new task type. This means that the existing task types do not cover this new task. An accompanying evaluator should also be implemented.

Let's go through an example.

## Example

Here is an example of implementing a zero-shot image classification task from scratch. Let's say we wish to implement CIFAR10 as a task and evaluate an OpenCLIP model on it.

To solve this task, we need to encode the images, encode the candidate class labels wrapped in prompts (e.g. "this is a dog pic", "this is a cat pic"), compute the similarity between each image and each prompt, and take the argmax over the prompts as the class prediction for each image. We begin by implementing a model wrapper.
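
The encode, compare, and argmax procedure described above can be sketched independently of any library. This is a minimal illustration, not the MIEB implementation: the function names `cosine` and `zero_shot_predict` are hypothetical, and the two-dimensional vectors are toy values standing in for real model embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_embs, text_embs):
    """For each image, return the index of the most similar prompt."""
    predictions = []
    for img in image_embs:
        scores = [cosine(img, txt) for txt in text_embs]
        # argmax over the prompt scores
        predictions.append(max(range(len(scores)), key=scores.__getitem__))
    return predictions


# Toy embeddings (assumed values, not produced by a real model).
prompts = ["this is a dog pic", "this is a cat pic"]
text_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]

predictions = zero_shot_predict(image_embs, text_embs)
predicted_prompts = [prompts[i] for i in predictions]
```

In a real run, the embeddings would come from the model's text and image encoders rather than being written by hand.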

### Model Wrapper
See the [`ImageEncoder` class](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/encoder_interface.py) for more details. The model class implements `get_text_embeddings`, `get_image_embeddings`, and `calculate_probs` methods.
As an example, [`OpenCLIPWrapper`](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/models/openclip_models.py) is first implemented, with metadata defined below.
```python
class OpenCLIPWrapper:
    ...
```
See also [adding a model](adding_a_model.md) for reference.
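
To make the shape of such a wrapper more concrete, here is a toy sketch that exposes the three method names mentioned above. Everything else is an assumption made for illustration: the class name `ToyWrapper`, the method signatures, the lookup-table "text encoder", and the choice of a softmax over dot products in `calculate_probs`. The actual `OpenCLIPWrapper` may differ in all of these.

```python
import math


class ToyWrapper:
    """Hypothetical stand-in for a model wrapper; not the mteb implementation."""

    def __init__(self, text_table):
        # Maps each known text to a fixed vector (a fake "text encoder").
        self.text_table = text_table

    def get_text_embeddings(self, texts):
        return [self.text_table[t] for t in texts]

    def get_image_embeddings(self, images):
        # In this toy, each "image" is already a feature vector.
        return [[float(x) for x in img] for img in images]

    def calculate_probs(self, text_embeddings, image_embeddings):
        # Assumed behaviour: softmax over image-text dot products, per image.
        probs = []
        for img in image_embeddings:
            logits = [sum(a * b for a, b in zip(img, txt)) for txt in text_embeddings]
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(value - peak) for value in logits]
            total = sum(exps)
            probs.append([e / total for e in exps])
        return probs


wrapper = ToyWrapper({"a photo of a dog": [1.0, 0.0], "a photo of a cat": [0.0, 1.0]})
text_embs = wrapper.get_text_embeddings(["a photo of a dog", "a photo of a cat"])
image_embs = wrapper.get_image_embeddings([[2, 0], [0, 3]])
probs = wrapper.calculate_probs(text_embs, image_embs)
```

Each row of `probs` corresponds to one image and holds one probability per candidate text.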

### X Evaluator
With the model in place, [ZeroshotClassificationEvaluator](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/evaluation/evaluators/Image/ZeroshotClassificationEvaluator.py) is implemented. It defines how the model is used to perform zero-shot classification and return results for the desired metrics.
```python
class ZeroshotClassificationEvaluator(Evaluator):
    def __init__(self, ...):
        ...

    def __call__(self, model: Encoder, *, encode_kwargs: dict[str, Any] = {}):
        """Get embeddings and calculate scores."""
        ...
```

### AbsTask X
With the evaluator in place, [AbsTaskZeroshotClassification](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/abstasks/Image/AbsTaskZeroshotClassification.py) is defined. It operates on the dataset, calls the evaluator, and returns the results.
```python
class AbsTaskZeroshotClassification(AbsTask):
    ...
```


### Dataset class
With all these in place, we can define the dataset. [CIFAR10](https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb/tasks/Image/ZeroshotClassification/eng/CIFAR.py) is implemented by subclassing `AbsTaskZeroshotClassification` and overriding the `get_candidate_labels` method, which returns `["a photo of {label_name}"]` prompts to be used in the evaluator.
```python
class CIFAR10ZeroShotClassification(AbsTaskZeroshotClassification):
    metadata = TaskMetadata(...)

    def get_candidate_labels(self) -> list[str]:
        ...
```
See also [adding a dataset](adding_a_dataset.md) for reference.
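
As a rough illustration of what the candidate labels might look like, the sketch below fills the `"a photo of {label_name}"` template with the ten standard CIFAR-10 class names. The helper `build_candidate_labels` is hypothetical and written as a plain function; in the real task, the class names would more likely be read from the loaded dataset inside `get_candidate_labels`.

```python
# The ten standard CIFAR-10 class names.
CIFAR10_LABELS = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def build_candidate_labels(label_names, template="a photo of {label_name}"):
    """Hypothetical helper: wrap each class name in a text prompt."""
    return [template.format(label_name=name) for name in label_names]


candidate_labels = build_candidate_labels(CIFAR10_LABELS)
```

The evaluator would then encode one prompt per class and compare each image embedding against them.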

### Putting them all together
With all these pieces in place, we can run the evaluation:
```python
import mteb

model_name = "laion/CLIP-ViT-L-14-laion2B-s32B-b82K"
model = mteb.get_model(model_name=model_name)

tasks = mteb.get_tasks(tasks=["CIFAR10ZeroShot"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model)
```

By default, results are written to `results/laion__CLIP-ViT-L-14-laion2B-s32B-b82K/REVISION/CIFAR10ZeroShot.json`. Metrics can sometimes differ slightly from those reported in the original paper. This might be due to differences in image resolution or layout in the re-created version of the dataset.
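
The path pattern quoted above can be reproduced with a small helper, which may be handy when locating result files in scripts. This is only a sketch inferred from that one example path: the function `expected_results_path` is hypothetical, and it assumes that the `/` in the model name is replaced by `__` and that `REVISION` stands for the model revision. The logic mteb actually uses may differ.

```python
from pathlib import PurePosixPath


def expected_results_path(model_name, revision, task_name, output_folder="results"):
    """Hypothetical helper mirroring the results path pattern shown above."""
    # Assumption: "/" in the model name becomes "__" in the folder name.
    model_dir = model_name.replace("/", "__")
    return PurePosixPath(output_folder) / model_dir / revision / f"{task_name}.json"


path = expected_results_path(
    "laion/CLIP-ViT-L-14-laion2B-s32B-b82K",
    "REVISION",  # placeholder, as in the text above
    "CIFAR10ZeroShot",
)
```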


## Model-specific running instructions

Some models require specific setup steps before they can be run. These are collected here.

<details>
<summary> Vista </summary>

## Set up VISTA

```
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding/research/visual_bge
pip install -e .
pip install torchvision timm einops ftfy
```
Go back to the root folder of mteb (this assumes FlagEmbedding was cloned there) and download the vision tower for bge-base:
```
cd ../../..
wget "https://huggingface.co/BAAI/bge-visualized/resolve/main/Visualized_base_en_v1.5.pth?download=true"
```
Rename it to `visualized_base_en_V1.5.pth`:
```
mv "Visualized_base_en_v1.5.pth?download=true" visualized_base_en_V1.5.pth
```
Download the vision tower for bge-m3:
```
wget "https://huggingface.co/BAAI/bge-visualized/resolve/main/Visualized_m3.pth?download=true"
```
Rename it to `visualized_m3.pth`:
```
mv "Visualized_m3.pth?download=true" visualized_m3.pth
```


</details>