Split VisualSTS into eng and multi #133

isaac-chung · 2025-02-26T12:37:08Z

To create the VisualSTS(eng) and VisualSTS(multi) columns in the leaderboard for the MIEB benchmarks (to align with the paper), we split VisualSTS using aggregate tasks. This way the existing results do not need to be rerun; only the new aggregate results need to be generated.
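As a rough illustration of why no rerun is needed (a sketch only, not the actual mteb implementation): an aggregate column can be computed from member-task scores that are already stored. The task names, scores, and grouping below are placeholders, and the plain mean is an assumption about how the members are combined.

```python
from statistics import mean

# Placeholder member-task scores for one model; not taken from the real results files.
stored_scores = {
    "sts_eng_a": 0.5,
    "sts_eng_b": 0.75,
    "sts_multi_a": 0.25,
    "sts_multi_b": 0.5,
}

# Hypothetical grouping of member tasks into the two leaderboard columns.
aggregates = {
    "VisualSTS(eng)": ["sts_eng_a", "sts_eng_b"],
    "VisualSTS(multi)": ["sts_multi_a", "sts_multi_b"],
}


def aggregate_scores(stored, groups):
    """Combine already-computed member scores; no model inference happens here."""
    return {
        name: mean(stored[task] for task in members)
        for name, members in groups.items()
    }


columns = aggregate_scores(stored_scores, aggregates)
print(columns)
```

Only the two aggregate entries are new; the member scores are read as-is.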

Checklist

Run tests locally to make sure nothing is broken using `make test`.
Run the results files checker `make pre-push`.

Adding a model checklist

I have added the model implementation to the mteb/models/ directory. Instructions for adding a model can be found here, in the following PR: feat: Add MIEB and MIEB-lite as benchmarks mteb#2035

KennethEnevoldsen · 2025-02-26T13:26:49Z

hmm, what a hacky way of using the aggregate task ;)

Like it though. We could also do this for the BRIGHT tasks (on the old leaderboard the split appears as different tasks)

Should we by default exclude aggregate tasks for mteb.get_tasks()?

isaac-chung · 2025-02-26T13:30:29Z

> hmm, what a hacky way of using the aggregate task ;)
>
> Like it though. We could also do this for the BRIGHT tasks (on the old leaderboard the split appears as different tasks)
>
> Should we by default exclude aggregate tasks for mteb.get_tasks()?

Thanks!

And I don't have a strong preference. I guess if they are excluded by default, a parameter would be added to "enable" them? e.g. include_aggregate=True
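A minimal sketch of what such an opt-in flag could look like. Everything here is hypothetical: the `Task` class, the `REGISTRY` list, and `get_tasks_sketch` are stand-ins for illustration and are not the real `mteb.get_tasks()` signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a task entry; not an mteb class."""

    name: str
    is_aggregate: bool = False


# Placeholder registry mixing ordinary tasks with one aggregate task.
REGISTRY = [
    Task("sts_eng_a"),
    Task("sts_eng_b"),
    Task("VisualSTS(eng)", is_aggregate=True),
]


def get_tasks_sketch(include_aggregate: bool = False) -> list[Task]:
    """Return tasks, leaving out aggregates unless the caller opts in."""
    return [t for t in REGISTRY if include_aggregate or not t.is_aggregate]


default_names = [t.name for t in get_tasks_sketch()]
all_names = [t.name for t in get_tasks_sketch(include_aggregate=True)]
print(default_names)
print(all_names)
```

With the default, callers would not see a member task and its aggregate side by side; passing `include_aggregate=True` returns both.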

KennethEnevoldsen · 2025-02-26T20:36:11Z

> And I don't have a strong preference. I guess if they are excluded by default, a parameter would be added to "enable" them? e.g. include_aggregate=True

hmm yea. Let us just include them for now, but I could imagine that we end up including a lot of "duplicates".

gowitheflow-1998 · 2025-02-27T08:33:46Z

looks great. Scores are matching with the aggregated results in the paper too. Do we have all models here?

isaac-chung · 2025-02-27T11:05:07Z

> looks great. Scores are matching with the aggregated results in the paper too. Do we have all models here?

@gowitheflow-1998 ~~only missing blip2 models. Could I get some help running them please?~~ [update] All run.

isaac-chung added 10 commits February 26, 2025 07:36

add openai clip res

6c1daa3

add siglip results

7c431a2

add open clip res

1973c19

add blip res

7f37d39

add jina clip, align-base, dinov2, and bge-v res

d61126f

add e5-v res

4284359

add moco res

b0a3312

add nomic res

fa61943

add eva clip res

1636c81

add vlm2vec res

2647a7c

add voyage m3 res

6d68306

gowitheflow-1998 approved these changes Feb 27, 2025

Samoed mentioned this pull request Feb 27, 2025

Exclude aggregated tasks from get_task embeddings-benchmark/mteb#2176

Closed

isaac-chung added 2 commits February 27, 2025 13:42

add blip2 res

3138b6d

Merge branch 'main' into split-visualSTS-into-eng-and-multi

1e3e50f

isaac-chung marked this pull request as ready for review February 27, 2025 13:47

isaac-chung merged commit 5624f12 into main Feb 27, 2025
2 checks passed

Samoed deleted the split-visualSTS-into-eng-and-multi branch December 24, 2025 08:10
