@isaac-chung
Contributor

@isaac-chung isaac-chung commented Feb 26, 2025

To create the VisualSTS(eng) and VisualSTS(multi) columns in the leaderboard for the MIEB benchmarks (to align with the paper), this PR splits the tasks using aggregate tasks. That way we don't need to rerun existing results and only need to generate the new aggregated ones.
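As a rough illustration of why no rerun is needed, here is a minimal sketch of the idea, not the actual mteb implementation. It assumes an aggregate is a plain mean over the cached main scores of its sub-tasks. `AggregateSpec`, `aggregate_score`, the sub-task names, and the scores are all hypothetical placeholders.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AggregateSpec:
    """Hypothetical stand-in for an aggregate task: a name plus its sub-task names."""

    name: str
    subtasks: tuple[str, ...]


def aggregate_score(spec: AggregateSpec, results: dict[str, float]) -> float:
    """Average the cached main scores of the sub-tasks; no model is rerun.

    Assumption: the aggregate is a simple mean of sub-task scores.
    """
    missing = [task for task in spec.subtasks if task not in results]
    if missing:
        raise KeyError(f"{spec.name}: no cached result for {missing}")
    return mean(results[task] for task in spec.subtasks)


# Placeholder sub-task names and scores, invented for illustration only.
cached = {"sts-en-a": 0.80, "sts-en-b": 0.70, "sts-xx-a": 0.60, "sts-xx-b": 0.50}

eng = AggregateSpec("VisualSTS(eng)", ("sts-en-a", "sts-en-b"))
multi = AggregateSpec("VisualSTS(multi)", ("sts-xx-a", "sts-xx-b"))

# One leaderboard column per aggregate, computed purely from existing results.
columns = {spec.name: aggregate_score(spec, cached) for spec in (eng, multi)}
```

The point of the sketch is that the two columns are derived views over results that already exist, so only the aggregated entries are new.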

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the results files checker make pre-push.

Adding a model checklist

@KennethEnevoldsen
Contributor

hmm, what a hacky way of using the aggregate task ;)

Like it though. We could also do this for the BRIGHT tasks (on the old leaderboard the split appears as different tasks)

Should we by default exclude aggregate tasks for mteb.get_tasks()?

@isaac-chung
Contributor Author

Thanks!

And I don't have a strong preference. I guess if they are excluded by default, a parameter would be added to "enable" them? e.g. include_aggregate=True
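A minimal sketch of what such an opt-in flag could look like, purely for discussion. This is not the real mteb.get_tasks(); `TaskInfo`, `REGISTRY`, `get_tasks_sketch`, and the task names are hypothetical, and only the `include_aggregate` parameter name comes from the suggestion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    """Hypothetical task record; real task metadata carries far more fields."""

    name: str
    is_aggregate: bool = False


# Placeholder registry, invented for illustration only.
REGISTRY = [
    TaskInfo("sts-en-a"),
    TaskInfo("sts-xx-a"),
    TaskInfo("VisualSTS(eng)", is_aggregate=True),
    TaskInfo("VisualSTS(multi)", is_aggregate=True),
]


def get_tasks_sketch(include_aggregate: bool = False) -> list[TaskInfo]:
    """Return registered tasks, dropping aggregates unless explicitly enabled."""
    return [task for task in REGISTRY if include_aggregate or not task.is_aggregate]


default_names = [task.name for task in get_tasks_sketch()]
all_names = [task.name for task in get_tasks_sketch(include_aggregate=True)]
```

With the default, callers only see the underlying tasks, which avoids counting the same data twice; passing include_aggregate=True returns the aggregates as well.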

@KennethEnevoldsen
Contributor

hmm yea. Let us just include them for now, but I could imagine that we would end up including a lot of "duplicates".

@gowitheflow-1998
Contributor

looks great. Scores are matching with the aggregated results in the paper too. Do we have all models here?

@isaac-chung
Contributor Author

isaac-chung commented Feb 27, 2025

@gowitheflow-1998 only missing blip2 models. Could I get some help running them please? [update] All run.

@isaac-chung isaac-chung marked this pull request as ready for review February 27, 2025 13:47
@isaac-chung isaac-chung merged commit 5624f12 into main Feb 27, 2025
2 checks passed
@Samoed Samoed deleted the split-visualSTS-into-eng-and-multi branch December 24, 2025 08:10