LB customization is difficult #2104

Open
Muennighoff opened this issue Feb 19, 2025 · 7 comments
Labels
good first issue (Good for newcomers), leaderboard (issues related to the leaderboard)

Comments

@Muennighoff
Contributor

Muennighoff commented Feb 19, 2025

Some people only care about selecting specific tasks. But if I clear out all domains and then try to select specific tasks, they do not appear, I guess because the domain list is empty? Also, adding a domain clears my current task selection 🤔

Maybe it is worth having a separate entry under the prebuilt benchmarks, just called Custom, that starts with nothing and makes it super easy to create one's own custom benchmark.

@Muennighoff added the "good first issue" and "leaderboard" labels Feb 19, 2025
@x-tabdeveloping
Collaborator

I think it's not entirely trivial what the default behaviour should be. What would you expect it to do when all domains are deleted? Should it just then include all tasks in the task list?
I think it makes a lot of sense that the task selection doesn't allow tasks to be selected that are not in the selected domains, though if the domain list is empty, it could potentially make sense to interpret it as an "everything goes" kinda thing.
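To make the two options concrete, here is a minimal sketch of the "everything goes" interpretation, together with one way to keep an existing task selection when the domain list changes. This is not the leaderboard's actual code: the `Task` class, the helper names, and the task and domain names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical stand-in for a leaderboard task; not the mteb task class.
    name: str
    domains: frozenset


def available_tasks(tasks, selected_domains):
    """Return tasks matching the domain filter; an empty filter matches all."""
    if not selected_domains:
        return list(tasks)
    wanted = set(selected_domains)
    return [t for t in tasks if t.domains & wanted]


def reconcile_selection(selected_names, tasks, selected_domains):
    """Keep previously selected tasks that are still available, drop the rest."""
    allowed = {t.name for t in available_tasks(tasks, selected_domains)}
    return [n for n in selected_names if n in allowed]


# Made-up tasks and domains, for illustration only.
TASKS = [
    Task("TaskA", frozenset({"News"})),
    Task("TaskB", frozenset({"Legal", "News"})),
    Task("TaskC", frozenset({"Medical"})),
]

everything = [t.name for t in available_tasks(TASKS, [])]
news_only = [t.name for t in available_tasks(TASKS, ["News"])]
kept = reconcile_selection(["TaskA", "TaskC"], TASKS, ["News"])
```

With this approach, changing the domain list only drops the selected tasks that fall outside the new filter, rather than resetting the whole selection.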

I was thinking about doing a custom benchmark tab, I think from the user's perspective it would make quite a bit of sense. On the other hand, it might prove a bit technically challenging, since the leaderboard, as it is right now, relies quite a bit on the selected benchmark (to speed things up by a lot).

Can you provide a scenario where you would be interested in performance on a single task, but don't necessarily know which benchmark that task belongs to? I'm just wondering what the exact use case is here, and then based on that we can figure out a sensible way to do this.

@Muennighoff
Contributor Author

For something like the attached, I would not expect the result to be empty? 🤔 Especially since selecting only "jpn" works, and the list only becomes empty once "zho" is added.

tmp2.mov

@KennethEnevoldsen
Contributor

Hmm yeah, this seems odd. We definitely have "jpn" tasks:

[BibleNLPBitextMining(name='BibleNLPBitextMining', languages=['aai', 'aak', 'aau', '...']),
 FloresBitextMining(name='FloresBitextMining', languages=['ace', 'acm', 'acq', '...']),
 NTREXBitextMining(name='NTREXBitextMining', languages=['afr', 'amh', 'arb', '...']),
 TatoebaBitextMining(name='Tatoeba', languages=['afr', 'amh', 'ang', '...']),
 AmazonCounterfactualClassification(name='AmazonCounterfactualClassification', languages=['deu', 'eng', 'jpn']),
 MassiveIntentClassification(name='MassiveIntentClassification', languages=['afr', 'amh', 'ara', '...']),
 SIB200ClusteringFast(name='SIB200ClusteringS2S', languages=['ace', 'acm', 'acq', '...']),
 BelebeleRetrieval(name='BelebeleRetrieval', languages=['acm', 'afr', 'als', '...']),
 PawsXPairClassification(name='PawsXPairClassification', languages=['cmn', 'deu', 'eng', '...']),
 VoyageMMarcoReranking(name='VoyageMMarcoReranking', languages=['jpn']),
 JSICK(name='JSICK', languages=['jpn']),
 MIRACLRetrievalHardNegatives(name='MIRACLRetrievalHardNegatives', languages=['ara', 'ben', 'deu', '...'])]

produced using:

import mteb

# Load the multilingual benchmark
bench = mteb.get_benchmark("MTEB(Multilingual, v1)")

# Keep only the tasks that cover Japanese
tasks = bench.tasks
tasks = [t for t in tasks if "jpn" in t.languages]

@x-tabdeveloping I feel like this would have worked previously. My guess is that this is probably a bug?
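For reference, a minimal sketch of how the two possible multi-language semantics differ. How the leaderboard actually combines selected languages is not confirmed here; this only illustrates that an "all languages" filter returns nothing for "jpn" plus "zho" unless some task covers both. `Task`, `filter_any`, `filter_all`, and `ZhoOnlyTask` are hypothetical; the other two task names and their languages are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical stand-in; only mirrors the name and languages fields.
    name: str
    languages: tuple


def filter_any(tasks, langs):
    """Keep tasks that cover at least one selected language (union)."""
    wanted = set(langs)
    return [t for t in tasks if wanted & set(t.languages)]


def filter_all(tasks, langs):
    """Keep tasks that cover every selected language (intersection)."""
    wanted = set(langs)
    return [t for t in tasks if wanted <= set(t.languages)]


TASKS = [
    Task("JSICK", ("jpn",)),
    Task("AmazonCounterfactualClassification", ("deu", "eng", "jpn")),
    Task("ZhoOnlyTask", ("zho",)),  # made-up task for illustration
]

any_names = [t.name for t in filter_any(TASKS, ["jpn", "zho"])]
all_names = [t.name for t in filter_all(TASKS, ["jpn", "zho"])]
jpn_names = [t.name for t in filter_all(TASKS, ["jpn"])]
```

Here `any_names` contains all three tasks, while `all_names` is empty because no task in this toy list covers both languages.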

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Feb 20, 2025

Hmm, can't reproduce it though:

[Two screenshots attached]

@x-tabdeveloping
Collaborator

I can't reproduce it either; it seems to work as intended.

@Mateleo

Mateleo commented Feb 21, 2025

How do I select only one language?
Never mind, there is MTEB(fra, v1).

@KennethEnevoldsen
Contributor

@Mateleo Didn't the dropdown selection for language work?

4 participants