LB customization is difficult #2104

Muennighoff · 2025-02-19T20:55:34Z

Some people just care about selecting specific tasks, but if I clear out all domains and then try to select specific tasks, they do not appear, I guess because the domain list is empty? Also, adding a domain clears my current task selection 🤔

Maybe it is worth having a separate entry under the prebuilt benchmarks, just called Custom, that starts with nothing and makes it super easy to create one's own custom benchmark.
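To make the second complaint concrete (adding a domain clears the current task selection), here is a minimal sketch of one way a selection could be preserved when the filter changes. It is not the leaderboard's actual code: `reconcile_selection` and the variable names are hypothetical, and the task names are simply borrowed from this thread.

```python
def reconcile_selection(previous: list[str], available: list[str]) -> list[str]:
    """Keep previously selected tasks that are still available, in their original order."""
    allowed = set(available)
    return [name for name in previous if name in allowed]


# The user had two tasks selected, then added a domain that makes a third task available.
selected = ["JSICK", "VoyageMMarcoReranking"]
available_after_adding_domain = [
    "JSICK",
    "VoyageMMarcoReranking",
    "AmazonCounterfactualClassification",
]

# The earlier selection survives instead of being reset.
print(reconcile_selection(selected, available_after_adding_domain))
# → ['JSICK', 'VoyageMMarcoReranking']
```

With this rule, a task is only dropped from the selection when the new filter no longer allows it.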

x-tabdeveloping · 2025-02-20T08:39:32Z

I think it's not entirely trivial what the default behaviour should be. What would you expect it to do when all domains are deleted? Should it just then include all tasks in the task list?
I think it makes a lot of sense that the task selection doesn't allow tasks to be selected that are not in the selected domains, though if the domain list is empty, it could potentially make sense to interpret it as an "everything goes" kinda thing.
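A minimal sketch of the "everything goes" interpretation, assuming each task carries a set of domains. The `Task` class, the task names, and the domain names below are all hypothetical and only illustrate the filter semantics, not how the leaderboard is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical stand-in for a leaderboard task entry.
    name: str
    domains: frozenset[str]


def tasks_for_domains(tasks: list[Task], selected_domains: list[str]) -> list[Task]:
    """Return tasks matching any selected domain; an empty selection means no restriction."""
    if not selected_domains:
        return list(tasks)
    wanted = set(selected_domains)
    return [task for task in tasks if task.domains & wanted]


tasks = [
    Task("TaskA", frozenset({"News"})),
    Task("TaskB", frozenset({"Legal", "News"})),
    Task("TaskC", frozenset({"Medical"})),
]

print([t.name for t in tasks_for_domains(tasks, ["Legal"])])  # → ['TaskB']
print([t.name for t in tasks_for_domains(tasks, [])])  # → ['TaskA', 'TaskB', 'TaskC']
```

Under this rule, clearing the domain list shows every task instead of an empty task list.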

I was thinking about doing a custom benchmark tab; from the user's perspective it would make quite a bit of sense. On the other hand, it might prove a bit technically challenging, since the leaderboard, as it is right now, relies quite a bit on the selected benchmark (to speed things up by a lot).

Can you provide a scenario where you would be interested in performance on a single task, but don't necessarily know which benchmark that task belongs to? I'm just wondering what the exact use case is here, and then based on that we can figure out a sensible way to do this.

Muennighoff · 2025-02-20T18:36:57Z

I think for something like the attached I would not expect it to be empty? 🤔 Especially since selecting only "jpn" works; the list only becomes empty once "zho" is added.

tmp2.mov

KennethEnevoldsen · 2025-02-20T20:49:29Z

Hmm yea, this seems odd, we def. have "jpn" tasks in:

```
[BibleNLPBitextMining(name='BibleNLPBitextMining', languages=['aai', 'aak', 'aau', '...']),
 FloresBitextMining(name='FloresBitextMining', languages=['ace', 'acm', 'acq', '...']),
 NTREXBitextMining(name='NTREXBitextMining', languages=['afr', 'amh', 'arb', '...']),
 TatoebaBitextMining(name='Tatoeba', languages=['afr', 'amh', 'ang', '...']),
 AmazonCounterfactualClassification(name='AmazonCounterfactualClassification', languages=['deu', 'eng', 'jpn']),
 MassiveIntentClassification(name='MassiveIntentClassification', languages=['afr', 'amh', 'ara', '...']),
 SIB200ClusteringFast(name='SIB200ClusteringS2S', languages=['ace', 'acm', 'acq', '...']),
 BelebeleRetrieval(name='BelebeleRetrieval', languages=['acm', 'afr', 'als', '...']),
 PawsXPairClassification(name='PawsXPairClassification', languages=['cmn', 'deu', 'eng', '...']),
 VoyageMMarcoReranking(name='VoyageMMarcoReranking', languages=['jpn']),
 JSICK(name='JSICK', languages=['jpn']),
 MIRACLRetrievalHardNegatives(name='MIRACLRetrievalHardNegatives', languages=['ara', 'ben', 'deu', '...'])]
```

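produced using:

```python
import mteb

# Load the prebuilt multilingual benchmark
bench = mteb.get_benchmark("MTEB(Multilingual, v1)")

# Keep only the tasks that include Japanese ("jpn")
tasks = bench.tasks
tasks = [t for t in tasks if "jpn" in t.languages]
```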

@x-tabdeveloping I feel like this would have worked previously. My guess is that this is probably a bug?
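As a way to reason about the reported symptom, the sketch below extends the same filter to several selected languages and compares two matching rules. It only uses the three tasks from the list above whose language lists are shown in full. Which rule the leaderboard applies is not stated in this thread, so this is an illustration under that assumption, not a diagnosis; `filter_by_languages` and `require_all` are hypothetical names.

```python
# Tasks from the listing above whose language lists are not truncated.
TASK_LANGUAGES = {
    "AmazonCounterfactualClassification": ["deu", "eng", "jpn"],
    "VoyageMMarcoReranking": ["jpn"],
    "JSICK": ["jpn"],
}


def filter_by_languages(
    task_languages: dict[str, list[str]],
    selected: list[str],
    require_all: bool = False,
) -> list[str]:
    """Return task names matching any selected language, or all of them if require_all is set."""
    match = all if require_all else any
    return [
        name
        for name, langs in task_languages.items()
        if match(lang in langs for lang in selected)
    ]


# Selecting only "jpn" returns all three tasks under either rule.
print(filter_by_languages(TASK_LANGUAGES, ["jpn"]))

# Adding "zho": "any" keeps the jpn tasks, "all" returns an empty list.
print(filter_by_languages(TASK_LANGUAGES, ["jpn", "zho"]))
print(filter_by_languages(TASK_LANGUAGES, ["jpn", "zho"], require_all=True))  # → []
```

With "match any" semantics, adding a second language can only grow the result, so an empty list after adding "zho" would be unexpected under that rule.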

KennethEnevoldsen · 2025-02-20T20:53:45Z

Hmm, can't reproduce it though:

x-tabdeveloping · 2025-02-21T08:16:32Z

I can't reproduce it either; it seems to work as intended.

Mateleo · 2025-02-21T14:12:09Z

~~How to select only one language?~~
Never mind, there is MTEB(fra, v1).

KennethEnevoldsen · 2025-02-21T14:29:22Z

@Mateleo didn't the dropdown selection for language work?

Muennighoff added the "good first issue" (Good for newcomers) and "leaderboard" (issues related to the leaderboard) labels · Feb 19, 2025
