Restructure Retrieval benchmarks: rename, clean slugs, add Miscellaneous, and relocate tasks by q275343119 · Pull Request #3212 · embeddings-benchmark/mteb

q275343119 · 2025-09-25T10:01:03Z

This PR makes the following adjustments to the Retrieval benchmark structure:

Rename “RTEB (Retrieval)” to “Retrieval”.
Remove the “RTEB” slug from sub-benchmarks (e.g., change “RTEB Finance” → “Finance”).
Add a new Miscellaneous category under Retrieval.
Move the following benchmarks from MTEB/Miscellaneous to Retrieval/Miscellaneous:
- BEIR
- NanoBEIR
- BRIGHT
- BRIGHT (long)
- Code Information Retrieval
- Instruction Following
- Long-context Retrieval
- Reasoning Retrieval

Preview space: https://huggingface.co/spaces/SmileXing/leaderboard

Samoed · 2025-09-25T10:10:26Z

mteb/benchmarks/benchmarks/rteb_benchmarks.py

 RTEB_ENGLISH = Benchmark(
    name="RTEB(eng, beta)",
-    display_name="RTEB English",
+    display_name="English",


I think this can be a bit confusing, because we have other language specific datasets

Oh, I see — two different English buttons are displaying the same content.
After looking into it, I found that the issue comes from this part of the code:

def _create_button(
    i: int,
    benchmark: Benchmark,
    state: gr.State,
    label_to_value: dict[str, str],
    **kwargs,
):
    val = benchmark.name
    label = (
        benchmark.display_name if benchmark.display_name is not None else benchmark.name
    )
    label_to_value[label] = benchmark.name
    button = gr.Button(
        label,
        variant="secondary" if i != 0 else "primary",
        icon=benchmark.icon,
        key=f"{i}_button_{val}",
        elem_classes="text-white",
        **kwargs,
    )

Since label_to_value is a dict, assigning with the same key will overwrite the previous value whenever two buttons share the same label.

So I cannot use the same display_name for two Benchmarks 😥
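The overwrite described above can be reproduced without Gradio. The following is a minimal sketch, not the leaderboard code: `Benchmark` here is a stand-in dataclass with only the two fields that matter, `build_label_map` is a hypothetical helper that mirrors the `label_to_value` bookkeeping in `_create_button`, and the first benchmark name is chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Benchmark:
    """Stand-in for the real class; only the fields used below."""

    name: str
    display_name: Optional[str] = None


def build_label_map(benchmarks: list[Benchmark]) -> dict[str, str]:
    """Hypothetical helper mirroring the label -> name mapping in _create_button."""
    label_to_value: dict[str, str] = {}
    for benchmark in benchmarks:
        label = (
            benchmark.display_name
            if benchmark.display_name is not None
            else benchmark.name
        )
        # If the label already exists, this assignment replaces the old value.
        label_to_value[label] = benchmark.name
    return label_to_value


benchmarks = [
    Benchmark(name="MTEB(eng, v2)", display_name="English"),  # illustrative name
    Benchmark(name="RTEB(eng, beta)", display_name="English"),
]
label_map = build_label_map(benchmarks)
print(label_map)  # only one "English" entry survives
```

Both buttons are still rendered, but looking up either label resolves to the benchmark that was registered last, which matches the symptom of two "English" buttons showing the same content.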

I think we should add a test for this.
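One possible shape for such a test is sketched below. It is an assumption, not the test that exists in the repository: `duplicate_labels` is a hypothetical helper, and the `(name, display_name)` pairs are made-up sample data standing in for whatever the real benchmark registry exposes.

```python
from collections import Counter
from typing import Iterable, Optional


def duplicate_labels(pairs: Iterable[tuple[str, Optional[str]]]) -> list[str]:
    """Return every button label that more than one benchmark would use.

    Each pair is (name, display_name); the label falls back to the name
    when display_name is None, as in _create_button.
    """
    labels = [display if display is not None else name for name, display in pairs]
    return sorted(label for label, count in Counter(labels).items() if count > 1)


def test_button_labels_are_unique() -> None:
    # Hypothetical sample data; a real test would iterate the actual benchmarks.
    pairs = [
        ("MTEB(eng, v2)", "English"),
        ("RTEB(eng, beta)", "RTEB English"),
        ("BEIR", None),
    ]
    assert duplicate_labels(pairs) == []


test_button_labels_are_unique()
```

Returning the offending labels, rather than a bare boolean, keeps the failure message useful when someone later adds a benchmark with a clashing display name.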

I think this can be a bit confusing, because we have other language specific datasets

Good point; I can see arguments for either direction. @KennethEnevoldsen what do you think?

KennethEnevoldsen

(will get back to this Monday to test the buttons)

KennethEnevoldsen · 2025-09-27T15:16:38Z

mteb/leaderboard/benchmark_selector.py

 RTEB_BENCHMARK_ENTRIES = [
    MenuEntry(
-        name="RTEB (Retrieval)",
+        name="Retrieval",


We might need to rename "select benchmark" to "General Purpose"

KennethEnevoldsen · 2025-09-27T15:20:23Z

mteb/benchmarks/benchmarks/rteb_benchmarks.py

 RTEB_ENGLISH = Benchmark(
    name="RTEB(eng, beta)",
-    display_name="RTEB English",
+    display_name="English",


Hmm, so visually, I don't see a big issue with having two "English" benchmarks, as we have the structure (especially with the change below)

Whether this is possible in the code, I'm not sure, though. I will have to run a test on that Monday.

q275343119 · 2025-09-29T06:42:17Z

I added a trailing blank to display_name so that duplicate labels no longer appear.
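Why a trailing blank avoids the collision can be shown with a plain dict. This is a small sketch under assumptions: the first benchmark name is illustrative, and the blank is taken to be a single space character.

```python
label_to_value: dict[str, str] = {}

# Two labels that look the same on a button but differ by one trailing space.
label_to_value["English"] = "MTEB(eng, v2)"  # illustrative name
label_to_value["English "] = "RTEB(eng, beta)"  # trailing blank: a distinct key

# The dict now keeps both entries because the keys are different strings.
distinct_keys = len(label_to_value)

# Any code path that strips whitespace would make the labels collide again.
stripped_keys = len({label.strip() for label in label_to_value})

print(distinct_keys, stripped_keys)
```

The workaround relies on nothing normalizing the label later, so it is fragile compared with giving each benchmark a visibly different label.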

KennethEnevoldsen · 2025-09-29T14:41:31Z

Since I can't update this branch, I made a separate PR here #3222 @q275343119, can you review it to see if it works for you?

q275343119 · 2025-09-29T15:00:15Z

Since I can't update this branch, I made a separate PR here #3222 @q275343119, can you review it to see if it works for you?

It works for me.
Adding the suffix is a nice idea; no problem on my side.

KennethEnevoldsen · 2025-09-29T15:33:28Z

Great, I will close this then in favor of the other PR.

feat - adjust Rteb's Benchmark

5922959

Samoed reviewed Sep 25, 2025

KennethEnevoldsen reviewed Sep 27, 2025

feat - add blank

f4b86d2

KennethEnevoldsen mentioned this pull request Sep 29, 2025

fix: Update UI to better allow for RTEB #3222

Merged

KennethEnevoldsen closed this Sep 29, 2025

q275343119 wants to merge 2 commits into embeddings-benchmark:main from embedding-benchmark:feat-rteb-sider-ui

4 participants: q275343119, Samoed, fzliu, KennethEnevoldsen
