Restructure Retrieval benchmarks: rename, clean slugs, add Miscellaneous, and relocate tasks #3212
```diff
 RTEB_ENGLISH = Benchmark(
     name="RTEB(eng, beta)",
-    display_name="RTEB English",
+    display_name="English",
```
I think this could be a bit confusing, because we have other language-specific datasets.
Oh, I see: two different "English" buttons are displaying the same content.
After looking into it, I found that the issue comes from this part of the code:
```python
def _create_button(
    i: int,
    benchmark: Benchmark,
    state: gr.State,
    label_to_value: dict[str, str],
    **kwargs,
):
    val = benchmark.name
    # The button label falls back to the benchmark name when no display_name is set
    label = (
        benchmark.display_name if benchmark.display_name is not None else benchmark.name
    )
    # Keyed by label: a later benchmark with the same label replaces the earlier entry
    label_to_value[label] = benchmark.name
    button = gr.Button(
        label,
        variant="secondary" if i != 0 else "primary",
        icon=benchmark.icon,
        key=f"{i}_button_{val}",
        elem_classes="text-white",
        **kwargs,
    )
```
Since `label_to_value` is a dict, assigning with the same key overwrites the previous value whenever two buttons share the same label.
So I couldn't use the same display_name for two benchmarks 😥
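To make the collision concrete, here is a minimal, self-contained sketch. It assumes a simplified stand-in `Benchmark` dataclass with only `name` and `display_name` (the real class has more fields); `button_label`, `build_label_map` and the name `benchmark-a` are hypothetical. It reproduces the silent overwrite and shows one possible guard: raising a `ValueError` when two benchmarks resolve to the same label.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Benchmark:
    """Hypothetical stand-in for the leaderboard's Benchmark: only the fields used here."""

    name: str
    display_name: Optional[str] = None


def button_label(benchmark: Benchmark) -> str:
    # Same fallback as _create_button: prefer display_name, else name.
    return benchmark.display_name if benchmark.display_name is not None else benchmark.name


def build_label_map(benchmarks: List[Benchmark]) -> Dict[str, str]:
    """Map button label -> benchmark name, raising instead of silently overwriting."""
    label_to_value: Dict[str, str] = {}
    for benchmark in benchmarks:
        label = button_label(benchmark)
        if label in label_to_value:
            raise ValueError(
                f"Duplicate button label {label!r} for "
                f"{label_to_value[label]!r} and {benchmark.name!r}"
            )
        label_to_value[label] = benchmark.name
    return label_to_value


# Reproduce the problem: a plain dict keeps only the last benchmark per label.
a = Benchmark(name="benchmark-a", display_name="English")
b = Benchmark(name="RTEB(eng, beta)", display_name="English")
naive: Dict[str, str] = {}
for bm in (a, b):
    naive[button_label(bm)] = bm.name
print(naive)  # {'English': 'RTEB(eng, beta)'}
```

With this guard, `build_label_map([a, b])` fails loudly instead of letting one button show the other benchmark's results. Whether the leaderboard should raise or instead disambiguate the label is a design choice the thread leaves open.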
I think we should add a test for this.
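A possible shape for such a test, assuming pytest-style discovery of `test_*` functions. `duplicate_labels`, the test name, and the fixture entries (apart from `RTEB(eng, beta)`) are hypothetical; a real test would collect the `(name, display_name)` pairs from the benchmarks the leaderboard actually displays.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def duplicate_labels(entries: List[Tuple[str, Optional[str]]]) -> Dict[str, int]:
    """Return each button label that occurs more than once, with its count.

    Entries are (name, display_name) pairs; the label falls back to name when
    display_name is None, mirroring the logic in _create_button.
    """
    labels = [display if display is not None else name for name, display in entries]
    return {label: count for label, count in Counter(labels).items() if count > 1}


def test_benchmark_button_labels_are_unique() -> None:
    # Hypothetical fixture: a real test would gather these pairs from every
    # benchmark shown in the leaderboard menu.
    entries = [
        ("benchmark-a", "English"),
        ("RTEB(eng, beta)", "RTEB English"),
        ("benchmark-c", None),
    ]
    assert duplicate_labels(entries) == {}, "two benchmarks share a button label"
```

If another benchmark already uses "English" and `RTEB(eng, beta)` is switched to `display_name="English"`, this kind of assertion would fail and name the clashing label.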
> I think this could be a bit confusing, because we have other language-specific datasets.
Good point; I can see arguments for either direction. @KennethEnevoldsen, what do you think?
Hmm, so visually, I don't see a big issue with having two "English" benchmarks, as we have the structure (especially with the change below).
I'm not sure whether this is possible in the code, though; I will have to test it on Monday.
KennethEnevoldsen left a comment
(I will get back to this on Monday to test the buttons.)
```diff
 RTEB_BENCHMARK_ENTRIES = [
     MenuEntry(
-        name="RTEB (Retrieval)",
+        name="Retrieval",
```
We might need to rename "select benchmark" to "General Purpose".
I added a trailing blank to display_name to prevent duplicate labels from appearing.
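Why the trailing blank works, sketched with plain strings: `"English"` and `"English "` are different dict keys, so both entries survive in `label_to_value`, while the button text likely looks the same because browsers typically collapse trailing whitespace. The helper `collisions_after_normalizing` and the values `benchmark-a`/`benchmark-b` are hypothetical. The sketch also shows the fragility of the workaround: any code path that strips labels merges them again.

```python
from typing import List, Set


def collisions_after_normalizing(labels: List[str]) -> Set[str]:
    """Return labels that become identical once surrounding whitespace is stripped."""
    seen: Set[str] = set()
    collisions: Set[str] = set()
    for label in labels:
        key = label.strip()
        if key in seen:
            collisions.add(key)
        seen.add(key)
    return collisions


# The workaround: a trailing blank makes the dict keys distinct...
label_to_value = {"English": "benchmark-a", "English ": "benchmark-b"}
print(len(label_to_value))  # 2
# ...but any normalization that strips labels would merge them again.
print(collisions_after_normalizing(list(label_to_value)))  # {'English'}
```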
Since I can't update this branch, I made a separate PR: #3222. @q275343119, can you review it to see if it works for you?
It works for me.
Great, I will close this then in favor of the other PR.
This PR makes the following adjustments to the Retrieval benchmark structure:
Preview space: https://huggingface.co/spaces/SmileXing/leaderboard