Merged
22 changes: 12 additions & 10 deletions mteb/results/task_result.py
@@ -633,21 +633,23 @@ def validate_and_filter_scores(self, task: AbsTask | None = None) -> Self:
             task = get_task(self.task_name)
 
         splits = task.eval_splits
-        hf_subsets = task.hf_subsets
-        hf_subsets = set(hf_subsets)
+        hf_subsets = set(task.hf_subsets)  # Convert to set once
 
         new_scores = {}
         seen_splits = set()
         for split in self.scores:
             if split not in splits:
                 continue
-            new_scores[split] = []
             seen_subsets = set()
-            for _scores in self.scores[split]:
-                if _scores["hf_subset"] not in hf_subsets:
-                    continue
-                new_scores[split].append(_scores)
+            # Use list comprehension for better performance
+            new_scores[split] = [
+                _scores
+                for _scores in self.scores[split]
+                if _scores["hf_subset"] in hf_subsets
+            ]
+            for _scores in new_scores[split]:
Contributor commented on lines +644 to +650:
    this just does the loop twice - is that really faster?
                 seen_subsets.add(_scores["hf_subset"])
 
         if seen_subsets != hf_subsets:
             missing_subsets = hf_subsets - seen_subsets
             if len(missing_subsets) > 2:
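The reviewer's question (does a comprehension plus a second pass beat a single appending loop?) is easy to probe in isolation. A minimal micro-benchmark sketch, using invented data shapes rather than mteb's real score dicts:

```python
import timeit

# Synthetic stand-in for self.scores[split]: a list of score dicts.
scores = [{"hf_subset": f"s{i % 50}", "main_score": 0.5} for i in range(10_000)]
hf_subsets = {f"s{i}" for i in range(40)}  # some subsets get filtered out

def single_pass():
    # Original style: one loop that filters, collects, and tracks seen subsets.
    out, seen = [], set()
    for s in scores:
        if s["hf_subset"] not in hf_subsets:
            continue
        out.append(s)
        seen.add(s["hf_subset"])
    return out, seen

def two_passes():
    # PR style: list comprehension for filtering, then a second pass for seen.
    out = [s for s in scores if s["hf_subset"] in hf_subsets]
    seen = {s["hf_subset"] for s in out}
    return out, seen

assert single_pass() == two_passes()  # both produce identical results
print("single pass:", timeit.timeit(single_pass, number=100))
print("two passes: ", timeit.timeit(two_passes, number=100))
```

Comprehensions avoid the per-iteration attribute lookup on `list.append`, so the two-pass version can win despite iterating twice, but the margin depends on the filter hit rate and data size; worth measuring rather than assuming.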
@@ -664,9 +666,9 @@ def validate_and_filter_scores(self, task: AbsTask | None = None) -> Self:
         logger.warning(
             f"{task.metadata.name}: Missing splits {set(splits) - seen_splits}"
         )
-        new_res = {**self.to_dict(), "scores": new_scores}
-        new_res = TaskResult.from_validated(**new_res)
-        return new_res
+        data = self.model_dump()
+        data["scores"] = new_scores
+        return type(self).model_construct(**data)
Contributor commented on lines +670 to +671:
    I am not at all sure that this is faster - we wrote from_validated specifically to speed up data load and avoid validation. I would like us to double check this PR.
@Samoed (Member) commented on Dec 25, 2025:
    from_validated would just use model_construct too:

        return cls.model_construct(**data)

    But we can keep from_validated for consistency.
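The trade-off being discussed (validated construction vs. `model_construct`) can be illustrated with a minimal pydantic v2 model. `TaskResultStub` below is a made-up stand-in, not mteb's actual `TaskResult`:

```python
from pydantic import BaseModel, ValidationError

class TaskResultStub(BaseModel):
    # Hypothetical stand-in for TaskResult; fields invented for illustration.
    task_name: str
    scores: dict

data = {"task_name": "demo", "scores": {"test": []}}

validated = TaskResultStub(**data)                    # full validation
constructed = TaskResultStub.model_construct(**data)  # validation skipped

assert validated.model_dump() == constructed.model_dump()

# model_construct accepts data that validation would reject,
# which is exactly why it is faster - and riskier.
bad = TaskResultStub.model_construct(task_name=123, scores=None)
assert bad.task_name == 123

try:
    TaskResultStub(task_name=123, scores=None)
except ValidationError:
    print("validated path rejects what model_construct accepted")
```

Since scores passed to this method have already been validated once, skipping re-validation here is consistent with what `from_validated` was introduced for.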

Contributor replied:
    Right, it is the same - it is only for ModelResult and BenchmarkResult that it is different. So we assume that self.model_dump is faster than to_dict().

@Samoed (Member) commented on Dec 25, 2025:
    to_dict is model_dump too:

        def to_dict(self) -> dict:
            """Convert the TaskResult to a dictionary.

            Returns:
                The TaskResult as a dictionary.
            """
            return self.model_dump()

    The main change in this part is replacing the dict merge {**self.to_dict(), "scores": new_scores} with the in-place assignment data["scores"] = new_scores.
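The two styles can be compared side by side. A tiny stdlib-only sketch, with invented field names and `dict(base)` standing in for `self.model_dump()`:

```python
base = {"task_name": "demo", "mteb_version": "1.0.0", "scores": {"old": 1}}
new_scores = {"new": 2}

# Old style: rebuild the whole top-level dict, overriding one key.
merged = {**base, "scores": new_scores}

# New style: dump once, then assign in place (no second top-level copy).
data = dict(base)  # stands in for self.model_dump(), which returns a fresh dict
data["scores"] = new_scores

assert merged == data
assert base["scores"] == {"old": 1}  # neither style mutates the original
```

Both end states are identical; the saving, if any, is the avoided second copy of the top-level keys, which is small unless the model has many fields.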

Contributor replied:
    I would suspect as much, which is why I am not sure this PR actually leads to a time saving that is worth the edit (it might, but I am not at all sure, and if it does I would love to learn where the saving comes from).


     def is_mergeable(
         self,