KennethEnevoldsen left a comment:
Looks good. We might want to discuss which image metrics are relevant, but the structure change is very decent.
return ImageStatistics(
    min_image_width=min(img_widths),
    average_image_width=sum(img_widths) / len(img_widths),
    max_image_width=max(img_widths),
    min_image_height=min(img_heights),
    average_image_height=sum(img_heights) / len(img_heights),
    max_image_height=max(img_heights),
)
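The reviewed snippet takes the minimum, average and maximum of the image widths and heights. Below is a minimal, self-contained sketch of the same computation. It assumes the images are available as `(width, height)` pairs (for example, from PIL's `Image.size`). `ImageStatistics` here is a hypothetical `TypedDict` stand-in, not mteb's actual definition, and `calculate_image_statistics` is likewise a hypothetical helper.

```python
from typing import TypedDict


class ImageStatistics(TypedDict):
    # Hypothetical stand-in mirroring the fields in the reviewed snippet.
    min_image_width: float
    average_image_width: float
    max_image_width: float
    min_image_height: float
    average_image_height: float
    max_image_height: float


def calculate_image_statistics(sizes: list[tuple[int, int]]) -> ImageStatistics:
    """Compute min/average/max width and height from (width, height) pairs."""
    if not sizes:
        # min()/max() and the averages are undefined for an empty split.
        raise ValueError("Cannot compute image statistics for an empty split")
    img_widths = [w for w, _ in sizes]
    img_heights = [h for _, h in sizes]
    return ImageStatistics(
        min_image_width=min(img_widths),
        average_image_width=sum(img_widths) / len(img_widths),
        max_image_width=max(img_widths),
        min_image_height=min(img_heights),
        average_image_height=sum(img_heights) / len(img_heights),
        max_image_height=max(img_heights),
    )
```

For example, `calculate_image_statistics([(224, 224), (640, 480)])` gives an average width of 432.0 and an average height of 352.0.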
I am not sure these are the statistics people would normally be interested in. Should we add duplicates as well?
I added these because they were previously in MIEB. I think we can expand them, but I'm not sure what to add. CC @isaac-chung
I would like duplicates; otherwise, I don't have much to add.
What was previously in MIEB is fine
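Duplicates were suggested as an extra image statistic. The sketch below shows one possible way to count them. It assumes images are compared by a SHA-256 hash of their raw bytes, so a resized or re-encoded copy of the same picture would not be counted as a duplicate. `count_duplicate_images` is a hypothetical helper, not part of mteb.

```python
import hashlib


def count_duplicate_images(images: list[bytes]) -> int:
    """Count images whose bytes exactly match an earlier image in the list."""
    seen: set[str] = set()
    duplicates = 0
    for data in images:
        # Hash the raw bytes; identical bytes always produce identical digests.
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates
```

With PIL images, one option would be to pass `image.tobytes()`. That compares decoded pixel data rather than file encodings, but it still treats resized copies as distinct.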
KennethEnevoldsen left a comment:
Looks good, but we still need to resolve the conflicts. I think we can add image metrics in another PR if needed.
@isaac-chung, are there any image metrics we need to add?
I can't think of any beyond the existing MIEB ones. I think this PR just needs to resolve the conflicts, and that's it.
# Conflicts:
#   mteb/abstasks/AbsTaskAnyClassification.py
#   mteb/abstasks/AbsTaskClustering.py
#   mteb/abstasks/AbsTaskPairClassification.py
#   tests/test_benchmark/mock_tasks.py
Continues the ideas from #2537. Makes all statistics follow this format:
{
    "text_statistics": {"min_text_length": 1, ...},
    "label_statistics": {"min_labels_per_text": 1, ...}
}
For now, I haven't changed the MIEB statistics.
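The nested layout above groups statistics by kind under separate keys. The sketch below shows one way to express it with typed dictionaries. The class names `TextStatistics`, `LabelStatistics` and `DescriptiveStatistics`, and the helper `calculate_statistics`, are hypothetical. Only the two fields named in the description are included, while the real statistics carry more fields (the `...` above).

```python
from typing import TypedDict


class TextStatistics(TypedDict):
    # Hypothetical: only the field named in the PR description.
    min_text_length: int


class LabelStatistics(TypedDict):
    # Hypothetical: only the field named in the PR description.
    min_labels_per_text: int


class DescriptiveStatistics(TypedDict):
    text_statistics: TextStatistics
    label_statistics: LabelStatistics


def calculate_statistics(
    texts: list[str], labels: list[list[str]]
) -> DescriptiveStatistics:
    """Group text-level and label-level statistics under separate keys."""
    return DescriptiveStatistics(
        text_statistics=TextStatistics(
            min_text_length=min(len(text) for text in texts),
        ),
        label_statistics=LabelStatistics(
            min_labels_per_text=min(len(label_list) for label_list in labels),
        ),
    )
```

For example, `calculate_statistics(["a", "abc"], [["x"], ["x", "y"]])` returns `{"text_statistics": {"min_text_length": 1}, "label_statistics": {"min_labels_per_text": 1}}`.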