[v2] Refactor descriptive stats by Samoed · Pull Request #2823 · embeddings-benchmark/mteb

Samoed · 2025-06-15T20:50:56Z

Continues the ideas from #2537. All statistics now follow this format:

{
    "text_statistics": {"min_text_length": 1, ... },
    "label_statistics": {"min_labels_per_text": 1, ...},
}

For now, I haven't changed MIEB statistics
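As a rough illustration of the nested layout described above, the structure could be typed and filled in along these lines. This is only a sketch, not the code from this PR: the keys `text_statistics`, `min_text_length`, `label_statistics`, and `min_labels_per_text` come from the description, while the `average_*` and `max_*` fields, the class names, and the helper `describe_split` are assumptions made for the example.

```python
from typing import TypedDict


class TextStatistics(TypedDict):
    min_text_length: int
    average_text_length: float  # assumed field, not shown in the PR description
    max_text_length: int  # assumed field, not shown in the PR description


class LabelStatistics(TypedDict):
    min_labels_per_text: int
    average_labels_per_text: float  # assumed field
    max_labels_per_text: int  # assumed field


class SplitStatistics(TypedDict):
    # One nested entry per statistic group, as in the format above.
    text_statistics: TextStatistics
    label_statistics: LabelStatistics


def describe_split(texts: list[str], labels: list[list[str]]) -> SplitStatistics:
    """Hypothetical helper: compute both statistic groups for one split."""
    if not texts or len(texts) != len(labels):
        raise ValueError("texts and labels must be non-empty and aligned")
    lengths = [len(text) for text in texts]
    counts = [len(text_labels) for text_labels in labels]
    return SplitStatistics(
        text_statistics=TextStatistics(
            min_text_length=min(lengths),
            average_text_length=sum(lengths) / len(lengths),
            max_text_length=max(lengths),
        ),
        label_statistics=LabelStatistics(
            min_labels_per_text=min(counts),
            average_labels_per_text=sum(counts) / len(counts),
            max_labels_per_text=max(counts),
        ),
    )
```

Grouping by modality in this way means a task with images could add another top-level key without touching the text or label groups.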

KennethEnevoldsen

Looks good - we might discuss what the relevant image metrics are, but the structure change is very decent

mteb/abstasks/statistics_calculation.py

KennethEnevoldsen · 2025-06-16T18:28:29Z

+    return ImageStatistics(
+        min_image_width=min(img_widths),
+        average_image_width=sum(img_widths) / len(img_widths),
+        max_image_width=max(img_widths),
+        min_image_height=min(img_heights),
+        average_image_height=sum(img_heights) / len(img_heights),
+        max_image_height=max(img_heights),
+    )


I am not sure if these are the statistics that people would normally be interested in?

Should we add duplicates as well?

Samoed · 2025-06-16

I've added it as it was previously in MIEB. I think we can expand it, but I'm not sure what to add. CC @isaac-chung

KennethEnevoldsen · 2025-06-16

I would like duplicates - otherwise, I don't think I have a lot to add.

isaac-chung · 2025-07-08

What was previously in MIEB is fine
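If a duplicate count were added alongside the width and height statistics discussed in this thread, one possible shape is sketched below. This is an assumption-laden example rather than the PR's implementation: the field `number_of_duplicate_images` is hypothetical, duplicates are detected only as exact byte-level matches via SHA-256, and images are passed as `(width, height, raw_bytes)` tuples so that the sketch needs no image library.

```python
import hashlib
from typing import TypedDict


class ImageStatistics(TypedDict):
    min_image_width: int
    average_image_width: float
    max_image_width: int
    min_image_height: int
    average_image_height: float
    max_image_height: int
    # Hypothetical field, not part of the diff above.
    number_of_duplicate_images: int


def calculate_image_statistics(
    images: list[tuple[int, int, bytes]],
) -> ImageStatistics:
    """Sketch: size statistics plus a count of exact byte-level duplicates."""
    if not images:
        raise ValueError("need at least one image")
    img_widths = [width for width, _, _ in images]
    img_heights = [height for _, height, _ in images]
    # Images with identical bytes hash to the same digest, so every image
    # beyond the first with a given digest counts as one duplicate.
    unique_digests = {hashlib.sha256(data).hexdigest() for _, _, data in images}
    return ImageStatistics(
        min_image_width=min(img_widths),
        average_image_width=sum(img_widths) / len(img_widths),
        max_image_width=max(img_widths),
        min_image_height=min(img_heights),
        average_image_height=sum(img_heights) / len(img_heights),
        max_image_height=max(img_heights),
        number_of_duplicate_images=len(images) - len(unique_digests),
    )
```

Exact hashing misses near-duplicates such as resized or re-encoded copies; catching those would need a perceptual hash, which is outside what this thread discusses.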

KennethEnevoldsen

Looks good - we still need to resolve the conflicts though. I think we can add image metrics in another PR if needed

@isaac-chung any image metrics that we need to add?

isaac-chung · 2025-07-09T19:41:07Z

> Looks good - we still need to resolve the conflicts though. I think we can add image metrics in another PR if needed
>
> @isaac-chung any image metrics that we need to add?

I can't think of any beyond the existing MIEB ones. I think this PR just needs its conflicts resolved and that's it.

Samoed added 2 commits June 15, 2025 20:42

start adding

b22a8e5

standardize statistics

2b710fe

Samoed added the v2 label Jun 15, 2025

remove irrelevant file

0141f10

Samoed requested a review from KennethEnevoldsen June 15, 2025 20:58

KennethEnevoldsen reviewed Jun 16, 2025

Samoed added 2 commits July 5, 2025 18:16

Merge branch 'v2.0.0' into refactor_descriptive_stats

6da38d5

update retrieval calculation

c7e436a

Samoed marked this pull request as ready for review July 5, 2025 15:42

Samoed requested review from KennethEnevoldsen and isaac-chung July 5, 2025 15:42

KennethEnevoldsen approved these changes Jul 8, 2025

Samoed added 4 commits July 9, 2025 23:03

Merge branch 'v2.0.0' into refactor_descriptive_stats

0708493

# Conflicts:
#   mteb/abstasks/AbsTaskAnyClassification.py
#   mteb/abstasks/AbsTaskClustering.py
#   mteb/abstasks/AbsTaskPairClassification.py
#   tests/test_benchmark/mock_tasks.py

update zeroshot statistics

506d497

fix random

4600694

Merge branch 'v2.0.0' into refactor_descriptive_stats

01669fc

Samoed merged commit 29a9228 into v2.0.0 Jul 12, 2025
8 checks passed

Samoed deleted the refactor_descriptive_stats branch July 12, 2025 17:42

Samoed merged 9 commits into v2.0.0 from refactor_descriptive_stats