remove results of model with missing implementations in MTEB #362

Merged
KennethEnevoldsen merged 6 commits into embeddings-benchmark:main from ayush1298:remove_models_results
Dec 13, 2025

Contributor

@ayush1298 ayush1298 commented Dec 12, 2025

This PR is related to removing models having no implementation in MTEB.
Related GitHub issue: embeddings-benchmark/mteb#3604

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/, this can be as an API. Instruction on how to add a model can be found here
    • No, but there is an existing PR ___
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

Copilot AI review requested due to automatic review settings December 12, 2025 13:21
@ayush1298 ayush1298 changed the title from "added scripts and csv of missing models" to "remove results of model with missing implementations in MTEB" Dec 12, 2025
@Samoed Samoed review requested due to automatic review settings December 12, 2025 13:23
Copilot AI review requested due to automatic review settings December 12, 2025 13:25

Copilot AI left a comment

Pull request overview

This PR introduces tooling to identify and remove models from the results directory that lack corresponding implementations in the mteb package. The changes include a script to generate a CSV of missing models, the resulting CSV file with 502 entries, and a script to delete those model directories after user confirmation.

Key Changes:

  • Added identification script that compares results directory against mteb model implementations
  • Generated CSV listing 502 models without implementations
  • Added interactive deletion script with confirmation prompt
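The identification step summarized above could look roughly like the following sketch. It is not the script from this PR. It assumes one folder per model directly under the results directory with one sub-folder per revision, and it takes the set of implemented model names as an argument because how that set is obtained from mteb is not shown here. The function names and CSV column names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def find_missing_models(results_dir, implemented):
    """Return (model_folder_name, revisions) for folders with no known implementation.

    `implemented` is a set of folder names that do have an implementation;
    building that set from mteb is outside the scope of this sketch.
    """
    missing = []
    for model_dir in sorted(results_dir.iterdir()):
        # Skip stray files (for example helper scripts) next to the model folders.
        if not model_dir.is_dir():
            continue
        if model_dir.name in implemented:
            continue
        # Only sub-directories count as revisions, which drops .gitkeep and similar files.
        revisions = sorted(p.name for p in model_dir.iterdir() if p.is_dir())
        missing.append((model_dir.name, revisions))
    return missing


def write_csv(rows, csv_path):
    """Write one row per missing model; revisions are joined with ';'."""
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["model", "revisions"])  # hypothetical column names
        for name, revisions in rows:
            writer.writerow([name, ";".join(revisions)])


# Small demo on a throwaway directory tree (all names are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "results"
    (root / "org__known-model" / "rev1").mkdir(parents=True)
    (root / "org__orphan-model" / "rev1").mkdir(parents=True)
    (root / "org__orphan-model" / ".gitkeep").touch()
    (root / "rename_and_move_over.py").touch()

    demo_missing = find_missing_models(root, implemented={"org__known-model"})
    csv_file = Path(tmp) / "missing_implementations.csv"
    write_csv(demo_missing, csv_file)
    demo_csv_text = csv_file.read_text(encoding="utf-8")
```

Checking `is_dir()` at both levels keeps helper scripts out of the model list and `.gitkeep` files out of the revision list.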

Reviewed changes

Copilot reviewed 246 out of 10003 changed files in this pull request and generated 7 comments.

File: Description
  • remove_model_without_implementations.py: Script that scans the results directory, identifies models without mteb implementations, and exports findings to CSV
  • scripts/remove_missing_models.py: Interactive script that reads the CSV and deletes model directories after user confirmation
  • missing_implementations.csv: CSV file containing 502 model entries without implementations, including model names and their revisions

Critical Issues Found:

  • Path resolution issues in scripts/remove_missing_models.py - the script expects the CSV and results directory in the wrong locations relative to the scripts folder
  • Invalid entries in the CSV file including a Python script filename (rename_and_move_over.py) and an algorithm identifier (bm25) that should be filtered out
  • Missing directory validation in the model scanning logic that could cause errors when non-directory files are encountered
  • The revisions list incorrectly includes .gitkeep files as revision identifiers
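One way the path-resolution and invalid-entry issues listed above might be handled in a deletion script is sketched below. This is an illustration under assumptions, not the code from this PR: the helper names, the `model` CSV column, and the `exclude` parameter are hypothetical, and the repository root is assumed to be the parent of the `scripts` folder.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def repo_root_from(script_path):
    """Resolve the repository root from a script living in <root>/scripts/."""
    return Path(script_path).resolve().parents[1]


def load_model_names(csv_path):
    """Read folder names from a CSV with a 'model' column (hypothetical layout)."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        return [row["model"] for row in csv.DictReader(f)]


def remove_models(results_dir, names, exclude=frozenset(), confirm=input):
    """Delete the listed model folders after a single confirmation prompt.

    Entries in `exclude` and entries that are not directories are skipped.
    Returns the names of the folders that were deleted.
    """
    targets = [
        results_dir / name
        for name in names
        if name not in exclude and (results_dir / name).is_dir()
    ]
    if not targets:
        return []
    answer = confirm(f"Delete {len(targets)} model directories? [y/N] ")
    if answer.strip().lower() != "y":
        return []
    for target in targets:
        shutil.rmtree(target)
    return [target.name for target in targets]


# Demo on a throwaway tree; the confirmation prompt is replaced by a stub.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    results = root / "results"
    (results / "org__orphan-model" / "rev1").mkdir(parents=True)
    (results / "bm25").mkdir()
    (results / "rename_and_move_over.py").touch()
    csv_path = root / "missing_implementations.csv"
    csv_path.write_text(
        "model\norg__orphan-model\nrename_and_move_over.py\nbm25\n",
        encoding="utf-8",
    )

    names = load_model_names(csv_path)
    demo_deleted = remove_models(
        results, names, exclude={"bm25"}, confirm=lambda _prompt: "y"
    )
    demo_remaining = sorted(p.name for p in results.iterdir())
    demo_root_ok = (
        repo_root_from(root / "scripts" / "remove_missing_models.py") == root.resolve()
    )
```

In a real script the root would be derived with `repo_root_from(__file__)`, so the CSV and results directory are found regardless of the current working directory.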

@ayush1298
Contributor Author

@Samoed Can you review this one now?
I am looking at Copilot's suggestions and will remove rename_and_move_over.py from the CSV.

@Samoed
Member

Samoed commented Dec 12, 2025

I've found that we don't have implementation for:

BAAI/bge-large-en-v1.5-instruct, intfloat/e5-mistral-7b-instruct-noinstruct, sentence-transformers/all-mpnet-base-v2-instruct can be deleted, because they were added before #34

Member

@Samoed Samoed left a comment

Scripts look fine, but we need to think about what to do with some missing implementations

@ayush1298
Contributor Author

what to do with some missing implementations

I think we can open an issue for them to add those models

@KennethEnevoldsen
Contributor

Yeah let us 1) not delete those, 2) add an issue on each of the models

@Samoed
Member

Samoed commented Dec 12, 2025

Added issues for them, but it seems that google/Gemma-Embeddings-v0.8 was deleted on HF, so we can delete its results too

@ayush1298
Contributor Author

Added issues for them, but it seems that google/Gemma-Embeddings-v0.8 was deleted on HF, so we can delete its results too

Is this implementation of google/Gemma-Embeddings-v0.8 different: https://huggingface.co/pipawo1881/Gemma-Embeddings-v0.8
Should I remove the models below from deletion, and then we can merge? Or are there any other models that we need to add?

  1. jinaai/jina-clip-v2
  2. sentence-transformers/static-retrieval-mrl-en-v1
  3. sentence-transformers__multi-qa-mpnet-base-dot-v1
  4. tanmaylaud/ret-phi2-v0
  5. lightbird-ai/nomic
  6. technicolor/Angle_BERT

@github-actions

Model Results Comparison

No new model results found in this PR.

@ayush1298
Contributor Author

@Samoed So, can we merge this and then address the rest of the things in this comment in a new PR?

@Samoed
Member

Samoed commented Dec 13, 2025

tanmaylaud/ret-phi2-v0
lightbird-ai/nomic
technicolor/Angle_BERT

I don't think we need them. Kenneth's commit referred only to sentence-transformers models

@ayush1298
Contributor Author

tanmaylaud/ret-phi2-v0
lightbird-ai/nomic
technicolor/Angle_BERT

I don't think we need them. Kenneth's commit referred only to sentence-transformers models

Okay, I will delete the rest of them

@Samoed
Member

Samoed commented Dec 13, 2025

Is this implementation of google/Gemma-Embeddings-v0.8 different: https://huggingface.co/pipawo1881/Gemma-Embeddings-v0.8

Probably yes. In #59 the model was loaded from the google org, and we don't know what is different about this model

@ayush1298
Contributor Author

tanmaylaud/ret-phi2-v0
lightbird-ai/nomic
technicolor/Angle_BERT

I don't think we need them. Kenneth's commit referred only to sentence-transformers models

@Samoed I have removed them

Member

@Samoed Samoed left a comment

I think this looks good. You will need to remove the scripts and CSV before merge

@ayush1298
Contributor Author

I think this looks good. You will need to remove the scripts and CSV before merge

done

@KennethEnevoldsen KennethEnevoldsen merged commit 198c98f into embeddings-benchmark:main Dec 13, 2025
2 of 3 checks passed
@KennethEnevoldsen
Contributor

Test fail is expected - merging!

@ayush1298 ayush1298 deleted the remove_models_results branch December 13, 2025 15:05