Remove results of models with missing implementations in MTEB #362
Pull request overview
This PR introduces tooling to identify and remove models from the results directory that lack corresponding implementations in the mteb package. The changes include a script to generate a CSV of missing models, the resulting CSV file with 502 entries, and a script to delete those model directories after user confirmation.
Key Changes:
- Added identification script that compares results directory against mteb model implementations
- Generated CSV listing 502 models without implementations
- Added interactive deletion script with confirmation prompt
Reviewed changes
Copilot reviewed 246 out of 10003 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `remove_model_without_implementations.py` | Script that scans the results directory, identifies models without mteb implementations, and exports findings to CSV |
| `scripts/remove_missing_models.py` | Interactive script that reads the CSV and deletes model directories after user confirmation |
| `missing_implementations.csv` | CSV file containing 502 model entries without implementations, including model names and their revisions |
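
The read-CSV, confirm, then delete flow described in the table can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: the function name `delete_listed_models`, the `model_name` column, and the injectable `confirm` callable are all assumptions made for the example.

```python
import csv
import shutil
from pathlib import Path


def delete_listed_models(csv_path, results_dir, confirm=input):
    """Delete result directories named in the CSV, but only after confirmation.

    Hypothetical helper: the `model_name` column is an assumed header,
    and names are assumed to match directory names under `results_dir`.
    """
    results_dir = Path(results_dir)
    with open(csv_path, newline="") as f:
        names = [row["model_name"] for row in csv.DictReader(f)]

    # Skip entries that do not exist as directories (stray file names, typos).
    targets = [results_dir / name for name in names if (results_dir / name).is_dir()]
    if not targets:
        return []

    answer = confirm(f"Delete {len(targets)} model directories? [y/N] ")
    if answer.strip().lower() != "y":
        return []  # anything other than an explicit "y" aborts

    for path in targets:
        shutil.rmtree(path)
    return [path.name for path in targets]
```

Passing `confirm` as a parameter keeps the prompt interactive by default while letting the function be exercised without a terminal.
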
Critical Issues Found:
- Path resolution issues in `scripts/remove_missing_models.py`: the script expects the CSV and results directory in the wrong locations relative to the scripts folder
- Invalid entries in the CSV file, including a Python script filename (`rename_and_move_over.py`) and an algorithm identifier (`bm25`), that should be filtered out
- Missing directory validation in the model scanning logic that could cause errors when non-directory files are encountered
- The revisions list incorrectly includes `.gitkeep` files as revision identifiers
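
One way the scanning issues above could be addressed is sketched below. It assumes a `results/<model>/<revision>/` layout and takes the set of implemented model names as a plain argument rather than querying mteb. The names `repo_root_from` and `find_missing_models` are hypothetical and do not come from the PR.

```python
from pathlib import Path


def repo_root_from(script_path):
    """Resolve the repository root from a script located in `<root>/scripts/`.

    Assumed layout; anchoring on the script path avoids depending on the
    current working directory.
    """
    return Path(script_path).resolve().parent.parent


def find_missing_models(results_dir, implemented):
    """Map each model directory without an implementation to its revisions.

    `implemented` is any container of model directory names considered
    implemented (an assumption for this sketch).
    """
    missing = {}
    for model_dir in sorted(Path(results_dir).iterdir()):
        # Directory validation: ignore stray files such as scripts.
        if not model_dir.is_dir():
            continue
        if model_dir.name in implemented:
            continue
        # Only subdirectories count as revisions, which excludes .gitkeep files.
        revisions = sorted(p.name for p in model_dir.iterdir() if p.is_dir())
        missing[model_dir.name] = revisions
    return missing
```

A non-directory entry such as `rename_and_move_over.py` is skipped by the `is_dir()` check. An entry like `bm25` would still need an explicit rule if it exists as a directory.
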
|
@Samoed Can you now review this one? |
|
I've found that we don't have implementation for:
|
Samoed
left a comment
Scripts look fine, but we need to think about what to do with some missing implementations
I think we can open an issue for them to add those models |
|
Yeah let us 1) not delete those, 2) add an issue on each of the models |
|
Added issues for them, but it seems that |
Is these implementation of
|
Model Results Comparison
No new model results found in this PR. |
I don't think we need them. Kenneth's commit referred only to the sentence transformers model
Okay, I will delete the rest of them
Probably yes; in #59 the model was loaded from the google org, and we don't know what is different about this model
@Samoed I have removed them |
Samoed
left a comment
I think this looks good. You will need to remove the scripts and the CSV before merging.
done |
198c98f
into
embeddings-benchmark:main
|
The test failure is expected - merging! |
This PR is related to removing models having no implementation in MTEB.
Related GitHub issue: embeddings-benchmark/mteb#3604
Checklist
mteb/models/model_implementations/, this can be as an API. Instruction on how to add a model can be found here