Merged

375 commits
537b974
Update tasks table
github-actions[bot] Nov 6, 2024
0c7c216
1.19.0
invalid-email-address Nov 6, 2024
b1a0ec6
fix: Add the_ugly_duckling.txt for speedtask to Python wheel (#1402)
helena-intel Nov 7, 2024
a85c550
1.19.1
invalid-email-address Nov 7, 2024
fd8b283
fix: Added the necessary trust_remote_code (#1406)
AlexeyVatolin Nov 7, 2024
7438cfa
1.19.2
invalid-email-address Nov 7, 2024
fccf034
docs: Update recommendation for pushing results (#1401)
KennethEnevoldsen Nov 9, 2024
9681eb3
docs: Fix a typo in README (#1430)
eherra Nov 11, 2024
cc7a106
fix: add logging for RetrievalEvaluator NaN values for similarity sco…
KennethEnevoldsen Nov 11, 2024
8efb4e0
1.19.3
invalid-email-address Nov 11, 2024
7f1a1d3
fix: make samples_per_label a task attribute (#1419)
isaac-chung Nov 11, 2024
f79d9ba
fix: Add Korean AutoRAGRetrieval (#1388)
yjoonjang Nov 11, 2024
a240ea0
fix: Add missing benchmarks in benchmarks.py (#1431)
KennethEnevoldsen Nov 11, 2024
d069aba
Update tasks table
github-actions[bot] Nov 11, 2024
19aefa3
1.19.4
invalid-email-address Nov 11, 2024
76c2112
Leaderboard 2.0: added performance x n_parameters plot + more benchma…
x-tabdeveloping Nov 12, 2024
3a1a470
Leaderboard: Fixed code benchmarks (#1441)
x-tabdeveloping Nov 13, 2024
dd5d226
fix: Count unique texts, data leaks in calculate metrics (#1438)
Samoed Nov 14, 2024
04ac3f2
fix: update task metadata to allow for null (#1448)
KennethEnevoldsen Nov 14, 2024
f6a49fe
Update tasks table
github-actions[bot] Nov 14, 2024
78c0e4e
1.19.5
invalid-email-address Nov 14, 2024
4e86cea
Fix: Made data parsing in the leaderboard figure more robust (#1450)
x-tabdeveloping Nov 14, 2024
039d010
Fixed task loading (#1451)
x-tabdeveloping Nov 14, 2024
feb1ab7
fix: publish (#1452)
x-tabdeveloping Nov 14, 2024
3397633
1.19.6
invalid-email-address Nov 14, 2024
14d7523
fix: Fix load external results with `None` mteb_version (#1453)
Samoed Nov 14, 2024
68eb498
1.19.7
invalid-email-address Nov 14, 2024
58c459b
WIP: Polishing up leaderboard UI (#1461)
x-tabdeveloping Nov 15, 2024
1b920ac
fix: loading pre 1.11.0 (#1460)
Samoed Nov 15, 2024
a988fef
1.19.8
invalid-email-address Nov 15, 2024
9b2aece
fix: swap touche2020 to maintain compatibility (#1469)
isaac-chung Nov 17, 2024
8bb4a29
1.19.9
invalid-email-address Nov 17, 2024
2fb6fe7
docs: Add sum per language for task counts (#1468)
isaac-chung Nov 18, 2024
fde124a
fix: pinned datasets to <3.0.0 (#1470)
Napuh Nov 19, 2024
7186e04
1.19.10
invalid-email-address Nov 19, 2024
1cc6c9e
feat: add CUREv1 retrieval dataset (#1459)
dbuades Nov 21, 2024
4408717
Update tasks table
github-actions[bot] Nov 21, 2024
3ff38ec
1.20.0
invalid-email-address Nov 21, 2024
917ad7f
fix: check if `model` attr of model exists (#1499)
Samoed Nov 26, 2024
cde720e
1.20.1
invalid-email-address Nov 26, 2024
0affa31
fix: Leaderboard demo data loading (#1507)
x-tabdeveloping Nov 27, 2024
594f643
1.20.2
invalid-email-address Nov 27, 2024
35245d3
fix: leaderboard only shows models that have ModelMeta (#1508)
x-tabdeveloping Nov 27, 2024
9282796
1.20.3
invalid-email-address Nov 27, 2024
942f212
fix: align readme with current mteb (#1493)
Samoed Nov 27, 2024
09f004c
1.20.4
invalid-email-address Nov 27, 2024
cfd43ac
docs: Add lang family mapping and map to task table (#1486)
isaac-chung Nov 28, 2024
377a63d
Update tasks table
github-actions[bot] Nov 28, 2024
e3d2b54
fix: Ensure that models match the names on embedding-benchmarks/resul…
KennethEnevoldsen Nov 29, 2024
9980c60
1.20.5
invalid-email-address Nov 29, 2024
b02ae82
fix: Adding missing metadata on models and mathcing names up with the…
x-tabdeveloping Nov 29, 2024
ba09b11
1.20.6
invalid-email-address Nov 29, 2024
8e12250
feat: Evaluate missing splits (#1525)
isaac-chung Nov 29, 2024
ee1edac
1.21.0
invalid-email-address Nov 29, 2024
343b6e0
fix: Correct typos superseeded -> superseded (#1532)
isaac-chung Nov 30, 2024
e949d2a
1.21.1
invalid-email-address Nov 30, 2024
5b6f20f
fix: Task load data error for SICK-BR-STS and XStance (#1534)
isaac-chung Dec 1, 2024
ec9413a
1.21.2
invalid-email-address Dec 1, 2024
39349ff
fix: Proprietary models now get correctly shown in leaderboard (#1530)
x-tabdeveloping Dec 2, 2024
d07c29b
1.21.3
invalid-email-address Dec 2, 2024
5fa7b7b
docs: Add Model Meta parameters and metadata (#1536)
isaac-chung Dec 2, 2024
36bab4d
fix: add more model meta (jina, e5) (#1537)
isaac-chung Dec 4, 2024
ac4a706
1.21.4
invalid-email-address Dec 4, 2024
c2f4c26
Add cohere models (#1538)
KennethEnevoldsen Dec 4, 2024
5013df8
fix: add nomic models (#1543)
KennethEnevoldsen Dec 4, 2024
97ab272
fix: Added all-minilm-l12-v2 (#1542)
KennethEnevoldsen Dec 4, 2024
df11c38
fix: Added arctic models (#1541)
KennethEnevoldsen Dec 4, 2024
37fdfa1
fix: add sentence trimming to OpenAIWrapper (#1526)
yjoonjang Dec 4, 2024
1e62184
1.21.5
invalid-email-address Dec 4, 2024
a44a46c
fix: Fixed metadata errors (#1547)
x-tabdeveloping Dec 4, 2024
d713525
1.21.6
invalid-email-address Dec 4, 2024
279a4ee
fix: remove curev1 from multlingual (#1552)
KennethEnevoldsen Dec 5, 2024
e339735
1.21.7
invalid-email-address Dec 5, 2024
2ee8d44
fix: Add Model2vec (#1546)
x-tabdeveloping Dec 6, 2024
2905813
Made result loading more permissive, changed eval splits for HotPotQA…
x-tabdeveloping Dec 6, 2024
a6ce6f9
1.21.8
invalid-email-address Dec 6, 2024
fc64791
docs: Correction of SICK-R metadata (#1558)
rafalposwiata Dec 7, 2024
611b6a1
feat(google_models): fix issues and add support for `text-embedding-0…
dbuades Dec 7, 2024
5e7e033
1.22.0
invalid-email-address Dec 7, 2024
ac44e58
fix(bm25s): search implementation (#1566)
dbuades Dec 7, 2024
b8ff89c
1.22.1
invalid-email-address Dec 7, 2024
03347eb
docs: Fix dependency library name for bm25s (#1568)
isaac-chung Dec 7, 2024
6489fca
fix: Add training dataset to model meta (#1561)
KennethEnevoldsen Dec 8, 2024
1d21818
feat: (cohere_models) cohere_task_type issue, batch requests and tqdm…
dbuades Dec 8, 2024
68bd8ac
fix(publichealth-qa): ignore rows with `None` values in `question` o…
dbuades Dec 8, 2024
2550a27
1.23.0
invalid-email-address Dec 8, 2024
ce8c175
fix: Added metadata for miscellaneous models (#1557)
x-tabdeveloping Dec 9, 2024
f9ede12
1.23.1
invalid-email-address Dec 9, 2024
c49f838
fix: Added radar chart displaying capabilities on task types (#1570)
x-tabdeveloping Dec 9, 2024
e605c7b
1.23.2
invalid-email-address Dec 9, 2024
53756ad
feat: add new arctic v2.0 models (#1574)
dbuades Dec 10, 2024
27f7d8c
1.24.0
invalid-email-address Dec 10, 2024
7b9b3c9
fix: Add namaa MrTydi reranking dataset (#1573)
omarelshehy Dec 11, 2024
1101db7
Update tasks table
github-actions[bot] Dec 11, 2024
9c0b208
1.24.1
invalid-email-address Dec 11, 2024
373db74
fix: Eval langs not correctly passed to monolingual tasks (#1587)
Samoed Dec 13, 2024
eecc9f1
1.24.2
invalid-email-address Dec 13, 2024
fdfdaef
feat: Add ColBert (#1563)
sam-hey Dec 14, 2024
b466051
1.25.0
invalid-email-address Dec 14, 2024
992b20b
doc: colbert add score_function & doc section (#1592)
sam-hey Dec 15, 2024
8e6ee46
Feat: add support for scoring function (#1594)
Samoed Dec 15, 2024
95d5ae5
Add new models nvidia, gte, linq (#1436)
AlexeyVatolin Dec 16, 2024
0c9e046
Leaderboard: Refined plots (#1601)
x-tabdeveloping Dec 16, 2024
6ecc86f
fix: Leaderboard refinements (#1603)
x-tabdeveloping Dec 16, 2024
5e9c468
1.25.1
invalid-email-address Dec 16, 2024
b81b584
Feat: Use similarity scores if available (#1602)
Samoed Dec 16, 2024
6731b94
Add NanoBEIR Datasets (#1588)
KGupta10 Dec 18, 2024
9de7f20
Update tasks table
github-actions[bot] Dec 18, 2024
48cb97d
Feat: Evaluate missing languages (#1584)
Samoed Dec 18, 2024
ad05983
Add IBM Granite Embedding Models (#1613)
aashka-trivedi Dec 19, 2024
7c8e094
fix: disable co2_tracker for API models (#1614)
dbuades Dec 20, 2024
d8c015f
1.25.2
invalid-email-address Dec 20, 2024
0c44482
fix: set `use_instructions` to True in models using prompts (#1616)
dbuades Dec 20, 2024
2024338
1.25.3
invalid-email-address Dec 20, 2024
272adb1
fix: override existing results (#1617)
Samoed Dec 22, 2024
bd782d6
1.25.4
invalid-email-address Dec 22, 2024
e1b74f2
add MSMARCO eval split in MTEB English (classic) benchmark (#1620)
KennethEnevoldsen Dec 22, 2024
748033e
fix: GermanDPR Dataset Causes Cross-Encoder Failure Due to Unexpected…
KennethEnevoldsen Dec 22, 2024
72a457e
fix: properly add mteb_model_meta to model object (#1623)
KennethEnevoldsen Dec 22, 2024
d8dd96c
1.25.5
invalid-email-address Dec 22, 2024
ef5a068
Feat: Add jasper (#1591)
Samoed Dec 23, 2024
02ae4fa
fix: Update results_to_dataframe to use BenchmarkResults class (#1628)
AlexeyVatolin Dec 24, 2024
e8e1a50
1.25.6
invalid-email-address Dec 24, 2024
1b06601
Speed up test_save_predictions (#1631)
AlexeyVatolin Dec 25, 2024
2de61b1
fix: Correction of discrepancies for gte-Qweb model (#1637)
AlexeyVatolin Dec 29, 2024
eb643a7
1.25.7
invalid-email-address Dec 29, 2024
366b2ce
fix: output_folder for co2 evaluation (#1642)
Muennighoff Dec 30, 2024
815a1b4
1.25.8
invalid-email-address Dec 30, 2024
27eb549
fix: add missing benchmark to benchmarks.py (#1641)
gowitheflow-1998 Dec 30, 2024
e4edb66
1.25.9
invalid-email-address Dec 30, 2024
fa0ed6b
fix: Cast all Model2Vec outputs as floats (#1667)
isaac-chung Jan 1, 2025
80efdb6
1.25.10
invalid-email-address Jan 1, 2025
19cbf64
fix: Update gritlm kwargs (#1643)
Muennighoff Jan 1, 2025
3d2fbf0
1.25.11
invalid-email-address Jan 1, 2025
663653e
fix: Use batch size kwargs for openai APIs (#1668)
KennethEnevoldsen Jan 1, 2025
65ef2a1
1.25.12
invalid-email-address Jan 1, 2025
f426159
fix: Pass trust_remote_code=True to CPM model (#1669)
KennethEnevoldsen Jan 1, 2025
82b6cce
1.25.13
invalid-email-address Jan 1, 2025
f99a178
fix: Updated metadata for CPM (#1670)
KennethEnevoldsen Jan 1, 2025
84f3f41
1.25.14
invalid-email-address Jan 1, 2025
5cfcc77
fix: remove model as a parameter for MulticlassClassification (#1666)
Samoed Jan 1, 2025
82e9949
fix: Use prompts instead of prompt names for voyage (#1665)
Samoed Jan 1, 2025
cf1f2d4
1.25.15
invalid-email-address Jan 1, 2025
343edc4
fix: Update BUCC dataset revision (#1674)
Muennighoff Jan 1, 2025
05c91ed
1.25.16
invalid-email-address Jan 1, 2025
c50f26c
fix: Add warning for non-retrieval tasks when using bm25s (#1678)
isaac-chung Jan 1, 2025
5bf74fc
1.25.17
invalid-email-address Jan 1, 2025
1aa08fd
fix: add check for key error in loader (#1675)
Samoed Jan 2, 2025
c8de079
1.25.18
invalid-email-address Jan 2, 2025
7b1e67b
fix: trust remote code for snowflake-arctic-embed-m-v2.0 (#1682)
Muennighoff Jan 2, 2025
8d3c917
1.25.19
invalid-email-address Jan 2, 2025
f5e6401
fix: nomic tensor return (#1683)
Samoed Jan 2, 2025
525777a
1.25.20
invalid-email-address Jan 2, 2025
ba1f022
feat: add `avsolatorio/NoInstruct-small-Embedding-v0` (#1677)
Samoed Jan 2, 2025
4a496b9
fix: arg name for openbmb/MiniCPM-Embedding (#1691)
Samoed Jan 2, 2025
7dbafab
1.26.0
invalid-email-address Jan 2, 2025
f4de307
fix: add trust_remote_code to Snowflake/snowflake-arctic-embed-m-lon…
Muennighoff Jan 3, 2025
6bfc1f2
fix: add revision for jinaai/jina-embeddings-v2-small-en (#1692)
Samoed Jan 3, 2025
d36498a
1.26.1
invalid-email-address Jan 3, 2025
43d74e1
fix: update model loader to trust remote code (#1697)
isaac-chung Jan 3, 2025
5447b5d
1.26.2
invalid-email-address Jan 3, 2025
808257c
fix: nomic prompts (#1685)
Samoed Jan 3, 2025
cff7ed8
fix: NanoBeir (#1687)
Samoed Jan 3, 2025
c0f4394
1.26.3
invalid-email-address Jan 3, 2025
0753aba
Update RerankingEvaluator.py (#1702)
Muennighoff Jan 4, 2025
6d1d9f4
fix: Register MicroLlama Text Embedding (#1644)
keeeeenw Jan 4, 2025
753d08a
fix: GermanDPR (#1703)
Samoed Jan 4, 2025
4a1c8e6
1.26.4
invalid-email-address Jan 4, 2025
222bb35
Fix: minicpmv2 (#1705)
Samoed Jan 6, 2025
25f4f61
ci: Refresh the v2 leaderboard daily (#1711)
orionw Jan 6, 2025
ab8805c
Fix: typos in adding a model (#1722)
Muennighoff Jan 8, 2025
9bcb52f
fix: rollback BUCC revision (#1706)
Samoed Jan 8, 2025
4dea042
1.26.5
invalid-email-address Jan 8, 2025
8702815
fix: Added zero shot tag to benchmark (#1710)
x-tabdeveloping Jan 8, 2025
18cefab
1.26.6
invalid-email-address Jan 8, 2025
7e16fa2
feat: reduce logging for load_results()
KennethEnevoldsen Jan 8, 2025
2ae00a2
1.27.0
invalid-email-address Jan 8, 2025
95f143a
feat: Add nomic modern bert (#1684)
Samoed Jan 9, 2025
f5962c6
fix: allow kwargs in init for RerankingWrapper (#1676)
Samoed Jan 9, 2025
3c68ea6
1.28.0
invalid-email-address Jan 9, 2025
752d2b8
Fixed result loading on leaderboard (#1739)
x-tabdeveloping Jan 9, 2025
8d033f3
test: Add script to test model loading below n_parameters threshold (…
isaac-chung Jan 9, 2025
9eff8ca
fix: Leaderboard Speedup (#1745)
x-tabdeveloping Jan 10, 2025
348b93d
1.28.1
invalid-email-address Jan 10, 2025
76bb070
fix: Fixed task_type aggregation on leaderboard (#1746)
x-tabdeveloping Jan 10, 2025
a4975fe
1.28.2
invalid-email-address Jan 10, 2025
407e205
fix: Fixed definition of zero-shot in ModelMeta (#1747)
x-tabdeveloping Jan 10, 2025
edd9d7f
1.28.3
invalid-email-address Jan 10, 2025
3fe9264
fix: fixes implementation of similarity() (#1748)
sam-hey Jan 10, 2025
75d78c1
1.28.4
invalid-email-address Jan 10, 2025
972463e
fix: Leaderboard: `K` instead of `M` (#1761)
KennethEnevoldsen Jan 11, 2025
8bc80aa
other: add script for leaderboard compare (#1758)
Samoed Jan 11, 2025
cc27c78
1.28.5
invalid-email-address Jan 11, 2025
3f093c8
fix: added annotations for training data (#1742)
KennethEnevoldsen Jan 11, 2025
c3b46b7
1.28.6
invalid-email-address Jan 11, 2025
0c5c3a5
fix: update max tokens for OpenAI (#1772)
Samoed Jan 12, 2025
71dbd61
ci: skip AfriSentiLID for now (#1785)
isaac-chung Jan 13, 2025
bad27a6
1.28.7
invalid-email-address Jan 13, 2025
9b117a8
ci: fix model loading test (#1775)
isaac-chung Jan 13, 2025
4a70e5d
feat: Update task filtering, fixing bug which included cross-lingual …
KennethEnevoldsen Jan 13, 2025
15a6812
1.29.0
invalid-email-address Jan 13, 2025
3ba7e22
fix: Added C-MTEB (#1786)
x-tabdeveloping Jan 13, 2025
48370c7
1.29.1
invalid-email-address Jan 13, 2025
e9e9118
docs: Add contact to MMTEB benchmarks (#1796)
isaac-chung Jan 14, 2025
94103e6
fix: loading pre 11 (#1798)
Samoed Jan 14, 2025
b6fb5b8
1.29.2
invalid-email-address Jan 14, 2025
a202884
fix: allow to load no revision available (#1801)
Samoed Jan 14, 2025
bcb2cd9
1.29.3
invalid-email-address Jan 14, 2025
0acc166
fix: Zero shot and aggregation on Leaderboard (#1810)
x-tabdeveloping Jan 15, 2025
3f5ee82
fix: Added `ModelMeta` for BGE, GTE Chinese and multilingual models (…
x-tabdeveloping Jan 15, 2025
217dabe
1.29.4
invalid-email-address Jan 15, 2025
c4ee9fe
fix: Add additional contacts (#1817)
KennethEnevoldsen Jan 15, 2025
e3a3df8
Update points table
github-actions[bot] Jan 15, 2025
186cc23
1.29.5
invalid-email-address Jan 15, 2025
748955c
fix: Added more Chinese models' `ModelMeta` (#1814)
x-tabdeveloping Jan 15, 2025
950f050
1.29.6
invalid-email-address Jan 15, 2025
60c4980
Add model inf-retriever-v1 (#1744)
SamuelYang1 Jan 15, 2025
d7a7791
ci: only return 1 model_name per file (#1818)
isaac-chung Jan 16, 2025
4ac59bc
fix: add bge-m3 `ModelMeta` (#1821)
Samoed Jan 16, 2025
9733d85
1.29.7
invalid-email-address Jan 16, 2025
74b495c
fix: Added Chinese Stella models (#1824)
x-tabdeveloping Jan 17, 2025
96420a2
fix: bm25s (#1827)
sam-hey Jan 17, 2025
3b2d074
fix: Added way more training dataset annotations (#1765)
KennethEnevoldsen Jan 17, 2025
9823529
fix: Added Misc Chinese models (#1819)
x-tabdeveloping Jan 17, 2025
b4d0eaa
1.29.8
invalid-email-address Jan 17, 2025
96f639b
fix: Fixed eval split for MultilingualSentiment in C-MTEB (#1804)
x-tabdeveloping Jan 17, 2025
762f729
1.29.9
invalid-email-address Jan 17, 2025
8be6b2e
fix: subsets to run (#1830)
Samoed Jan 20, 2025
0a83e38
fix: Remove default params, `public_training_data` and `memory usage`…
Samoed Jan 20, 2025
46f6abc
1.29.10
invalid-email-address Jan 20, 2025
a7a8144
fix: Add reported annotation and re-added public_training_data (#1846)
KennethEnevoldsen Jan 21, 2025
2fac8ba
1.29.11
invalid-email-address Jan 21, 2025
a8cc887
fix: Leaderboard Refinements (#1849)
x-tabdeveloping Jan 21, 2025
afd3c77
1.29.12
invalid-email-address Jan 21, 2025
889d6df
Merge branch 'main' into mieb_with_main
isaac-chung Jan 21, 2025
c72db2c
rest of the merge conflicts
isaac-chung Jan 21, 2025
7b067bc
fix merge conflicts
isaac-chung Jan 21, 2025
1a376e1
fill in model meta defaults
isaac-chung Jan 21, 2025
13aafd8
fix ModeMeta modalities
isaac-chung Jan 21, 2025
b3d3702
fix metadata pydantic errors;
isaac-chung Jan 21, 2025
b10c062
assert model.model instead since it is a wrapper
isaac-chung Jan 21, 2025
fe33061
fix: Fixed leaderboard search bar (#1852)
x-tabdeveloping Jan 22, 2025
2f8cfae
1.29.13
invalid-email-address Jan 22, 2025
4bd7328
fix: Hotfixed public_training_data type annotation (#1857)
x-tabdeveloping Jan 22, 2025
4985da9
fix: Fix zeta alpha mistral (#1736)
Samoed Jan 22, 2025
12ed9c5
Add more annotations (#1833)
Samoed Jan 22, 2025
fde446d
1.29.14
invalid-email-address Jan 22, 2025
692bd26
fix: Adding missing model meta (#1856)
x-tabdeveloping Jan 22, 2025
6ac4798
fix Encoder class
isaac-chung Jan 22, 2025
14e18e8
Merge branch 'main' into mieb_with_main
isaac-chung Jan 22, 2025
16 changes: 16 additions & 0 deletions .github/workflows/leaderboard_refresh.yaml
@@ -0,0 +1,16 @@
name: Daily Space Rebuild
on:
  schedule:
    # Runs at midnight Pacific Time (8 AM UTC)
    - cron: '0 8 * * *'
  workflow_dispatch: # Allows manual triggering

jobs:
  rebuild:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger Factory Rebuild
        run: |
          curl -X POST \
            "https://huggingface.co/api/spaces/mteb/leaderboard_2_demo/restart?factory=true" \
            -H "Authorization: Bearer ${{ secrets.HF_TOKEN }}"
24 changes: 24 additions & 0 deletions .github/workflows/model_loading.yml
@@ -0,0 +1,24 @@
name: Model Loading

on:
  pull_request:
    paths:
      - 'mteb/models/**.py'

jobs:
  extract-and-run:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
          cache: 'pip'

      - name: Install dependencies and run tests
        run: |
          make model-load-test BASE_BRANCH=${{ github.event.pull_request.base.ref }}
6 changes: 5 additions & 1 deletion .gitignore
@@ -143,4 +143,8 @@ sb.ipynb
tests/create_meta/model_card.md

# removed results from mteb repo, they are now available at: https://github.com/embeddings-benchmark/results
results/
results/
uv.lock

# model loading tests
model_names.txt
9 changes: 8 additions & 1 deletion Makefile
@@ -35,4 +35,11 @@ pr:
build-docs:
	@echo "--- 📚 Building documentation ---"
	# since we do not have a documentation site, this just builds tables for the .md files
	python docs/create_tasks_table.py
	python docs/create_tasks_table.py


model-load-test:
	@echo "--- 🚀 Running model load test ---"
	pip install ".[dev, speedtask, pylate,gritlm,xformers,model2vec]"
	python scripts/extract_model_names.py $(BASE_BRANCH) --return_one_model_name_per_file
	python tests/test_models/model_loading.py --model_name_file scripts/model_names.txt
78 changes: 72 additions & 6 deletions README.md
@@ -46,17 +46,15 @@ from sentence_transformers import SentenceTransformer

# Define the sentence-transformers model name
model_name = "average_word_embeddings_komninos"
# or directly from huggingface:
# model_name = "sentence-transformers/all-MiniLM-L6-v2"

model = SentenceTransformer(model_name)
model = mteb.get_model(model_name) # if the model is not implemented in MTEB, this is equivalent to SentenceTransformer(model_name)
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder=f"results/{model_name}")
```

<details>
<summary> Running SentneceTransformermer model with prompts </summary>
<summary> Running SentenceTransformer model with prompts </summary>

Prompts can be passed to the SentenceTransformer model using the `prompts` parameter. The following code shows how to use prompts with SentenceTransformer:
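A minimal sketch of this pattern (the prompt names and strings below are illustrative placeholders, not the repository's exact example):

```python
import mteb
from sentence_transformers import SentenceTransformer

# The prompt names/texts are placeholders; adapt them to your model's training setup.
model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    prompts={
        "query": "Represent this query for retrieving relevant passages: ",
        "passage": "Represent this passage for retrieval: ",
    },
)
tasks = mteb.get_tasks(tasks=["NFCorpus"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
```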

@@ -164,7 +162,7 @@ For instance to select the 56 English datasets that form the "Overall MTEB Engli

```python
import mteb
benchmark = mteb.get_benchmark("MTEB(eng)")
benchmark = mteb.get_benchmark("MTEB(eng, classic)")
evaluation = mteb.MTEB(tasks=benchmark)
```

@@ -211,6 +209,21 @@ Note that the public leaderboard uses the test splits for all datasets except MS

</details>


<details>
<summary> Selecting evaluation subset </summary>

### Selecting evaluation subset
You can evaluate only on selected subsets. For example, if you want to evaluate only the `subset_name_to_run` subset of all tasks, do the following:

```python
evaluation.run(model, eval_subsets=["subset_name_to_run"])
```

Monolingual tasks have a single `default` subset; other tasks have subsets that are specific to the dataset.
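If you are unsure which subsets a task defines, you can inspect its metadata; a small illustration (the task name is arbitrary):

```python
import mteb

task = mteb.get_tasks(tasks=["AmazonReviewsClassification"])[0]
# For multilingual/cross-lingual tasks, eval_langs is a dict keyed by subset name;
# monolingual tasks expose a plain language list and a single "default" subset.
print(task.metadata.eval_langs)
```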

</details>

<details>
<summary> Using a custom model </summary>

@@ -220,7 +233,10 @@ Note that the public leaderboard uses the test splits for all datasets except MS
Models should implement the following interface, with an `encode` function that takes a list of sentences as input and returns a list of embeddings (embeddings can be `np.array`, `torch.tensor`, etc.). For inspiration, you can look at the [mteb/mtebscripts repo](https://github.com/embeddings-benchmark/mtebscripts) used for running diverse models via SLURM scripts for the paper.
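For reference, a complete minimal sketch of such an encoder is shown below (the keyword arguments mirror the current `mteb` encoder interface, but treat the details as indicative rather than definitive):

```python
import mteb
import numpy as np
from mteb.encoder_interface import PromptType


class MinimalModel:
    """A sketch of a custom encoder; it returns random vectors purely for illustration."""

    def encode(
        self,
        sentences: list[str],
        *,
        task_name: str,
        prompt_type: PromptType | None = None,
        **kwargs,
    ) -> np.ndarray:
        # Replace this with a real forward pass that produces one vector per sentence.
        return np.random.default_rng(42).normal(size=(len(sentences), 384))


model = MinimalModel()
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model)
```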

```python
import mteb
from mteb.encoder_interface import PromptType
import numpy as np


class CustomModel:
def encode(
@@ -244,7 +260,7 @@ class CustomModel:
pass

model = CustomModel()
tasks = mteb.get_task("Banking77Classification")
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = MTEB(tasks=tasks)
evaluation.run(model)
```
@@ -313,6 +329,34 @@ evaluation.run(
)
```

</details>

<details>
<summary> Late Interaction (ColBERT) </summary>

### Using Late Interaction models for retrieval

```python
from mteb import MTEB
import mteb


colbert = mteb.get_model("colbert-ir/colbertv2.0")
tasks = mteb.get_tasks(tasks=["NFCorpus"], languages=["eng"])

eval_splits = ["test"]

evaluation = MTEB(tasks=tasks)

evaluation.run(
    colbert,
    eval_splits=eval_splits,
    corpus_chunk_size=500,
)
```
This implementation employs the MaxSim operation to compute the similarity between sentences. While MaxSim provides high-quality results, it processes a larger number of embeddings, potentially leading to increased resource usage. To manage resource consumption, consider lowering the `corpus_chunk_size` parameter.
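For intuition, a toy illustration of the MaxSim operation (not the library's internal implementation): each query token is matched to its most similar document token, and the maxima are summed.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query-token embeddings, 8 dimensions each
D = rng.normal(size=(12, 8))  # 12 document-token embeddings

sim = Q @ D.T                         # pairwise token similarities, shape (4, 12)
maxsim_score = sim.max(axis=1).sum()  # best document token per query token, summed
print(maxsim_score)
```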


</details>

<details>
@@ -378,6 +422,28 @@ results = mteb.load_results(models=models, tasks=tasks)
df = results_to_dataframe(results)
```

</details>


<details>
<summary> Annotate Contamination in the training data of a model </summary>

### Annotate Contamination

Have you found contamination in the training data of a model? Please let us know, either by opening an issue or, ideally, by submitting a PR annotating the training datasets of the model:

```py
model_w_contamination = ModelMeta(
name = "model-with-contamination"
...
training_datasets: {"ArguAna": # name of dataset within MTEB
["test"]} # the splits that have been trained on
...
)
```


</details>

<details>
7 changes: 3 additions & 4 deletions docs/adding_a_dataset.md
@@ -37,15 +37,14 @@ class SciDocsReranking(AbsTaskReranking):
dataset={
"path": "mteb/scidocs-reranking",
"revision": "d3c5e1fc0b855ab6097bf1cda04dd73947d7caab",
}
},
date=("2000-01-01", "2020-12-31"), # best guess
domains=["Academic", "Non-fiction", "Domains"],
task_subtypes=["Scientific Reranking"],
license="cc-by-4.0",
annotations_creators="derived",
dialect=[],
sample_creation="found",
descriptive_stats={"n_samples": {"test": 19599}, "avg_character_length": {"test": 69.0}},
bibtex_citation="""
@inproceedings{cohan-etal-2020-specter,
title = "{SPECTER}: Document-level Representation Learning using Citation-informed Transformers",
@@ -73,7 +72,7 @@ class SciDocsReranking(AbsTaskReranking):

# testing the task with a model:
model = SentenceTransformer("average_word_embeddings_komninos")
evaluation = MTEB(tasks=[MindSmallReranking()])
evaluation = MTEB(tasks=[SciDocsReranking()])
evaluation.run(model)
```

@@ -109,7 +108,7 @@ class VGClustering(AbsTaskClustering):
dialect=[],
text_creation="found",
bibtex_citation= ... # removed for brevity
)
)

def dataset_transform(self):
splits = self.description["eval_splits"]
31 changes: 27 additions & 4 deletions docs/adding_a_model.md
@@ -14,8 +14,7 @@ model = mteb.get_model("sentence-transformers/paraphrase-multilingual-MiniLM-L12

tasks = mteb.get_tasks(...) # get specific tasks
# or
from mteb.benchmarks import MTEB_MAIN_EN
tasks = MTEB_MAIN_EN # or use a specific benchmark
tasks = mteb.get_benchmark("MTEB(eng, classic)") # or use a specific benchmark

evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
@@ -29,29 +28,49 @@ mteb run -m {model_name} -t {task_names}

These will save the results in a folder called `results/{model_name}/{model_revision}`.

2. **Push Results to the Leaderboard**

To add results to the public leaderboard, you can push your results to the [results repository](https://github.com/embeddings-benchmark/results) via a PR. Once merged, they will appear on the leaderboard after a day.

3. (Optional) **Add results to the model card:**

`mteb` implements a CLI for adding results to the model card:

```bash
mteb create_meta --results_folder results/{model_name}/{model_revision} --output_path model_card.md
```

If readme of model exists:
To add the content to the public model simply copy the content of the `model_card.md` file to the top of a `README.md` file of your model on the Hub. See [here](https://huggingface.co/Muennighoff/SGPT-5.8B-weightedmean-msmarco-specb-bitfit/blob/main/README.md) for an example.

If the readme already exists:

```bash
mteb create_meta --results_folder results/{model_name}/{model_revision} --output_path model_card.md --from_existing your_existing_readme.md
```

Note that running the model on many tasks may lead to a very large README front matter.

4. **Wait for a refresh of the leaderboard:**

The leaderboard [automatically refreshes daily](https://github.com/embeddings-benchmark/leaderboard/commits/main/) so once submitted you only need to wait for the automatic refresh. You can find the workflows for the leaderboard refresh [here](https://github.com/embeddings-benchmark/leaderboard/tree/main/.github/workflows). If you experience issues with the leaderboard please create an [issue](https://github.com/embeddings-benchmark/mteb/issues).

**Notes:**
- We remove models with scores that cannot be reproduced, so please ensure that your model is accessible and scores can be reproduced.
- An alternative way of submitting to the leaderboard is by opening a PR with your results in the [results repository](https://github.com/embeddings-benchmark/results) and checking that they are displayed correctly by [locally running the leaderboard](https://github.com/embeddings-benchmark/leaderboard?tab=readme-ov-file#developer-setup).

- ##### Using Prompts with Sentence Transformers

@@ -65,4 +84,8 @@ The leaderboard [automatically refreshes daily](https://github.com/embeddings-be

###### Instantiating the Model with Prompts

If you are unable to directly add the prompts in the model configuration, you can instantiate the model using the `sentence_transformers_loader` and pass `prompts` as an argument. For more details, see the `mteb/models/bge_models.py` file.
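As an illustration only, a hypothetical sketch of this pattern is given below; the import path and keyword names are assumptions, so check `mteb/models/bge_models.py` for the actual usage:

```python
from functools import partial

# Assumed import path; the loader lives in the mteb.models package.
from mteb.models.sentence_transformer_wrapper import sentence_transformers_loader

# Hypothetical example: attach prompts at load time instead of in the model config.
bge_loader = partial(
    sentence_transformers_loader,
    model_name="BAAI/bge-small-en-v1.5",
    prompts={"query": "Represent this sentence for searching relevant passages: "},
)
model = bge_loader()  # the resulting wrapper can be passed to mteb.MTEB(...).run(model)
```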