Merged
217 commits
17be7e5
model: add image support for jina embeddings v4 (#2893)
makram93 Jul 11, 2025
9ecac21
model: add kalm_models (kalm-emb-v2) ModelMeta (new PR) (#2889)
ItsukiFujii Jul 15, 2025
4a47f90
Add Classification Evaluator unit test (#2838)
fzowl Jul 15, 2025
9864e2a
fix: update colpali engine models (#2905)
paultltc Jul 16, 2025
5a8ccec
1.38.35
invalid-email-address Jul 16, 2025
c7078af
Evaluator tests (#2910)
fzowl Jul 19, 2025
aef1e33
Classification dataset cleaning (#2900)
AlexeyVatolin Jul 19, 2025
56c98ed
Update tasks & benchmarks tables
github-actions[bot] Jul 19, 2025
57438c2
dataset: Add JapaneseSentimentClassification (#2913)
lsz05 Jul 19, 2025
372fc4c
Update tasks & benchmarks tables
github-actions[bot] Jul 19, 2025
a298fa9
fix: change `passage` prompt to `document` (#2912)
Samoed Jul 20, 2025
8eb4f6d
1.38.36
invalid-email-address Jul 20, 2025
5a868e3
model: Add OpenSearch inf-free sparse encoding models (#2903)
zhichao-aws Jul 20, 2025
1dcc6dc
dataset: add BarExamQA dataset (#2916)
abdurrahmanbutler Jul 21, 2025
c1922c8
Use `mteb.get_model` in adding_a_dataset.md (#2922)
Samoed Jul 21, 2025
0ac0231
fix: specify revision for opensearch (#2919)
Samoed Jul 21, 2025
b12b926
1.38.37
invalid-email-address Jul 21, 2025
533ce59
Update the link for gemini-embedding-001 (#2928)
Feiyang1 Jul 22, 2025
5ed6c90
fix: replace with passage (#2934)
makram93 Jul 22, 2025
79a43af
fix: Only import SparseEncoder once sentence-transformer version have…
KennethEnevoldsen Jul 22, 2025
8496ec2
fix: Prevent incorrectly passing "selector_state" to `get_benchmark` …
KennethEnevoldsen Jul 22, 2025
a78debf
docs: Update adding_a_dataset.md (#2947)
KennethEnevoldsen Jul 25, 2025
4ef8571
ci: bump semantic release
KennethEnevoldsen Jul 25, 2025
03a0582
1.38.38
Jul 25, 2025
8416541
dataset: Add BSARD v2, fixing the data loading issues of v1 (#2935)
nikolay-banar Jul 25, 2025
da46c8e
Update tasks & benchmarks tables
github-actions[bot] Jul 25, 2025
42dfe0d
dataset: add GovReport dataset (#2953)
abdurrahmanbutler Jul 29, 2025
007d19f
dataset: add BillSum datasets (#2943)
abdurrahmanbutler Jul 30, 2025
e4f30e9
Update tasks & benchmarks tables
github-actions[bot] Jul 30, 2025
36df9ca
fix: Add new benchmark beRuSciBench along with AbsTaskTextRegression …
AlexeyVatolin Aug 2, 2025
a86e2dd
Update tasks & benchmarks tables
github-actions[bot] Aug 2, 2025
4a567d2
1.38.39
Aug 3, 2025
6c1f1c6
qzhou-embedding model_meta & implementation (#2975)
PennyYu123 Aug 7, 2025
e5d386b
model: Add Voyage 3.5 model configuration (#3005)
fzowl Aug 9, 2025
042db73
model: BAAI/bge-m3-unsupervised Model (#3007)
fzoll Aug 9, 2025
01840ce
lint: Correcting lint errors (#3004)
fzowl Aug 9, 2025
741b022
dataset: Added 50 Vietnamese dataset from vn-mteb (#2964)
BaoLocPham Aug 9, 2025
4adf565
Update tasks & benchmarks tables
github-actions[bot] Aug 9, 2025
87eb27c
model: Add Cohere embed-v4.0 model support (#3006)
fzowl Aug 10, 2025
d8b2910
Add OpenAI models with 512 dimension (#3008)
fzoll Aug 11, 2025
ea41e7a
Standardise task names and fix citation formatting (#3026)
abdurrahmanbutler Aug 13, 2025
177997f
Update tasks & benchmarks tables
github-actions[bot] Aug 13, 2025
20bc80c
fix: Add missing training sets for qzhou (#3023)
PennyYu123 Aug 16, 2025
f3f11cc
1.38.40
Aug 16, 2025
96a7cc5
model: Add samilpwc_models meta (#3028)
ElPlaguister Aug 16, 2025
37d115a
model: Add granite-vision-embedding model (#3029)
roipony Aug 16, 2025
5c65913
fix: incorrect revision for SNLRetrieval (#3033)
KennethEnevoldsen Aug 17, 2025
d4e6223
dataset: Add HumanEvalRetrieval task (#3022)
fzoll Aug 17, 2025
a96f2e4
Update tasks & benchmarks tables
github-actions[bot] Aug 17, 2025
3398742
1.38.41
Aug 17, 2025
4aaf47e
ci: reduce parallel runs for when checking if a dataset exists (#3035)
KennethEnevoldsen Aug 17, 2025
e124b56
ci: Updating rerun delays to prevent false positives errors
KennethEnevoldsen Aug 17, 2025
d729d32
Merge branch 'main' of https://github.com/embeddings-benchmark/mteb
KennethEnevoldsen Aug 17, 2025
e476dc3
ci: Updating rerun delays to prevent false positives errors
KennethEnevoldsen Aug 17, 2025
72f7b05
model: Add GreenNode Vietnamese Embedding models (#2994)
BaoLocPham Aug 18, 2025
e08ec56
model: add granite-embedding-english R2 models (#3050)
aashka-trivedi Aug 18, 2025
c58b319
fix: Updated revision for jina-embeddings-v4 (#3046)
jupyterjazz Aug 18, 2025
46f4261
1.38.42
Aug 18, 2025
4e3fcd8
Fix 3 VN-MTEB Pair Classification tasks (#3053)
BaoLocPham Aug 19, 2025
ac69263
dataset: Add mbpp retrieval (#3037)
fzoll Aug 20, 2025
1fff5ce
Update tasks & benchmarks tables
github-actions[bot] Aug 20, 2025
7b289f5
dataset: Added wikisql retrieval (#3039)
fzoll Aug 20, 2025
7da3cf9
Update tasks & benchmarks tables
github-actions[bot] Aug 20, 2025
6fa6efa
ci: Temporarily limit pytrec version to "pytrec-eval-terrier>=0.5.6, …
Samoed Aug 20, 2025
ea801ec
fix MBPPRetrieval revision (#3055)
isaac-chung Aug 20, 2025
0a6e855
fix: Add VN-MTEB benchmark and Leaderboard (#2995)
BaoLocPham Aug 20, 2025
def1377
Update tasks & benchmarks tables
github-actions[bot] Aug 20, 2025
ef9771c
1.38.43
Aug 20, 2025
53d7d84
Add hc3finance retrieval (#3041)
fzoll Aug 20, 2025
7b57185
Add finqa retrieval (#3042)
fzoll Aug 20, 2025
fd8f89e
Update tasks & benchmarks tables
github-actions[bot] Aug 20, 2025
4da11c6
Add FinanceBenchRetrieval task (#3044)
fzoll Aug 20, 2025
fe57390
Update tasks & benchmarks tables
github-actions[bot] Aug 20, 2025
a291a05
Add FreshStackRetrieval task (#3043)
fzoll Aug 21, 2025
e1ede42
Update tasks & benchmarks tables
github-actions[bot] Aug 21, 2025
53f0986
dataset: Add ds1000 retrieval (#3038)
fzoll Aug 21, 2025
d2fcbac
Update tasks & benchmarks tables
github-actions[bot] Aug 21, 2025
e91cb8e
Add ChatDoctorRetrieval (#3045)
fzoll Aug 21, 2025
69099fe
Update tasks & benchmarks tables
github-actions[bot] Aug 21, 2025
8e1c354
Correcting the (new) DS1000 dataset's revision (#3063)
fzoll Aug 21, 2025
cf3e1bb
dataset: Add JinaVDR (#2942)
maximilianwerk Aug 22, 2025
26468f8
Update tasks & benchmarks tables
github-actions[bot] Aug 22, 2025
4994ea1
model: Add CoDi-Embedding-V1 (#3054)
ZBWpro Aug 22, 2025
9c27f71
fix: ensure that there are always relevant docs attached to query (#3…
KennethEnevoldsen Aug 22, 2025
616a517
1.38.44
Aug 22, 2025
70724e7
Correcting the JINA models with SentenceTransformerWrapper (#3071)
fzoll Aug 24, 2025
df719cc
ci: Add stale workflow (#3066)
isaac-chung Aug 25, 2025
1f9641a
fix: open_clip package validation (#3073)
FacerAin Aug 25, 2025
f210ac1
1.38.45
Aug 25, 2025
63a0c60
fix: Update revision for qzhou models (#3069)
PennyYu123 Aug 25, 2025
3153707
1.38.46
Aug 25, 2025
d2c3570
Fix the reference link for CoDi-Embedding-V1 (#3075)
ZBWpro Aug 25, 2025
1541318
fix: Add beta version of RTEB related benchmarks (#3048)
fzoll Aug 27, 2025
bce7471
1.38.47
Aug 27, 2025
b46b633
fix: run `ruff check` on all files during ci (#3086)
Samoed Aug 27, 2025
6db355e
1.38.48
Aug 27, 2025
cd14ef6
Move dev to dependency groups (#3088)
Samoed Aug 28, 2025
139fc73
fix: Improving validate_task_to_prompt_name logs and error messages (…
RyanMullins Aug 28, 2025
27be671
fix: duplicate mteb multilingual variables (#3080)
Samoed Aug 28, 2025
5bf303b
Update tasks & benchmarks tables
github-actions[bot] Aug 28, 2025
e4c2a95
model: mdbr-leaf models (#3081)
robin-vjc Aug 28, 2025
2b7089a
1.38.49
Aug 28, 2025
17fa697
CI: Set upper limit for xdist version (#3098)
Samoed Aug 29, 2025
9586697
Combine Plots and Tables into a Single (#3047)
q275343119 Aug 29, 2025
5851c7a
fix: Updating the default batch size calculation in the voyage models…
fzoll Sep 1, 2025
80966c2
1.38.50
Sep 1, 2025
4012517
fix: Add @classmethod for @field_validators in TaskMetadata (#3100)
Samoed Sep 1, 2025
7303c15
Align task prompt dict with `PromptType` (#3101)
Samoed Sep 1, 2025
b7b5d11
1.38.51
Sep 1, 2025
4774b74
model: Add ModelMeta for OrdalieTech/Solon-embeddings-mini-beta-1.1 (…
mathlesage Sep 1, 2025
5844cc7
fix: Allow closed datasets (#3059)
fzoll Sep 1, 2025
07bf861
1.38.52
Sep 1, 2025
73a35e0
Ci: test out GH models with welcoming new comers (#3112)
isaac-chung Sep 1, 2025
6e8eba1
ci: Dataset check on new PR (#3103)
isaac-chung Sep 2, 2025
652ff2b
model: add Youtu-Embedding-V1 (#3115)
spring-quan Sep 3, 2025
9c7804c
fix: add voyage quantization models (#3092)
fzoll Sep 3, 2025
647c8c3
1.38.53
Sep 3, 2025
729f20a
model: EmbeddingGemma 300M (#3129)
RyanMullins Sep 4, 2025
53f49ec
fix: Add dedicated display for RTEB benchmark results (#3089)
q275343119 Sep 8, 2025
32c9746
Update tasks & benchmarks tables
github-actions[bot] Sep 8, 2025
4e5f597
1.38.54
Sep 8, 2025
8f8ed49
dataset: Add Dapfam patent retrieval tasks (#2946)
iliass-y Sep 9, 2025
b622870
Update tasks & benchmarks tables
github-actions[bot] Sep 9, 2025
10c4948
Align max tokens (#3172)
Muennighoff Sep 12, 2025
ed68a89
Correct the VoyageAI model's batch creation/batch size calculation (#…
fzoll Sep 16, 2025
e7141d9
dataset: Adding JapaneseCode1Retrieval as the first non-public datase…
fzoll Sep 16, 2025
2093798
fix: add version check for `embeddinggemma-300m` (#3189)
Samoed Sep 18, 2025
bc303ad
dataset: Added a set of closed datasets (#3186)
fzoll Sep 18, 2025
d682c85
Update tasks & benchmarks tables
github-actions[bot] Sep 18, 2025
57ffd43
fix: Edit ack & sponsors (#3187)
Muennighoff Sep 18, 2025
5f4ea31
dataset: Update FaMTEB to Version 2 (#3157)
mehran-sarmadi Sep 18, 2025
7266873
Update tasks & benchmarks tables
github-actions[bot] Sep 18, 2025
6811486
1.38.55
Sep 18, 2025
0cc6802
fix: Add conflicting dependencies to toml (#3191)
Samoed Sep 18, 2025
3306aeb
1.38.56
Sep 18, 2025
90e9f43
fix: Correct metadata for ArguAna dataset (#3202)
whybe-choi Sep 21, 2025
920dafe
Update tasks & benchmarks tables
github-actions[bot] Sep 21, 2025
cd37c7a
1.38.57
Sep 21, 2025
6718290
model: Add BMRetriever (#3195)
whybe-choi Sep 22, 2025
6e72dc0
Revert "Ci: test out GH models with welcoming new comers" (#3206)
isaac-chung Sep 22, 2025
4f6d791
model: Add Codefuse models (#3205)
Geralt-Targaryen Sep 24, 2025
82d9e29
fix(models): ensure prompt_type is passed to format_instruction (#3216)
whybe-choi Sep 26, 2025
d0d427d
1.38.58
Sep 27, 2025
08bba49
Adding Cohere's output_dimension and embedding_type parameter (#3204)
fzoll Sep 27, 2025
e863bc1
dataset: add swedish cpc patent classifications to mteb (#3072)
Atheer2104 Sep 27, 2025
8c180d4
fix: AttributeError in ColPaliEngineWrapper similarity method (#3177)
FacerAin Sep 27, 2025
0aacba4
Update tasks & benchmarks tables
github-actions[bot] Sep 27, 2025
2e292cf
1.38.59
Sep 27, 2025
f58ac2b
fix: prevent EOS token truncation (#3218)
whybe-choi Sep 27, 2025
3e86531
1.38.60
Sep 27, 2025
15f9909
Update giga embeddings (#3210)
ekolodin Sep 29, 2025
cb03bd4
fix: Refactor split create_tables into static Benchmark methods (#3126)
q275343119 Sep 29, 2025
a52723a
1.38.61
Sep 29, 2025
4f58684
Extending the RTEB benchmark (#3223)
fzoll Sep 29, 2025
7f5990a
Update tasks & benchmarks tables
github-actions[bot] Sep 29, 2025
e299345
model: New qzmodel (#3211)
PennyYu123 Sep 30, 2025
0000ae2
model: Update Youtu embedding model (#3227)
spring-quan Sep 30, 2025
e56e7c4
dataset: Add Software Issue Localization Datasets (#3178)
tarsur909 Sep 30, 2025
65f29e6
Update tasks & benchmarks tables
github-actions[bot] Sep 30, 2025
11f9c1d
feat: Officially include RTEB in the leaderboard (#3222)
KennethEnevoldsen Oct 1, 2025
867105f
Update tasks & benchmarks tables
github-actions[bot] Oct 1, 2025
cf26684
1.39.0
Oct 1, 2025
600c290
fix: Add submission references for RTEB (#3233)
KennethEnevoldsen Oct 1, 2025
12fe80b
1.39.1
Oct 1, 2025
48a01fc
dataset: add human tasks and benchmark (#3214)
Samoed Oct 2, 2025
9a606a0
Update tasks & benchmarks tables
github-actions[bot] Oct 2, 2025
e419b54
Remove 'HUME(v1)' from leaderboard benchmark (#3236)
Samoed Oct 2, 2025
50aa4ac
docs: Update adding benchmark documentation (#3229)
Samoed Oct 2, 2025
a2f7488
fix: Further specified macro-language code for Norwegian (#3228)
KennethEnevoldsen Oct 2, 2025
810ae28
Update tasks & benchmarks tables
github-actions[bot] Oct 2, 2025
9249630
1.39.2
Oct 2, 2025
2f6eb2a
fix max tokens (#3243)
Muennighoff Oct 2, 2025
85e1dd9
fix python39 transformers compatibility (#3254)
Samoed Oct 5, 2025
36901eb
Aggregate by subset for HUMEv1 (#3255)
isaac-chung Oct 5, 2025
89bec7d
Update tasks & benchmarks tables
github-actions[bot] Oct 5, 2025
08b98cd
Fix AbsTaskTextRegression task (#3257)
AlexeyVatolin Oct 5, 2025
53b1c29
Added Japanese to Retrieval (#3252)
q275343119 Oct 5, 2025
c8ae52c
Update tasks & benchmarks tables
github-actions[bot] Oct 5, 2025
237d8dc
fix bm25 on small datasets (#3261)
Samoed Oct 6, 2025
65829bd
fix: Move zero-shot percentage calculation to the end of summary (#3231)
q275343119 Oct 6, 2025
f2504bd
model: Add ReasonIR (#3221)
whybe-choi Oct 6, 2025
58a81a9
fix: Only pin model name and rank (#3263)
KennethEnevoldsen Oct 6, 2025
bc953bf
1.39.3
Oct 6, 2025
1e29385
fix: resolve flash-attention dependency issue (#3265)
KennethEnevoldsen Oct 6, 2025
0f61c9f
1.39.4
Oct 6, 2025
e81c94f
fix: Add retry and token counting in Cohere models (#3253)
fzoll Oct 7, 2025
479c2a0
1.39.5
Oct 7, 2025
30de619
Align MIEB leaderboards with paper (#3272)
isaac-chung Oct 7, 2025
9b6f320
fix: add prompt for MIRACLRetrievalHardNegatives (#3266)
Drozhzhinastya Oct 7, 2025
5a5bcfd
Update tasks & benchmarks tables
github-actions[bot] Oct 7, 2025
e176ba6
Add Regression task mock (#3271)
AlexeyVatolin Oct 7, 2025
4936fe2
1.39.6
Oct 7, 2025
0a902a3
fix: Change language for task SlovakMovieReviewSentimentClassificatio…
andrejridzik Oct 8, 2025
94aa0d5
Update tasks & benchmarks tables
github-actions[bot] Oct 8, 2025
d2c704c
1.39.7
Oct 8, 2025
67f7ad9
Add english code retriever model (#3302)
fedor28 Oct 10, 2025
721e8a3
docs: fix typos in `docs/adding_a_benchmark.md` (#3344)
whybe-choi Oct 13, 2025
c07c289
BREAKING: v2.0.0 (#1433)
KennethEnevoldsen Oct 20, 2025
a329381
docs: Update AbsTaskPairClassification to correct path (#3437)
KennethEnevoldsen Oct 20, 2025
ba2a434
breaking: Updating to v2
KennethEnevoldsen Oct 20, 2025
29e605f
test: disable flaky test (#3442)
KennethEnevoldsen Oct 20, 2025
c20226f
feat!: Updating to v2
KennethEnevoldsen Oct 20, 2025
51412db
2.0.0
Oct 20, 2025
838f25f
bump version
KennethEnevoldsen Oct 20, 2025
48b11a7
Merge branch 'main' of https://github.com/embeddings-benchmark/mteb
KennethEnevoldsen Oct 20, 2025
493ce4f
Merge branch 'main' into maeb_merge_main_v2
Samoed Oct 20, 2025
b457921
linter pass
Samoed Oct 20, 2025
6b8f98d
make mteb importable
Samoed Oct 20, 2025
3f52798
add audio to test dependencies
Samoed Oct 20, 2025
cf871d3
remove metadata_dict
Samoed Oct 21, 2025
ac9c19e
fix tests
Samoed Oct 21, 2025
81d0c96
fix tests
Samoed Oct 21, 2025
5fac1f5
make torchaudio optional
Samoed Oct 21, 2025
632673c
fix retrieval init
Samoed Oct 21, 2025
0b23206
fix imports
Samoed Oct 21, 2025
050107d
fix retrieval task and torch audio
Samoed Oct 21, 2025
e060767
move more torcaudio imports
Samoed Oct 21, 2025
The diff you're trying to view is too large. We only load the first 3000 changed files.
1 change: 0 additions & 1 deletion .github/ISSUE_TEMPLATE/enhancement.yaml
Original file line number Diff line number Diff line change
@@ -8,4 +8,3 @@ body:
description: Please provide a clear and concise description of the feature you would like to see added.
validations:
required: true

33 changes: 33 additions & 0 deletions .github/ISSUE_TEMPLATE/eval_request.yaml
@@ -0,0 +1,33 @@
name: 📊 Evaluation Request
description: Create a request for a model to be evaluated in MTEB
title: "Evaluate model: {model_id}"
labels: ["evaluation request"]
body:
- type: input
attributes:
label: Model link on Hugging Face
description: Please provide a link to the model on Hugging Face. If the model is closed-source, please provide a link to the model provider or documentation.
validations:
required: true
- type: textarea
attributes:
label: What do you want it to be evaluated on?
description: Please specify the tasks or benchmarks you would like this model to be evaluated on.
validations:
required: True
- type: dropdown
id: contribute
attributes:
label: Are you interested in contributing to the evaluation of this model?
description: By default MTEB maintainers will only handle evaluation on private subsets due to resource constraints. If you are interested in contributing to the evaluation, please let us know.
options:
- "Yes"
- "No"
- type: dropdown
id: exists
attributes:
label: Does this model already exist in MTEB?
description: If you are unsure, please check using mteb model registry (e.g. using `mteb.get_model_meta("model_id")`).
options:
- "Yes"
- "No"
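
Note on the registry check named in the last dropdown: a minimal Python sketch of that lookup, assuming `mteb` is installed. `mteb.get_model_meta` is the call mentioned in the template. The helper `registry_entry`, its `lookup` parameter, and the choice to treat any exception as "not found" are assumptions for illustration, since how an unknown model is reported may differ between mteb versions.

```python
def registry_entry(model_id, lookup=None):
    """Return (name, revision) for model_id, or None if the lookup raises.

    `lookup` is a hypothetical injection point so the helper can be tried
    without mteb installed; by default it falls back to mteb.get_model_meta.
    """
    if lookup is None:
        import mteb  # imported lazily so this sketch loads without mteb

        lookup = mteb.get_model_meta
    try:
        meta = lookup(model_id)
    except Exception:
        # Assumption: a failed lookup surfaces as an exception.
        return None
    return meta.name, meta.revision
```

Calling `registry_entry("<model_id>")` with the default lookup queries the installed mteb registry. A `None` result suggests answering "No" in the dropdown, but some mteb versions may return fallback metadata for unregistered models, so a non-`None` result is worth a second look.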
28 changes: 28 additions & 0 deletions .github/workflows/dataset_loading_pr.yml
@@ -0,0 +1,28 @@
name: Datasets available on HuggingFace - PR

on:
pull_request:
paths:
- "mteb/tasks/**.py"

jobs:
run-pr-datasets-loading-check:
runs-on: ubuntu-latest

steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
cache: 'pip'

- name: Install dependencies
run: |
make install-for-tests

- name: Run dataset loading tests
run: |
make dataset-load-test-pr BASE_BRANCH=${{ github.event.pull_request.base.ref }}
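
Note on the `paths` filter above: a rough local approximation of which changed files would trigger this workflow. It uses Python's `fnmatch`, which is not GitHub's glob implementation, so treat the result as a sketch and not as the exact trigger semantics. The function name `triggers_check` and the sample paths in the usage note are illustrative assumptions.

```python
from fnmatch import fnmatchcase

# Pattern copied from the workflow's `paths` filter.
PATTERN = "mteb/tasks/**.py"


def triggers_check(changed_files, pattern=PATTERN):
    """Return the changed files that match the pattern under fnmatch rules.

    In fnmatch, `*` also matches `/`, which roughly mirrors how `**`
    spans directories in a GitHub Actions path filter.
    """
    return [path for path in changed_files if fnmatchcase(path, pattern)]
```

For example, `triggers_check(["mteb/tasks/Retrieval/eng/Example.py", "docs/adding_a_dataset.md"])` keeps only the first path under these rules.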
13 changes: 7 additions & 6 deletions .github/workflows/docs.yml
@@ -1,6 +1,5 @@
# GitHub action for the task table generation.

name: documentation
name: tables

on:
push:
@@ -24,11 +23,12 @@ jobs:

- name: Install dependencies
run: |
make install
python -m pip install --upgrade pip
pip install -e . --group docs

- name: Create table
run: |
make build-docs
make build-tables

create-table-and-push:
if: github.ref == 'refs/heads/main'
@@ -43,11 +43,12 @@ jobs:

- name: Install dependencies
run: |
make install
python -m pip install --upgrade pip
pip install -e . --group docs

- name: Create table
run: |
make build-docs
make build-tables

- name: Push table
env:
32 changes: 32 additions & 0 deletions .github/workflows/documentation.yml
@@ -0,0 +1,32 @@
name: Documentation

on:
push:
branches: [main]
pull_request:


permissions:
contents: write

jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
with:
python-version: '3.10'

- name: Dependencies
run: |
python -m pip install --upgrade pip
pip install -e . --group docs

- name: Build and Deploy
if: github.event_name == 'push'
run: mkdocs gh-deploy --force

- name: Build
if: github.event_name == 'pull_request'
run: make build-docs
3 changes: 1 addition & 2 deletions .github/workflows/leaderboard_build.yml
@@ -4,7 +4,6 @@ on:
push:
branches: [main]
pull_request:
branches: [main]

jobs:
leaderboard:
@@ -26,4 +25,4 @@ jobs:

- name: Run leaderboard build test
run: |
make leaderboard-build-test
make leaderboard-build-test
6 changes: 3 additions & 3 deletions .github/workflows/test.yml
@@ -15,11 +15,11 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-latest] #, macos-latest, windows-latest]
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.10", "3.11", "3.12", "3.13"]
include:
# Add Windows with Python 3.8 only to avoid tests taking too long
# Add Windows with Python 3.10 only to avoid tests taking too long
- os: windows-latest
python-version: "3.9"
python-version: "3.10"

steps:
- uses: actions/checkout@v3
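
Note on the matrix change above: a simplified Python model of how the `matrix` axes and the `include` entry combine into jobs. It follows the documented rule that an `include` entry which would overwrite an original axis value becomes a new combination, but it is a sketch and not GitHub's implementation. The function name `expand_matrix` is an assumption.

```python
from itertools import product


def expand_matrix(axes, include=()):
    """Expand matrix axes into jobs, then apply `include` entries.

    Simplified model: an entry is merged into every base job whose axis
    values it does not contradict; otherwise it is added as its own job.
    """
    base = [dict(zip(axes, values)) for values in product(*axes.values())]
    extra_jobs = []
    for entry in include:
        matches = [
            job
            for job in base
            if all(job[key] == value for key, value in entry.items() if key in axes)
        ]
        if matches:
            for job in matches:
                job.update(entry)
        else:
            extra_jobs.append(dict(entry))
    return base + extra_jobs


jobs = expand_matrix(
    {"os": ["ubuntu-latest"], "python-version": ["3.10", "3.11", "3.12", "3.13"]},
    include=[{"os": "windows-latest", "python-version": "3.10"}],
)
```

Under this model the updated workflow yields four Ubuntu jobs plus one Windows job on Python 3.10.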
8 changes: 7 additions & 1 deletion .gitignore
@@ -143,7 +143,6 @@ sb.ipynb
tests/create_meta/model_card.md

# removed results from mteb repo they are now available at: https://github.com/embeddings-benchmark/results
results/
uv.lock

# model loading tests
@@ -152,3 +151,10 @@ mteb/leaderboard/__cached_results.json

# gradio
.gradio/

# codecarbon
powermetrics_log.txt

# vscode
.vscode/launch.json

27 changes: 23 additions & 4 deletions Makefile
@@ -6,7 +6,7 @@ install:
install-for-tests:
@echo "--- 🚀 Installing project dependencies for test ---"
@echo "This ensures that the project is not installed in editable mode"
pip install ".[image]" --group dev
pip install ".[bm25s,pylate,image,audio,codecarbon,faiss-cpu]" --group dev

lint:
@echo "--- 🧹 Running linters ---"
@@ -34,12 +34,22 @@ pr:
make test


build-docs:
@echo "--- 📚 Building documentation ---"
# since we do not have a documentation site, this just build tables for the .md files
build-tables:
@echo "--- 📚 Building tables ---"
# This just builds tables for the .md files
python docs/create_tasks_table.py
python docs/create_benchmarks_table.py

build-docs:
@echo "--- 📚 Building documentation ---"
python docs/overview/create_available_tasks.py
python docs/overview/create_available_models.py
python docs/overview/create_available_benchmarks.py

serve-docs:
@echo "--- 📚 Serving documentation ---"
python -m mkdocs serve


model-load-test:
@echo "--- 🚀 Running model load test ---"
@@ -52,6 +62,10 @@ dataset-load-test:
@echo "--- 🚀 Running dataset load test ---"
pytest -m test_datasets

dataset-load-test-pr:
@echo "--- 🚀 Running dataset load test for PR ---"
eval "$$(python -m scripts.extract_datasets $(BASE_BRANCH))" && pytest -m test_datasets

leaderboard-build-test:
@echo "--- 🚀 Running leaderboard build test ---"
pytest -n auto -m leaderboard_stability
@@ -70,3 +84,8 @@ format-citations:
check: ## Run code quality tools.
@echo "--- 🧹 Running code quality tools ---"
@pre-commit run -a

.PHONY: typecheck
typecheck:
@echo "--- 🔍 Running type checks ---"
mypy mteb
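
Note on the `dataset-load-test-pr` target above: it passes the output of `python -m scripts.extract_datasets` to `eval`, which implies the script prints shell statements for the following `pytest` call to pick up. The script itself is not shown in this part of the diff, so the sketch below only illustrates that general pattern. The function `export_line` and the variable name `CHANGED_TASK_FILES` in the usage note are hypothetical, not taken from the repository.

```python
import shlex


def export_line(name, values):
    """Render an `export NAME=value` statement for a POSIX shell.

    The values are joined with commas and quoted with shlex.quote so that
    spaces or shell metacharacters survive the `eval` in the Makefile.
    """
    joined = ",".join(values)
    return f"export {name}={shlex.quote(joined)}"
```

A script following this pattern could print `export_line("CHANGED_TASK_FILES", paths)` to stdout, and `eval "$(...)"` would then set that variable in the shell that runs `pytest`.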