Merged
133 commits
67f7ad9
Add english code retriever model (#3302)
fedor28 Oct 10, 2025
721e8a3
docs: fix typos in `docs/adding_a_benchmark.md` (#3344)
whybe-choi Oct 13, 2025
c07c289
BREAKING: v2.0.0 (#1433)
KennethEnevoldsen Oct 20, 2025
a329381
docs: Update AbsTaskPairClassification to correct path (#3437)
KennethEnevoldsen Oct 20, 2025
ba2a434
breaking: Updating to v2
KennethEnevoldsen Oct 20, 2025
29e605f
test: disable flaky test (#3442)
KennethEnevoldsen Oct 20, 2025
c20226f
feat!: Updating to v2
KennethEnevoldsen Oct 20, 2025
51412db
2.0.0
Oct 20, 2025
838f25f
bump version
KennethEnevoldsen Oct 20, 2025
48b11a7
Merge branch 'main' of https://github.com/embeddings-benchmark/mteb
KennethEnevoldsen Oct 20, 2025
5e6542e
Merge statics (#3452)
Samoed Oct 20, 2025
9b08e8c
Fix: Cache invalidation (#3393)
q275343119 Oct 20, 2025
3af1aa0
ci: Updating docs ci (#3445)
KennethEnevoldsen Oct 20, 2025
a04d78b
fix: add citations to models (#3435) (#3439)
Samoed Oct 20, 2025
3bd272a
2.0.2
Oct 20, 2025
01f3a19
fix: speedup retrieval computation (#3454)
Samoed Oct 20, 2025
325a670
2.0.3
Oct 20, 2025
be20185
docs: Don't shorten embedding size (#3455)
Samoed Oct 20, 2025
38e7bc7
fix: Roll back setting OMP_NUM_THREADS for clustering (#3444)
KennethEnevoldsen Oct 20, 2025
0a6fe95
docs: Update readme header (#3443)
KennethEnevoldsen Oct 20, 2025
6eab159
Update logo size and heading levels in README
KennethEnevoldsen Oct 20, 2025
632b04d
2.0.4
Oct 20, 2025
4ef51f2
Add stats from fzoll (#3460)
Samoed Oct 21, 2025
5c111ea
model: Kalm model (#3461)
Samoed Oct 21, 2025
93d638d
Add more statistics (#3462)
Samoed Oct 21, 2025
4b384bb
fix: vdr category (#3465)
Samoed Oct 21, 2025
428bb42
2.0.5
Oct 21, 2025
4f9f157
feat: Add mteb nl (#3464)
nikolay-banar Oct 21, 2025
5e903b1
docs: Ignore overview docs (#3456)
Samoed Oct 21, 2025
6fbe482
fix: license pyproject (#3457)
Samoed Oct 21, 2025
f8c07ff
fix: Add more impots to root `__init__` (#3458)
Samoed Oct 21, 2025
ced1c1c
2.1.0
Oct 21, 2025
179702e
fix miracl loading (#3466)
Samoed Oct 21, 2025
b649e6f
ci: New release workflow (#3448)
Samoed Oct 22, 2025
0ead029
Almost final descriptive stats (#3463)
Samoed Oct 22, 2025
181490f
Update text_segments.py (#3474)
Muennighoff Oct 22, 2025
5b92f73
Descriptive stats, MIRACLVisionRetrieval (#3473)
fzoll Oct 22, 2025
fcd3b71
Remove skip for tasks (#3475)
Samoed Oct 22, 2025
31c8329
fix: qrels selection negative scores (#3479)
Samoed Oct 23, 2025
dfd516a
docs: fix broken links (#3483)
whybe-choi Oct 23, 2025
ea1bac1
fix: task metadata was not passed in Jina implementation (#3485)
Clement25 Oct 24, 2025
16ae6ff
fix: `top_k` document selection in two stage reranking (#3486)
Samoed Oct 24, 2025
cf81dd1
Correcting the VoyageAI multimodal code (#3491)
fzoll Oct 25, 2025
21223ed
fix: release CI (#3493)
Samoed Oct 25, 2025
9e683fe
Add spell checker (#3476)
Samoed Oct 27, 2025
b0b0e7d
docs: Update links in readme (#3484)
Samoed Oct 27, 2025
799b869
fix: verify languages during filtering (#3472)
Samoed Oct 27, 2025
7b7bdd0
fix: add prompts to hardnegative tasks (#3469)
Samoed Oct 27, 2025
4484112
fix: simplify release (#3494)
Samoed Oct 27, 2025
1325328
fix: Rollback to semantic release (#3502)
Samoed Oct 27, 2025
aaa5d9c
2.1.1
Oct 27, 2025
da9feef
dataset: Add MTEB-NL to the leaderboard (#3489)
nikolay-banar Oct 27, 2025
976fadf
Update links in leaderboard description (#3503)
blockingthesky Oct 27, 2025
8189108
fix ReasonIR instruction (#3506)
whybe-choi Oct 28, 2025
ce07dfd
fix: remove `set_float32_matmul_precision` (#3509)
Samoed Oct 29, 2025
30d03f7
2.1.2
Oct 29, 2025
5eae04c
fix: aggregated task evaluation (#3510)
Samoed Oct 29, 2025
b284128
2.1.3
Oct 29, 2025
b45d37b
Correcting the VoyageAI model URLs and handling empty strings (#3511)
fzoll Oct 30, 2025
fe43f73
fix: reupload winogrande (#3513)
Samoed Oct 30, 2025
2c25793
2.1.4
Oct 30, 2025
d5a8cbe
model: Tarka-Embedding-150M-V1 (#3520)
jaswanth-0821 Nov 4, 2025
08f76a6
fix: materialize corpus id to speed up evaluation (#3518)
nqbao Nov 4, 2025
50a2be3
2.1.5
Nov 4, 2025
a7fb4a9
model: Add `rasgaard/m2v-dfm-large` (#3523)
KennethEnevoldsen Nov 4, 2025
ba5bcb2
fix : Tarka V1 Model revision fix (#3525)
jaswanth-0821 Nov 4, 2025
18ff6f3
model: add EvoQwen2.5-VL-Retriever model (#3526)
liweiqing-ali Nov 6, 2025
28f9c54
model: add kalm_models.py ModelMeta (#3519)
YanshekWoo Nov 6, 2025
632a83a
fix: Add support for python 3.14 (#3450)
Samoed Nov 6, 2025
449b178
2.1.6
Nov 6, 2025
8f3f806
fix: MTEB-NL prompts (#3516)
nikolay-banar Nov 7, 2025
3d4e03a
2.1.7
Nov 7, 2025
d98a008
add citation to models (#3539)
whybe-choi Nov 8, 2025
b2b9599
ci: fix false positive check in typos (#3540)
Samoed Nov 10, 2025
1a02edb
dataset: Benchmark/VidoreV3 (#3514)
QuentinJGMace Nov 10, 2025
57e179d
Add concurrency to tests (#3543)
Samoed Nov 12, 2025
fe83e27
docs: Convert all descriptions to singe line (#3544)
Samoed Nov 12, 2025
27c10e9
Model : Tarka Embedding 350M V1 (#3549)
jaswanth-0821 Nov 12, 2025
eaec6cb
fix: Pass encode kwargs in all dataloaders (#3548)
Samoed Nov 13, 2025
940f897
2.1.8
Nov 13, 2025
1ad433f
model: Added emillykkes scandi models (#3521)
KennethEnevoldsen Nov 13, 2025
ab390ce
fix: Added leaderboard Vidore V3 (#3542)
QuentinJGMace Nov 13, 2025
8b02789
2.1.9
Nov 13, 2025
0c4f099
fix: resolve hash randomization in retrieval task ID generation (#3553)
dongwook92 Nov 13, 2025
85c5ec3
2.1.10
Nov 13, 2025
64459d1
fix: add jasper token compression model (#3557)
DunZhang Nov 14, 2025
26e36cd
fix: MTEB-NL switches to v2 datasets with new prompts (#3555)
nikolay-banar Nov 14, 2025
6837be8
2.1.11
Nov 14, 2025
aeadb64
Merge remote-tracking branch 'upstream/main' into upgrade-mteb
andrejridzik Nov 13, 2025
08b8ec7
fix: Fix adapted from points to the models itself (#3565)
KennethEnevoldsen Nov 15, 2025
78249e5
2.1.12
Nov 15, 2025
711e7cb
fix: Set default input_type for VoyageMultiModalModelWrapper (#3567)
KennethEnevoldsen Nov 15, 2025
b22078e
2.1.13
Nov 15, 2025
07f1e6e
fix: benchmark references links (#3560)
antoineedy Nov 16, 2025
6ed4d58
2.1.14
Nov 16, 2025
723fd98
model: Add spartan8806/atles-champion-embedding model (#3575)
spartan8806 Nov 17, 2025
101088a
Fix false positive check in typos
andrejridzik Nov 18, 2025
76be959
add training code and citation for Jasper_Token_Compression_600M (#3584)
DunZhang Nov 19, 2025
f7b481e
fix: utilize `max_seq_length` (#3588)
Samoed Nov 19, 2025
293b7d9
2.1.15
Nov 19, 2025
4636b24
tests: Added test for ensuring training datasets can be computed (#3…
KennethEnevoldsen Nov 19, 2025
5bca292
add prompt dict for Jasper_Token_Compression_600M (#3587)
DunZhang Nov 20, 2025
0d33bd3
fix: typo for attn_implementation kwargs in jasper models (#3592)
DunZhang Nov 20, 2025
09021df
fix: Bump gradio version to fix links on leaderboard (#3591)
Samoed Nov 20, 2025
9d7d4df
fix: issues on cache hits (#3558)
KennethEnevoldsen Nov 20, 2025
c2a252f
2.1.16
Nov 20, 2025
5d7b78b
fix: improve messages for running missing splits (#3596)
KennethEnevoldsen Nov 20, 2025
79d9a7d
2.1.17
Nov 20, 2025
7f4bcbb
model: Remove `flash_attn` for Tarka (#3599)
Samoed Nov 20, 2025
300418d
model: Add model2vecdk models (#3600)
KennethEnevoldsen Nov 21, 2025
9b898ea
tests: Add tests for dataset quality (#3603)
KennethEnevoldsen Nov 22, 2025
027bc17
add missing citation for Vietnamese retrieval datasets (#3608)
isaac-chung Nov 23, 2025
3af54eb
add note for BUCC tasks about using train split (#3609)
isaac-chung Nov 23, 2025
398b31b
fix: Correcting the cohere lstrip bug in `cohere` (#3610)
fzoll Nov 24, 2025
8212389
2.1.18
Nov 24, 2025
f75bfc4
fix: Cache language filtering (#3612)
Samoed Nov 25, 2025
cfcedfc
model: Add IEITYuan/Yuan-embedding-2.0-zh model (#3613)
wangxj12 Nov 25, 2025
2e106f3
2.1.19
Nov 25, 2025
5010468
feat: make STS and PairClassification asymmetric (#3568)
Samoed Nov 25, 2025
e634ab9
2.2.0
Nov 25, 2025
b0d6c7b
fix: Avoiding stating warning if what is logged is not a warning (#3619)
KennethEnevoldsen Nov 25, 2025
49656a3
2.2.1
Nov 25, 2025
ca8e7c4
fix: vidore loading (#3618)
Samoed Nov 25, 2025
3865565
2.2.2
Nov 25, 2025
7e2fa98
model: Add eager-embed embedding model (#3602)
jpbalarini Nov 26, 2025
73168c6
fix: Updated metadata on model memory (#3624)
Samoed Nov 28, 2025
bcf4e82
ci: update action versions (#3623)
Samoed Nov 28, 2025
392186f
docs: Update "speeding up"-section to include bumping version (#3634)
KennethEnevoldsen Nov 28, 2025
4ed7ef4
feat: add search encoder backend (#3492)
Samoed Nov 28, 2025
4ffef40
ci: Add HF_TOKEN to dataset loading and merge CI (#3622)
KennethEnevoldsen Nov 28, 2025
072e6ef
2.3.0
Nov 28, 2025
230d667
Merge tag '2.3.0' into upgrade-mteb
andrejridzik Dec 11, 2025
a3eebcf
Update Slovak reranking tasks: revise dataset revisions, adjust `main…
andrejridzik Dec 12, 2025
The diff you're trying to view is too large. We only load the first 3000 changed files.
1 change: 0 additions & 1 deletion .github/ISSUE_TEMPLATE/enhancement.yaml
@@ -8,4 +8,3 @@ body:
description: Please provide a clear and concise description of the feature you would like to see added.
validations:
required: true

4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/eval_request.yaml
@@ -19,7 +19,7 @@ body:
id: contribute
attributes:
label: Are you interested in contributing to the evaluation of this model?
description: By default MTEB maintainters will only handle evaluation on private subsets due to resource constraints. If you are interested in contributing to the evaluation, please let us know.
description: By default MTEB maintainers will only handle evaluation on private subsets due to resource constraints. If you are interested in contributing to the evaluation, please let us know.
options:
- "Yes"
- "No"
@@ -30,4 +30,4 @@ body:
description: If you are unsure, please check using mteb model registry (e.g. using `mteb.get_model_meta("model_id")`).
options:
- "Yes"
- "No"
- "No"
29 changes: 18 additions & 11 deletions .github/workflows/dataset_loading.yml
@@ -1,27 +1,34 @@
name: Datasets available on HuggingFace

on:
push:
branches: [main]
pull_request:
paths:
- "mteb/tasks/**.py"

jobs:
extract-and-run:
dataset-loading-check:
runs-on: ubuntu-latest

steps:
- name: Checkout repository
uses: actions/checkout@v3

uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v6
with:
python-version: '3.10'
python-version: '3.11'
cache: 'pip'

- name: Install dependencies
run: |
make install-for-tests

make install-for-tests
- name: Run dataset loading tests
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
make dataset-load-test
if [ "${{ github.event_name }}" == "pull_request" ]; then
make dataset-load-test-pr BASE_BRANCH=${{ github.event.pull_request.base.ref }}
else
make dataset-load-test
fi
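
Reviewer note: the `Run dataset loading tests` step now branches on the event type, which appears to absorb the job of the deleted `dataset_loading_pr.yml` workflow. The selection logic can be sketched in Python as below. `dataset_load_command` is a hypothetical helper written only for illustration; it is not part of the repository and simply mirrors the shell `if` in the hunk above.

```python
from typing import Optional


def dataset_load_command(event_name: str, base_ref: Optional[str] = None) -> str:
    """Return the make invocation the step would run (illustrative only)."""
    if event_name == "pull_request":
        # On pull requests the base branch is passed through to the make target.
        return f"make dataset-load-test-pr BASE_BRANCH={base_ref}"
    # Any other event (here: a push to main) runs the full dataset load test.
    return "make dataset-load-test"


if __name__ == "__main__":
    print(dataset_load_command("pull_request", "main"))
    print(dataset_load_command("push"))
```

What the two make targets actually do is defined in the Makefile, not here.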
28 changes: 0 additions & 28 deletions .github/workflows/dataset_loading_pr.yml

This file was deleted.

63 changes: 0 additions & 63 deletions .github/workflows/docs.yml

This file was deleted.

33 changes: 33 additions & 0 deletions .github/workflows/documentation.yml
@@ -0,0 +1,33 @@
name: Documentation

on:
push:
branches: [main]
pull_request:

permissions:
contents: write

jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/setup-python@v6
with:
python-version: "3.10"

- name: Dependencies
run: |
python -m pip install --upgrade pip
pip install -e . --group docs

- name: Build and Deploy
if: github.event_name == 'push'
run: |
make build-docs-overview
mkdocs gh-deploy --force

- name: Build
if: github.event_name == 'pull_request'
run: make build-docs
7 changes: 3 additions & 4 deletions .github/workflows/leaderboard_build.yml
@@ -4,18 +4,17 @@ on:
push:
branches: [main]
pull_request:
branches: [main]

jobs:
leaderboard:
runs-on: ubuntu-latest

steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v6

- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v6
with:
python-version: '3.10'
cache: 'pip'
@@ -26,4 +25,4 @@ jobs:

- name: Run leaderboard build test
run: |
make leaderboard-build-test
make leaderboard-build-test
4 changes: 2 additions & 2 deletions .github/workflows/lint.yml
@@ -11,9 +11,9 @@ jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v6

- uses: actions/setup-python@v4
- uses: actions/setup-python@v6
with:
python-version: "3.10"
cache: "pip"
4 changes: 2 additions & 2 deletions .github/workflows/model_loading.yml
@@ -11,10 +11,10 @@ jobs:

steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v6

- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v6
with:
python-version: "3.10"
cache: "pip"
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -24,7 +24,7 @@ jobs:

if: ${{ github.ref == 'refs/heads/main' && github.event.workflow_run.conclusion == 'success'}}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v6
with:
fetch-depth: 0
token: ${{ secrets.RELEASE }}
14 changes: 9 additions & 5 deletions .github/workflows/test.yml
@@ -8,21 +8,25 @@ on:
branches: [main]
pull_request:

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
test:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest] #, macos-latest, windows-latest]
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
include:
# Add Windows with Python 3.8 only to avoid tests taking too long
# Add Windows with Python 3.10 only to avoid tests taking too long
- os: windows-latest
python-version: "3.9"
python-version: "3.10"

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v6

- name: Cache Hugging Face
id: cache-hf
@@ -32,7 +36,7 @@ jobs:
key: ${{ runner.os }}-hf

- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python-version }}
cache: "pip"
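
Reviewer note: the updated `strategy.matrix` combines one OS with five Python versions and then adds a single Windows entry through `include`. The sketch below is a simplified Python model of how such a matrix expands into jobs. `expand_matrix` is a hypothetical helper, not a GitHub API, and it assumes the documented rule that an `include` entry which would overwrite an existing matrix value becomes its own job instead of being merged.

```python
from itertools import product


def expand_matrix(matrix: dict, include: list) -> list:
    """Simplified model of GitHub Actions matrix expansion (illustrative only)."""
    keys = list(matrix)
    # Cartesian product of all matrix axes.
    jobs = [dict(zip(keys, values)) for values in product(*(matrix[k] for k in keys))]
    base_jobs = list(jobs)
    for extra in include:
        matched = False
        for job in base_jobs:
            # An include entry may be merged only if it does not change
            # any value that came from the original matrix axes.
            if all(job[k] == v for k, v in extra.items() if k in matrix):
                job.update(extra)
                matched = True
        if not matched:
            # Otherwise it is added as a separate combination.
            jobs.append(dict(extra))
    return jobs


MATRIX = {
    "os": ["ubuntu-latest"],
    "python-version": ["3.10", "3.11", "3.12", "3.13", "3.14"],
}
INCLUDE = [{"os": "windows-latest", "python-version": "3.10"}]

if __name__ == "__main__":
    for job in expand_matrix(MATRIX, INCLUDE):
        print(job)
```

Under this model the workflow runs six jobs: five on `ubuntu-latest` and one on `windows-latest` with Python 3.10.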
11 changes: 10 additions & 1 deletion .gitignore
@@ -143,7 +143,6 @@ sb.ipynb
tests/create_meta/model_card.md

# removed results from mteb repo they are now available at: https://github.com/embeddings-benchmark/results
results/
uv.lock

# model loading tests
@@ -152,3 +151,13 @@ mteb/leaderboard/__cached_results.json

# gradio
.gradio/

# codecarbon
powermetrics_log.txt

# vscode
.vscode/launch.json

/docs/overview/available_models/
/docs/overview/available_tasks/
/docs/overview/available_benchmarks.md
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -19,6 +19,11 @@ repos:
- id: end-of-file-fixer # generated a lot of changes
- id: trailing-whitespace
- id: check-toml
- repo: https://github.com/crate-ci/typos
rev: v1.38.1
hooks:
- id: typos
args: ["--diff"]

- repo: local
hooks:
28 changes: 22 additions & 6 deletions Makefile
@@ -6,18 +6,20 @@ install:
install-for-tests:
@echo "--- 🚀 Installing project dependencies for test ---"
@echo "This ensures that the project is not installed in editable mode"
pip install ".[image]" --group dev
pip install ".[bm25s,pylate,image,codecarbon,faiss-cpu]" --group dev

lint:
@echo "--- 🧹 Running linters ---"
ruff format . # running ruff formatting
ruff check . --fix --exit-non-zero-on-fix # running ruff linting # --exit-non-zero-on-fix is used for the pre-commit hook to work
typos

lint-check:
@echo "--- 🧹 Check is project is linted ---"
# Required for CI to work, otherwise it will just pass
ruff format . --check # running ruff formatting
ruff check . # running ruff linting
typos --diff

test:
@echo "--- 🧪 Running tests ---"
@@ -33,12 +35,21 @@ pr:
make lint
make test


build-docs:
build-docs: build-docs-overview
@echo "--- 📚 Building documentation ---"
# since we do not have a documentation site, this just build tables for the .md files
python docs/create_tasks_table.py
python docs/create_benchmarks_table.py
python -m mkdocs build


build-docs-overview:
@echo "--- 📚 Building documentation overview ---"
python docs/overview/create_available_tasks.py
python docs/overview/create_available_models.py
python docs/overview/create_available_benchmarks.py


serve-docs:
@echo "--- 📚 Serving documentation ---"
python -m mkdocs serve


model-load-test:
@@ -74,3 +85,8 @@ format-citations:
check: ## Run code quality tools.
@echo "--- 🧹 Running code quality tools ---"
@pre-commit run -a

.PHONY: typecheck
typecheck:
@echo "--- 🔍 Running type checks ---"
mypy mteb
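
Reviewer note: `build-docs` now lists `build-docs-overview` as a prerequisite, so Make runs the overview scripts before the `mkdocs` build. The sketch below models that ordering in Python. The `TARGETS` table and `commands_for` are hypothetical and written for illustration. The table assumes, as the hunk suggests but the stripped diff markers do not confirm, that the new `build-docs` recipe consists only of `python -m mkdocs build`. The `@echo` lines are omitted.

```python
# Hypothetical model of the two Makefile targets touched in this hunk.
TARGETS = {
    "build-docs-overview": {
        "deps": [],
        "commands": [
            "python docs/overview/create_available_tasks.py",
            "python docs/overview/create_available_models.py",
            "python docs/overview/create_available_benchmarks.py",
        ],
    },
    "build-docs": {
        "deps": ["build-docs-overview"],
        "commands": ["python -m mkdocs build"],
    },
}


def commands_for(target, done=None):
    """Return recipe commands in the order Make would run them (prerequisites first)."""
    done = set() if done is None else done
    if target in done:
        return []  # each target is built at most once per invocation
    done.add(target)
    ordered = []
    for dep in TARGETS[target]["deps"]:
        ordered += commands_for(dep, done)
    return ordered + TARGETS[target]["commands"]


if __name__ == "__main__":
    for command in commands_for("build-docs"):
        print(command)
```

This also matches the new `documentation.yml` workflow, where the push path calls `make build-docs-overview` explicitly before `mkdocs gh-deploy`, while the pull-request path relies on `make build-docs` pulling in the prerequisite.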