This repository contains scripts & resources for the MTEB paper. Some scripts rely on a results folder, which can be obtained via `git clone https://huggingface.co/datasets/mteb/results`. These scripts are unlikely to work with the latest version of MTEB; they target the 1.0.0 release that was current when the paper was released and exist solely to ease reproduction of the original paper. Please refer to the MTEB repository for scripts and resources that work with the latest version, and open issues with MTEB there; issues with the original MTEB paper can be opened here.
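If you prefer not to use git, the same results folder can also be fetched programmatically via huggingface_hub. This is only a sketch, not something the paper scripts prescribe; the `local_dir` target folder name is an assumption:

```python
# Sketch: fetch the precomputed results without git, via huggingface_hub
# (requires a reasonably recent huggingface_hub; the target folder is an assumption).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mteb/results",
    repo_type="dataset",   # the results are stored in a dataset repo
    local_dir="results",   # the analysis scripts expect a local "results" folder
)
```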
- Link to 12min presentation on MTEB by Niklas Muennighoff
- Link to 5min presentation on MTEB by Nils Reimers
### Basic with Internet
```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Local model folders are named "{organization}_{model_name}", so take the part after the "_"
model_path = "/gpfswork/rech/six/commun/models/Muennighoff_SGPT-125M-weightedmean-nli-bitfit"
model_name = model_path.split("/")[-1].split("_")[-1]
model = SentenceTransformer(model_path)

evaluation = MTEB(tasks=["Banking77Classification"])
evaluation.run(model, output_folder=f"results/{model_name}")
```
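Tasks can also be selected by type or language rather than by name. The sketch below assumes the MTEB 1.0.x constructor arguments `task_types` and `task_langs`, the `eval_splits` argument of `run`, and the `average_word_embeddings_komninos` model used in the paper; check your installed version if the arguments differ:

```python
# Sketch: select tasks by type/language instead of listing them by name (MTEB 1.0.x-style arguments).
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("average_word_embeddings_komninos")
evaluation = MTEB(task_types=["Clustering"], task_langs=["en"])
evaluation.run(model, output_folder="results/komninos", eval_splits=["test"])
```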
### No Internet Access (Download data first)
```python
import os

# Force offline mode and point the Hugging Face caches to pre-populated directories
os.environ["HF_DATASETS_OFFLINE"] = "1"  # 1 for offline
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # 1 for offline
os.environ["TRANSFORMERS_CACHE"] = "/gpfswork/rech/six/commun/models"
os.environ["HF_DATASETS_CACHE"] = "/gpfswork/rech/six/commun/datasets"
os.environ["HF_MODULES_CACHE"] = "/gpfswork/rech/six/commun/modules"
os.environ["HF_METRICS_CACHE"] = "/gpfswork/rech/six/commun/metrics"

from mteb import MTEB
from sentence_transformers import SentenceTransformer

model_path = "/gpfswork/rech/six/commun/models/Muennighoff_SGPT-125M-weightedmean-nli-bitfit"
model_name = model_path.split("/")[-1].split("_")[-1]
model = SentenceTransformer(model_path)

evaluation = MTEB(tasks=["Banking77Classification"])
evaluation.run(model, output_folder=f"results/{model_name}")
```
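The offline run only works if the datasets are already present in `HF_DATASETS_CACHE`, so the cache must be warmed from a machine with internet access first. A minimal sketch; the `mteb/banking77` dataset repo behind `Banking77Classification` is an assumption here:

```python
# Sketch: pre-download a task's dataset into the shared cache while still online,
# so the offline evaluation above finds it. Run this on a node with internet access.
import os
os.environ["HF_DATASETS_CACHE"] = "/gpfswork/rech/six/commun/datasets"

import datasets
# Assumed dataset repo behind the Banking77Classification task
datasets.load_dataset("mteb/banking77")
```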
To set up the environment used for the paper:

```bash
export CONDA_ENVS_PATH=$six_ALL_CCFRWORK/conda
conda create -y -n hf-prod python=3.8
conda activate hf-prod
# pt-1.10.1 / cuda 11.3
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
# Custom fork that uses offline datasets
pip install --upgrade git+https://github.com/Muennighoff/mteb.git@offlineaccess
# Custom fork with the SGPT pooling methods
pip install --upgrade git+https://github.com/Muennighoff/sentence-transformers.git@sgpt_poolings
# If you want to run BEIR tasks
pip install --upgrade git+https://github.com/beir-cellar/beir.git
```
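A quick, optional sanity check that the forks were picked up is to print the installed versions; a trivial sketch:

```python
# Sketch: sanity-check the installed packages
import mteb
import sentence_transformers

print("mteb:", mteb.__version__)
print("sentence-transformers:", sentence_transformers.__version__)
```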
To download a model for offline usage:

```python
import os
import sentence_transformers
from sentence_transformers import SentenceTransformer

# Cache directory the models are stored in for offline usage
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/gpfswork/rech/six/commun/models"
sentence_transformers_cache_dir = os.getenv("SENTENCE_TRANSFORMERS_HOME")

model_repo = "sentence-transformers/allenai-specter"
revision = "29f9f45ff2a85fe9dfe8ce2cef3d8ec4e65c5f37"
# SentenceTransformer expects local folders to be named "{organization}_{model_name}"
model_path = os.path.join(sentence_transformers_cache_dir, model_repo.replace("/", "_"))

# Download the snapshot at the pinned revision, skipping weight formats we do not need
model_path_tmp = sentence_transformers.util.snapshot_download(
    repo_id=model_repo,
    revision=revision,
    cache_dir=sentence_transformers_cache_dir,
    library_name="sentence-transformers",
    library_version=sentence_transformers.__version__,
    ignore_files=["flax_model.msgpack", "rust_model.ot", "tf_model.h5"],
)
os.rename(model_path_tmp, model_path)

# The downloaded model can now be loaded from its local path without internet access
model = SentenceTransformer(model_path)
```
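With the snapshot in place, the evaluation from the examples above can be run against the local copy without internet access. A minimal sketch reusing `Banking77Classification` (any task works, provided its data is already cached):

```python
# Sketch: evaluate the locally downloaded model offline
# (assumes the offline environment variables and caches from the section above).
from mteb import MTEB

model_name = model_repo.split("/")[-1]  # "allenai-specter"
evaluation = MTEB(tasks=["Banking77Classification"])
evaluation.run(model, output_folder=f"results/{model_name}")
```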