Offline Evaluations on MTEB #11

Open
SouLeo opened this issue May 9, 2024 · 1 comment

Comments

SouLeo commented May 9, 2024

Hello, I'm trying to run MTEB on a cluster without internet access, but I am struggling. Here are the steps I've followed:

  1. $ pip install mteb
  2. $ !pip install --upgrade git+https://github.com/Muennighoff/mteb.git@offlineaccess
  3. Clone the banking77 dataset from Hugging Face: $ git clone [email protected]:datasets/mteb/banking77

I then run:

import torch
from llm2vec import LLM2Vec
from mteb import MTEB


l2v = LLM2Vec.from_pretrained(
    "<path to model>/llama3-llm2vec",
    peft_model_name_or_path="<path to model>/llama3-llm2vec-unsup",
    attn_implementation="flash_attention_2",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)


MODEL_NAME = "llama3-llm2vec"
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(l2v, output_folder=f"results/{MODEL_NAME}")

with the following environment variables set in my .bashrc:

export HF_HOME="<path to model>/cache/"
export HF_DATASETS_CACHE="<path to data>/big_data/hf/"
export TRANSFORMERS_CACHE="<path to model>/cache/"
export HF_HUB_OFFLINE=1 # 1 means offline.
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
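
Since these variables live in .bashrc, it may be worth confirming that they actually reach the Python process that runs the evaluation (a batch job on a cluster does not necessarily source .bashrc; that is an assumption about the setup, not something stated here). A minimal sketch using only the standard library follows. The variable names are the ones listed above; the helper names `report_hf_env` and `missing_offline_flags` are hypothetical. The check is for the literal value 1 used above, although the libraries may accept other truthy values.

```python
import os

# Names copied from the .bashrc excerpt above.
OFFLINE_FLAGS = ("HF_HUB_OFFLINE", "HF_DATASETS_OFFLINE", "TRANSFORMERS_OFFLINE")
CACHE_VARS = ("HF_HOME", "HF_DATASETS_CACHE", "TRANSFORMERS_CACHE")


def report_hf_env(env=None):
    """Return {name: value or None} for the variables listed above."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in OFFLINE_FLAGS + CACHE_VARS}


def missing_offline_flags(env=None):
    """Return the offline flags that are not set to "1" in this environment."""
    report = report_hf_env(env)
    return [name for name in OFFLINE_FLAGS if report[name] != "1"]


if __name__ == "__main__":
    # Run this with the same launcher (srun, sbatch, ...) as the evaluation.
    for name, value in report_hf_env().items():
        print(f"{name}={value!r}")
    print("offline flags not set to 1:", missing_offline_flags())
```

Running this from the same job script as the evaluation shows what the process really sees, which can differ from an interactive shell.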

But I get this error:

Error while evaluating Banking77Classification: Couldn't find a dataset script at <path to data>/mteb/banking77/banking77.py or any data file in the same directory.

I'm not sure where to obtain the script banking77.py, or whether I am following the offline evaluation instructions correctly. Any help would be immensely appreciated.
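
The error message says the loader found neither a script named after the directory nor any data file next to it, so a first step could be to inspect what the cloned directory really contains. The sketch below is a hypothetical diagnostic, not part of MTEB or datasets: `inspect_dataset_dir` and the extension list are assumptions chosen for illustration. It also flags Git LFS pointer files, since a clone made without git-lfs leaves small text stubs in place of the real data; that is one possible cause, not a confirmed one.

```python
import sys
from pathlib import Path

# Assumption: a non-exhaustive set of data-file extensions, for illustration only.
DATA_SUFFIXES = {".json", ".jsonl", ".csv", ".parquet", ".arrow"}
# Git LFS pointer files begin with this line.
LFS_MARKER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_MARKER)) == LFS_MARKER


def inspect_dataset_dir(dataset_dir):
    """Summarise a local dataset directory: script, data files, LFS stubs."""
    root = Path(dataset_dir)
    data_files, lfs_pointers = [], []
    if root.is_dir():
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root)
            if ".git" in rel.parts or not path.is_file():
                continue  # skip git internals and directories
            if path.suffix.lower() in DATA_SUFFIXES:
                target = lfs_pointers if is_lfs_pointer(path) else data_files
                target.append(rel.as_posix())
    return {
        "exists": root.is_dir(),
        # The error message refers to a script named after the directory.
        "script": (root / f"{root.name}.py").is_file(),
        "data_files": data_files,
        "lfs_pointers": lfs_pointers,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    print(inspect_dataset_dir(sys.argv[1]))
```

If `exists` is false, the path the loader resolves differs from where the clone lives. If `lfs_pointers` is non-empty, the data files were never downloaded and would need to be fetched on a machine with internet access.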

@Muennighoff
Collaborator

Hey, sorry, https://github.com/Muennighoff/mteb.git@offlineaccess is very outdated at this point. I recommend just using the regular MTEB and making the necessary changes to get it working offline (these would be similar to the changes made in https://github.com/Muennighoff/mteb.git@offlineaccess). I think it should be very straightforward. It would be amazing if you could also open a PR to allow offline evaluation, so we have it in the main mteb. 🙌
