
[refactor] model loading - no more unnecessary file downloads#2345

Merged
tomaarsen merged 9 commits into huggingface:master from tomaarsen:feat/efficient_loading
Dec 12, 2023

Conversation

@tomaarsen
Member

Hello!

Pull Request overview

  • Refactor the model loading;
    • No longer download the full model repository.
    • Update cache format to git style via hf_hub_download.
    • No longer use deprecated cached_download.
    • Soft deprecation of use_auth_token in favor of token as required by recent transformers/huggingface_hub versions.
  • Add test to ensure that correct/appropriate files are downloaded.

Details

In short, model downloading has moved from greedy full repository downloading to lazy per-module downloading, where no files are downloaded for Transformers modules.

Original model loading steps

  1. Greedily load the full model repository to the cache folder.
  2. Check if modules.json exists.
  3. If so, load all modules individually using the local files downloaded in the last step.
  4. If not, load Transformer using the local files downloaded in the last step + Pooling.
  5. Done

New model loading steps

  1. Check if modules.json exists locally or on the Hub.
  2. If so,
    a. Download the ST configuration files ('config_sentence_transformers.json', 'README.md', 'modules.json') if they're remote.
    b. For each module that is not a Transformer module, download (if necessary) the directory with that module's configuration/weights. For Transformer modules, download nothing and let transformers load the model from the model_name_or_path.
  3. If not, load Transformer using the model_name_or_path + Pooling.
  4. Done

With this changed setup, we defer downloading any transformers data to transformers itself. In a test model that I uploaded with both pytorch_model.bin and model.safetensors, only the safetensors file is loaded. This is verified in the attached test case.
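The lazy resolution in steps 1–2a above can be sketched roughly like this. This is a simplified illustration, not the actual sentence-transformers code: the helper name `load_modules_config` is hypothetical, and only `hf_hub_download` / `EntryNotFoundError` come from huggingface_hub.

```python
import json
import os

def load_modules_config(model_name_or_path, token=None):
    """Return the parsed modules.json, or None if the repo has none.

    Sketch of the lazy loading flow: a local directory is checked first,
    and only if the path is a Hub repo id is a single-file download made
    into the git-style cache (instead of snapshotting the whole repo).
    """
    if os.path.isdir(model_name_or_path):
        # Local path: no downloads at all.
        config_path = os.path.join(model_name_or_path, "modules.json")
        if not os.path.exists(config_path):
            return None
    else:
        # Hub repo: fetch just this one file via hf_hub_download.
        from huggingface_hub import hf_hub_download
        from huggingface_hub.utils import EntryNotFoundError

        try:
            config_path = hf_hub_download(
                model_name_or_path, "modules.json", token=token
            )
        except EntryNotFoundError:
            return None
    with open(config_path) as f:
        return json.load(f)
```

If this returns None, the caller falls through to step 3 (Transformer + Pooling); otherwise each listed module is resolved individually.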

Additional changes

As required by huggingface_hub, we now use token instead of use_auth_token. If use_auth_token is still provided, then token = use_auth_token is set and a warning is emitted, i.e. a soft deprecation.
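A minimal sketch of such a soft-deprecation shim (the helper name `resolve_token` is hypothetical; the actual sentence-transformers code handles this inside its loaders):

```python
import warnings

def resolve_token(token=None, use_auth_token=None):
    """Map the deprecated use_auth_token argument onto token, with a warning."""
    if use_auth_token is not None:
        warnings.warn(
            "The `use_auth_token` argument is deprecated and will be removed; "
            "please use `token` instead.",
            FutureWarning,
        )
        # Only fall back to the deprecated value if token was not given.
        if token is None:
            token = use_auth_token
    return token
```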

  • Tom Aarsen

@tomaarsen tomaarsen changed the title Refactor model loading - no more unnecessary file downloads [refactor] model loading - no more unnecessary file downloads Nov 7, 2023

@bwanglzu bwanglzu left a comment


Left some very minor comments. Do you think it makes sense, at some point, to refactor the tests to pytest? I personally find it much more effective than unittest.

@tomaarsen
Member Author

I also prefer pytest. I would indeed like to fully refactor the tests and heavily improve them. The current coverage is quite low for my tastes! Thanks for the review by the way!

  • Tom Aarsen

@Sirri69

Sirri69 commented Dec 11, 2023

Somebody for the love of god, please merge this and update pypi

@Sirri69

Sirri69 commented Dec 12, 2023

THANK YOU

@tomaarsen
Member Author

@Sirri69 I'm on it 😉 Give it a few days.

I made updates to improve support when internet is unavailable. Now we can run the following script under various settings:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode("This is a test sentence", normalize_embeddings=True)
print(embeddings.shape)

These are now the outputs under the various settings:

With a cache, with or without internet:

(384,)

Without a cache but with internet, the individual files are downloaded first:

modules.json: 100%|█████████████████████████████████| 349/349 [00:00<?, ?B/s]
config_sentence_transformers.json: 100%|████████████| 116/116 [00:00<?, ?B/s]
README.md: 100%|████████████████████████████████| 10.6k/10.6k [00:00<?, ?B/s]
sentence_bert_config.json: 100%|██████████████████| 53.0/53.0 [00:00<?, ?B/s]
config.json: 100%|██████████████████████████████████| 612/612 [00:00<?, ?B/s]
pytorch_model.bin: 100%|████████████████| 90.9M/90.9M [00:06<00:00, 14.9MB/s]
tokenizer_config.json: 100%|████████████████████████| 350/350 [00:00<?, ?B/s]
vocab.txt: 100%|██████████████████████████| 232k/232k [00:00<00:00, 1.36MB/s]
tokenizer.json: 100%|█████████████████████| 466k/466k [00:00<00:00, 4.97MB/s]
special_tokens_map.json: 100%|██████████████| 112/112 [00:00<00:00, 90.1kB/s]
1_Pooling/config.json: 100%|████████████████████████| 190/190 [00:00<?, ?B/s]
(384,)

Without a cache and without internet:

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

This is exactly what I would hope to get.
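To reproduce the no-internet rows without actually disconnecting, huggingface_hub honors the HF_HUB_OFFLINE environment variable, which forces cache-only resolution. This is a usage sketch, not part of the PR; with a warm cache it should print the same shape as the offline-with-cache case above:

```shell
# Force offline mode: huggingface_hub resolves files from the local
# cache only and never contacts the Hub.
HF_HUB_OFFLINE=1 python -c "
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print(model.encode('This is a test sentence').shape)
"
```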

cc: @nreimers as we discussed this.

  • Tom Aarsen

@peiyangL

@tomaarsen

Hi, I appreciate this update to support model loading without an internet connection.

However, I find that loading the model is very slow without an internet connection. My testing code is as follows:

import time
start = time.time()
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu')
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)

The output is as follows:

# without internet
<All keys matched successfully>
(1, 768)
time: 376.90756702423096

# with internet
<All keys matched successfully>
(1, 768)
time: 15.75501823425293

Additionally, I found that adding the local_files_only=True parameter speeds up model loading without an internet connection, but it is still quite slow.

import time
start = time.time()
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu', local_files_only=True)
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)

# output:
# <All keys matched successfully>
# (1, 768)
# time: 145.69492316246033

