
[refactor] model loading - no more unnecessary file downloads#2345

Merged
tomaarsen merged 9 commits into huggingface:master from tomaarsen:feat/efficient_loading
Dec 12, 2023

Conversation

@tomaarsen
Member

Hello!

Pull Request overview

  • Refactor the model loading;
    • No longer download the full model repository.
    • Update cache format to git style via hf_hub_download.
    • No longer use deprecated cached_download.
    • Soft deprecation of use_auth_token in favor of token as required by recent transformers/huggingface_hub versions.
  • Add test to ensure that correct/appropriate files are downloaded.

Details

In short, model downloading has moved from greedy full repository downloading to lazy per-module downloading, where no files are downloaded for Transformers modules.

Original model loading steps

  1. Greedily load the full model repository to the cache folder.
  2. Check if modules.json exists.
  3. If so, load all modules individually using the local files downloaded in the last step.
  4. If not, load Transformer using the local files downloaded in the last step + Pooling.
  5. Done

New model loading steps

  1. Check if modules.json exists locally or on the Hub.
  2. If so,
    a. Download the ST configuration files ('config_sentence_transformers.json', 'README.md', 'modules.json') if they're remote.
    b. For each module that is not a Transformer module, download (if necessary) the directory with that module's configuration/weights. For Transformer modules, download nothing and let transformers load the model from the model_name_or_path.
  3. If not, load Transformer using the model_name_or_path + Pooling.
  4. Done

With this changed setup, we defer downloading any transformers data to transformers itself. In a test model that I uploaded with both pytorch_model.bin and model.safetensors, only the safetensors file is loaded. This is verified in the attached test case.
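The lazy resolution in steps 1–2a above can be sketched roughly like this. This is a simplified illustration, not the actual sentence-transformers code: the helper name `load_modules_config` is hypothetical, and only `hf_hub_download` / `EntryNotFoundError` come from huggingface_hub.

```python
import json
import os

def load_modules_config(model_name_or_path, token=None):
    """Return the parsed modules.json, or None if the repo has none.

    Sketch of the lazy loading flow: a local directory is checked first,
    and only if the path is a Hub repo id is a single-file download made
    into the git-style cache (instead of snapshotting the whole repo).
    """
    if os.path.isdir(model_name_or_path):
        # Local path: no downloads at all.
        config_path = os.path.join(model_name_or_path, "modules.json")
        if not os.path.exists(config_path):
            return None
    else:
        # Hub repo: fetch just this one file via hf_hub_download.
        from huggingface_hub import hf_hub_download
        from huggingface_hub.utils import EntryNotFoundError

        try:
            config_path = hf_hub_download(
                model_name_or_path, "modules.json", token=token
            )
        except EntryNotFoundError:
            return None
    with open(config_path) as f:
        return json.load(f)
```

If this returns None, the caller falls through to step 3 (Transformer + Pooling); otherwise each listed module is resolved individually.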

Additional changes

As required by huggingface_hub, we now use token instead of use_auth_token. If use_auth_token is still provided, then token = use_auth_token is set and a warning is emitted, i.e. a soft deprecation.
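A minimal sketch of such a soft-deprecation shim (the helper name `resolve_token` is hypothetical; the actual sentence-transformers code handles this inside its loaders):

```python
import warnings

def resolve_token(token=None, use_auth_token=None):
    """Map the deprecated use_auth_token argument onto token, with a warning."""
    if use_auth_token is not None:
        warnings.warn(
            "The `use_auth_token` argument is deprecated and will be removed; "
            "please use `token` instead.",
            FutureWarning,
        )
        # Only fall back to the deprecated value if token was not given.
        if token is None:
            token = use_auth_token
    return token
```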

  • Tom Aarsen

@tomaarsen tomaarsen changed the title Refactor model loading - no more unnecessary file downloads [refactor] model loading - no more unnecessary file downloads Nov 7, 2023

@bwanglzu bwanglzu left a comment


Left some very minor comments. Do you think it makes sense, at some point, to refactor the tests to pytest? I personally find it much more effective than unittest.

@tomaarsen
Member Author

I also prefer pytest. I would indeed like to fully refactor the tests and heavily improve them. The current coverage is quite low for my tastes! Thanks for the review by the way!

  • Tom Aarsen

@Sirri69

Sirri69 commented Dec 11, 2023

Somebody for the love of god, please merge this and update pypi

@Sirri69

Sirri69 commented Dec 12, 2023

THANK YOU

@tomaarsen
Member Author

@Sirri69 I'm on it 😉 Give it a few days.

I made updates to improve support when internet is unavailable. Now we can run the following script under various settings:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode("This is a test sentence", normalize_embeddings=True)
print(embeddings.shape)

These are now the outputs under the various settings:

With a cache, with or without internet:

(384,)

Without a cache but with internet, the individual files are downloaded first:

modules.json: 100%|█████████████████████████████████| 349/349 [00:00<?, ?B/s]
config_sentence_transformers.json: 100%|████████████| 116/116 [00:00<?, ?B/s]
README.md: 100%|████████████████████████████████| 10.6k/10.6k [00:00<?, ?B/s]
sentence_bert_config.json: 100%|██████████████████| 53.0/53.0 [00:00<?, ?B/s]
config.json: 100%|██████████████████████████████████| 612/612 [00:00<?, ?B/s]
pytorch_model.bin: 100%|████████████████| 90.9M/90.9M [00:06<00:00, 14.9MB/s]
tokenizer_config.json: 100%|████████████████████████| 350/350 [00:00<?, ?B/s]
vocab.txt: 100%|██████████████████████████| 232k/232k [00:00<00:00, 1.36MB/s]
tokenizer.json: 100%|█████████████████████| 466k/466k [00:00<00:00, 4.97MB/s]
special_tokens_map.json: 100%|██████████████| 112/112 [00:00<00:00, 90.1kB/s]
1_Pooling/config.json: 100%|████████████████████████| 190/190 [00:00<?, ?B/s]
(384,)

Without a cache and without internet:

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

This is exactly what I would hope to get.
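To reproduce the no-internet rows without actually disconnecting, huggingface_hub honors the HF_HUB_OFFLINE environment variable, which forces cache-only resolution. This is a usage sketch, not part of the PR; with a warm cache it should print the same shape as the offline-with-cache case above:

```shell
# Force offline mode: huggingface_hub resolves files from the local
# cache only and never contacts the Hub.
HF_HUB_OFFLINE=1 python -c "
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print(model.encode('This is a test sentence').shape)
"
```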

cc: @nreimers as we discussed this.

  • Tom Aarsen

@peiyangL

@tomaarsen

Hi, I appreciate this update to support model loading without an internet connection.

However, I find that loading the model is very slow without an internet connection. My testing code is as follows:

import time
start = time.time()
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu')
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)

The output is as follows:

# without internet
<All keys matched successfully>
(1, 768)
time: 376.90756702423096

# with internet
<All keys matched successfully>
(1, 768)
time: 15.75501823425293

Additionally, I found that adding the local_files_only=True parameter speeds up model loading without an internet connection, but it is still quite slow.

import time
start = time.time()
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu', local_files_only=True)
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)

# output:
# <All keys matched successfully>
# (1, 768)
# time: 145.69492316246033

