[refactor] model loading - no more unnecessary file downloads #2345

tomaarsen merged 9 commits into huggingface:master

Conversation
Deprecated arguments are not listed in docstrings
bwanglzu
left a comment
Left some very minor comments. Do you think it makes sense, at some point, to refactor the tests into pytest? I personally find it much more effective than unittest.
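To illustrate the suggestion above (a hypothetical example, not taken from this repository), here is the same check written in both styles:

```python
# unittest style: tests are methods on a TestCase subclass
import unittest

def normalize(text):
    # stand-in for the code under test
    return text.strip().lower()

class TestNormalize(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize("  Hello "), "hello")

# pytest style: a plain function and a bare assert, no class needed
def test_strips_and_lowercases_pytest():
    assert normalize("  Hello ") == "hello"
```

pytest discovers the plain function automatically and also runs existing `TestCase` classes, which makes an incremental migration possible.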
I also prefer pytest.

Somebody for the love of god, please merge this and update pypi

THANK YOU
@Sirri69 I'm on it 😉 Give it a few days. I made updates to introduce better support when Internet is unavailable. Now, we can run the following script under various settings:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode("This is a test sentence", normalize_embeddings=True)
print(embeddings.shape)
```

These are now the outputs under the various settings:
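One way to exercise the no-internet setting without actually disconnecting, assuming the model is already in the local cache from an earlier online run, is `huggingface_hub`'s offline switch (the script name here is hypothetical):

```shell
# First run the script once online to populate the cache, then force
# offline mode for a second run; no HTTP requests should be made.
HF_HUB_OFFLINE=1 python encode_test.py
```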
This is exactly what I would hope to get. cc: @nreimers as we discussed this.
Hi, I appreciate this update to support model loading without an internet connection. However, I find that loading the model is very slow without an internet connection. My testing code is as follows:

```python
import time
start = time.time()

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu')
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)
```

The output is as follows: Additionally, I found that adding the `local_files_only=True` argument makes no difference:

```python
import time
start = time.time()

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu', local_files_only=True)
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)
# output:
# <All keys matched successfully>
# (1, 768)
# time: 145.69492316246033
```
Hello!
Pull Request overview
- Moved to `hf_hub_download` instead of `cached_download`.
- Deprecated `use_auth_token` in favor of `token`, as required by recent `transformers`/`huggingface_hub` versions.

Details
In short, model downloading has moved from greedy full-repository downloading to lazy per-module downloading, where no files are downloaded for `Transformers` modules.

Original model loading steps

1. Download the full model repository.
2. Check whether `modules.json` exists.
3. If it does, load the modules it lists; otherwise, load `Transformer` using the local files downloaded in the last step + `Pooling`.

New model loading steps
1. Check whether `modules.json` exists locally or on the Hub.
2. If it does:
   a. Download the ST configuration files (`config_sentence_transformers.json`, `README.md`, `modules.json`) if they're remote.
   b. For each module, if it is not `transformers`, then download (if necessary) the directory with configuration/weights for that module. If it is `transformers`, then do not download, and load the model using the `model_name_or_path`.
3. If it does not, load `Transformer` using the `model_name_or_path` + `Pooling`.

With this changed setup, we defer downloading any `transformers` data to `transformers` itself. In a test model that I uploaded with both `pytorch_model.bin` and `model.safetensors`, only the safetensors file is loaded. This is verified in the attached test case.

Additional changes
As required by `huggingface_hub`, we now use `token` instead of `use_auth_token`. If `use_auth_token` is still provided, then `token = use_auth_token` is set and a warning is given, i.e. a soft deprecation.
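The soft-deprecation pattern described above can be sketched as follows (a simplified stand-in, not the actual sentence-transformers signature):

```python
import warnings

def load_model(name, token=None, use_auth_token=None):
    """Illustrative loader: the old argument still works, but warns."""
    if use_auth_token is not None:
        warnings.warn(
            "The `use_auth_token` argument is deprecated; use `token` instead.",
            FutureWarning,
        )
        if token is None:
            token = use_auth_token
    # ... a real loader would pass `token` on to huggingface_hub here ...
    return {"name": name, "token": token}

# Old call sites keep working, but emit a FutureWarning:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_model("my-model", use_auth_token="hf_abc")
assert result["token"] == "hf_abc"
assert any(issubclass(w.category, FutureWarning) for w in caught)
```

Callers who pass the new `token` argument are unaffected; only the legacy keyword triggers the warning, which gives downstream code a release cycle to migrate.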