HuggingFaceEmbeddings takes the default model name and reinitializes self.client even while passing in client parameter. #27505

Closed
5 tasks done
PraNavKumAr01 opened this issue Oct 21, 2024 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@PraNavKumAr01

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_huggingface import HuggingFaceEmbeddings
from sentence_transformers import SentenceTransformer

# Pre-load a quantized ONNX model on CPU.
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="onnx",
    model_kwargs={
        "file_name": "model_qint8_avx512_vnni.onnx",
        "provider": "CPUExecutionProvider",
    },
)

# Pass the pre-loaded model in through the client parameter.
embeddings = HuggingFaceEmbeddings(client=model)

Error Message and Stack Trace (if applicable)

No response

Description

The HuggingFaceEmbeddings class has a client parameter, which lets you pass in a pre-loaded model. But right now, even when client is passed, the code falls back to DEFAULT_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2" and starts downloading and loading that model.
I think the reason is this:

self.client = sentence_transformers.SentenceTransformer(
    self.model_name, cache_folder=self.cache_folder, **self.model_kwargs
)

self.client is initialized without checking whether a client has been passed in, so a new client that loads the default model is created regardless.

This can be fixed by adding a check:

# Only build a client if one has not been passed in.
if not self.client:
    self.client = sentence_transformers.SentenceTransformer(
        self.model_name, cache_folder=self.cache_folder, **self.model_kwargs
    )

I'd be happy to open a PR to fix this small issue. Let me know how to proceed.
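
For illustration, here is a minimal, self-contained sketch of the guard described above. FakeModel and EmbeddingsSketch are hypothetical stand-ins, not the real sentence_transformers or langchain_huggingface classes, and nothing is downloaded. The sketch tests `is None` rather than truthiness, which is a choice made here and not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

DEFAULT_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"


class FakeModel:
    """Hypothetical stand-in for sentence_transformers.SentenceTransformer."""

    load_count = 0  # counts constructions, so each simulated "load" is visible

    def __init__(self, name: str, **kwargs: Any) -> None:
        FakeModel.load_count += 1
        self.name = name


@dataclass
class EmbeddingsSketch:
    """Hypothetical wrapper that only mirrors the shape of the snippet above."""

    model_name: str = DEFAULT_MODEL_NAME
    model_kwargs: dict = field(default_factory=dict)
    client: Optional[FakeModel] = None

    def __post_init__(self) -> None:
        # The guard: build a client only when the caller did not supply one.
        if self.client is None:
            self.client = FakeModel(self.model_name, **self.model_kwargs)


preloaded = FakeModel("all-MiniLM-L6-v2")
wrapped = EmbeddingsSketch(client=preloaded)  # keeps the supplied client
fallback = EmbeddingsSketch()                 # builds the default client
```

With the guard in place, `wrapped.client` is the same object as `preloaded`, and the default model is constructed only for `fallback`.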

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 23.5.0: Wed May 1 20:19:05 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T8112
Python Version: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]

Package Information

langchain_core: 0.3.12
langchain: 0.2.0
langchain_community: 0.0.38
langsmith: 0.1.136
langchain_groq: 0.1.4
langchain_huggingface: 0.1.0
langchain_text_splitters: 0.2.0
langgraph: 0.0.50

Optional packages not installed

langserve

Other Dependencies

aiohttp: 3.9.3
aiosqlite: Installed. No version info available.
aleph-alpha-client: Installed. No version info available.
anthropic: Installed. No version info available.
arxiv: Installed. No version info available.
assemblyai: Installed. No version info available.
async-timeout: 4.0.3
atlassian-python-api: Installed. No version info available.
azure-ai-documentintelligence: Installed. No version info available.
azure-ai-formrecognizer: Installed. No version info available.
azure-ai-textanalytics: Installed. No version info available.
azure-cognitiveservices-speech: Installed. No version info available.
azure-core: Installed. No version info available.
azure-cosmos: Installed. No version info available.
azure-identity: Installed. No version info available.
azure-search-documents: Installed. No version info available.
beautifulsoup4: 4.12.3
bibtexparser: Installed. No version info available.
cassio: Installed. No version info available.
chardet: Installed. No version info available.
clarifai: Installed. No version info available.
cloudpickle: 2.2.1
cohere: Installed. No version info available.
couchbase: Installed. No version info available.
dashvector: Installed. No version info available.
databricks-vectorsearch: Installed. No version info available.
dataclasses-json: 0.6.6
datasets: Installed. No version info available.
dgml-utils: Installed. No version info available.
docarray[hnswlib]: Installed. No version info available.
elasticsearch: Installed. No version info available.
esprima: Installed. No version info available.
faiss-cpu: Installed. No version info available.
feedparser: Installed. No version info available.
fireworks-ai: Installed. No version info available.
friendli-client: Installed. No version info available.
geopandas: Installed. No version info available.
gitpython: 3.1.32
google-cloud-documentai: Installed. No version info available.
gql: Installed. No version info available.
gradientai: Installed. No version info available.
groq: 0.5.0
hdbcli: Installed. No version info available.
hologres-vector: Installed. No version info available.
html2text: Installed. No version info available.
httpx: 0.27.0
httpx-sse: Installed. No version info available.
huggingface-hub: 0.26.0
huggingface_hub: 0.26.0
javelin-sdk: Installed. No version info available.
jinja2: 3.1.2
jq: Installed. No version info available.
jsonpatch: 1.33
jsonschema: 4.17.3
langchain-openai: Installed. No version info available.
lxml: 5.2.2
manifest-ml: Installed. No version info available.
markdownify: Installed. No version info available.
motor: Installed. No version info available.
msal: Installed. No version info available.
mwparserfromhell: Installed. No version info available.
mwxml: Installed. No version info available.
newspaper3k: Installed. No version info available.
nlpcloud: Installed. No version info available.
numexpr: 2.8.4
numpy: 1.25.2
nvidia-riva-client: Installed. No version info available.
oci: Installed. No version info available.
openai: Installed. No version info available.
openapi-pydantic: Installed. No version info available.
openlm: Installed. No version info available.
oracle-ads: Installed. No version info available.
oracledb: Installed. No version info available.
orjson: 3.10.3
packaging: 23.2
pandas: 1.5.3
pdfminer-six: Installed. No version info available.
pgvector: Installed. No version info available.
praw: Installed. No version info available.
premai: Installed. No version info available.
psychicapi: Installed. No version info available.
py-trello: Installed. No version info available.
pydantic: 2.8.2
pyjwt: Installed. No version info available.
pymupdf: Installed. No version info available.
pypdf: 3.17.4
pypdfium2: Installed. No version info available.
pyspark: Installed. No version info available.
PyYAML: 6.0.1
qdrant-client: Installed. No version info available.
rank-bm25: Installed. No version info available.
rapidfuzz: Installed. No version info available.
rapidocr-onnxruntime: Installed. No version info available.
rdflib: Installed. No version info available.
requests: 2.31.0
requests-toolbelt: 1.0.0
rspace_client: Installed. No version info available.
scikit-learn: 1.3.0
sentence-transformers: 3.2.0
SQLAlchemy: 2.0.30
sqlite-vss: Installed. No version info available.
streamlit: Installed. No version info available.
sympy: 1.12
telethon: Installed. No version info available.
tenacity: 8.3.0
tidb-vector: Installed. No version info available.
tiktoken: Installed. No version info available.
timescale-vector: Installed. No version info available.
tokenizers: 0.20.1
torch: 2.0.1
tqdm: 4.65.0
transformers: 4.45.2
tree-sitter: Installed. No version info available.
tree-sitter-languages: Installed. No version info available.
typer: 0.12.5
typing-extensions: None
upstash-redis: Installed. No version info available.
uuid6: 2024.1.12
vdms: Installed. No version info available.
xata: Installed. No version info available.
xmltodict: 0.13.0

dosubot bot added the 🤖:bug label on Oct 21, 2024
PraNavKumAr01 changed the title from "HuggingFaceEmbeddings take the default model name even while passing in client parameter." to "HuggingFaceEmbeddings takes the default model name and reinitializes self.client even while passing in client parameter." on Oct 21, 2024
@vbarda
Contributor

vbarda commented Oct 21, 2024

@PraNavKumAr01 I don't think client is meant to be passed -- the field is marked as private. Are you able to pass all of the relevant information for your model via model_kwargs in HuggingFaceEmbeddings?
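
A rough sketch of what the model_kwargs route might look like. It assumes HuggingFaceEmbeddings forwards model_kwargs unchanged to SentenceTransformer(...), as in the snippet quoted in the issue. Whether backend and the nested model_kwargs are accepted this way depends on the installed versions and has not been verified here. ONNX_MODEL_KWARGS and build_embeddings are hypothetical names used only for illustration.

```python
from typing import Any

# Keyword arguments intended for SentenceTransformer(...), copied from the
# issue's example. Assumption: HuggingFaceEmbeddings passes them through as-is.
ONNX_MODEL_KWARGS: dict[str, Any] = {
    "backend": "onnx",
    "model_kwargs": {
        "file_name": "model_qint8_avx512_vnni.onnx",
        "provider": "CPUExecutionProvider",
    },
}


def build_embeddings(model_name: str = "all-MiniLM-L6-v2"):
    """Hypothetical helper: configure the wrapper without passing `client`."""
    # Imported lazily so this module loads even if the package is missing.
    from langchain_huggingface import HuggingFaceEmbeddings

    return HuggingFaceEmbeddings(
        model_name=model_name,
        model_kwargs=ONNX_MODEL_KWARGS,
    )
```

Calling build_embeddings() would let the wrapper construct the model itself from model_name, rather than receiving a pre-loaded one.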

@vbarda
Contributor

vbarda commented Oct 21, 2024

Removed client here #27495 -- going to close this issue, as users shouldn't be passing client directly. Feel free to open another issue if some functionality is missing!

@vbarda vbarda closed this as completed Oct 21, 2024