Self-hosted models on remote clusters, including on-demand from AWS, GCP, Azure, Lambda #978

Merged

Conversation

Contributor

@dongreenberg dongreenberg commented Feb 10, 2023

New modules that make it easy to run embedding and LLM models on one's own cloud GPUs. Uses Runhouse to handle the cloud RPC. Supports AWS, GCP, Azure, and Lambda today (with auto-launching), plus BYO hardware via IP and SSH credentials (e.g. for on-prem or other clouds like CoreWeave, Paperspace, etc.).

APIs
The API mirrors `HuggingFaceEmbeddings` and `HuggingFaceInstructEmbeddings`, but accepts an additional `hardware` parameter:

```python
from langchain.embeddings import SelfHostedHuggingFaceEmbeddings, SelfHostedHuggingFaceInstructEmbeddings
import runhouse as rh

gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
hf = SelfHostedHuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2", hardware=gpu)

# Will run on the same GPU
hf_instruct = SelfHostedHuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large", hardware=gpu)
```

The `rh.cluster` above will launch the A100 on GCP, Azure, or Lambda, whichever is enabled and cheapest (thanks to SkyPilot). You can pin it to a specific provider with `provider='gcp'`, and also set `use_spot`, `region`, `image_id`, and `autostop_mins`. For AWS, you'd just switch the instance type to `"A10G:1"`. For a BYO cluster, you can do:

```python
gpu = rh.cluster(ips=['<ip of the cluster>'],
                 ssh_creds={'ssh_user': '...', 'ssh_private_key': '<path_to_key>'},
                 name='rh-a10x')
```
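
Back in the on-demand case, pinning the launch to a specific provider might look like this (a sketch; the region and autostop values are illustrative):

```python
gpu = rh.cluster(name="rh-a10x",
                 instance_type="A100:1",
                 provider="gcp",     # pin to GCP instead of the cheapest enabled provider
                 use_spot=True,      # use spot instances
                 region="us-west1",  # illustrative region
                 autostop_mins=30)   # stop the cluster automatically after 30 minutes
```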

Design
All we're doing here is sending a pre-defined inference function to the cluster through Runhouse, which brings up the cluster if needed, installs the dependencies, and returns a callable that sends requests to run the function over gRPC. The function takes the model_id as an input, but the model is cached so it only needs to be downloaded once. We can improve performance further pretty easily by pinning the model to GPU memory on the cluster. Let me know if that's of interest.
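
As a rough illustration (a minimal sketch, not the PR's exact code; `_MODEL_CACHE` and `_embed_documents` are hypothetical names), the pattern for the cached inference function is:

```python
from typing import List

from sentence_transformers import SentenceTransformer

# Module-level cache: the model is downloaded and loaded once per cluster process,
# then reused by every subsequent gRPC call.
_MODEL_CACHE = {}

def _embed_documents(model_id: str, texts: List[str]) -> List[List[float]]:
    if model_id not in _MODEL_CACHE:
        _MODEL_CACHE[model_id] = SentenceTransformer(model_id)
    return _MODEL_CACHE[model_id].encode(texts).tolist()
```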

Testing
Added new tests: embeddings/test_self_hosted.py (which mirrors test_huggingface.py) and llms/test_self_hosted_llm.py. All tests pass on Lambda Labs (which is surprising, because the first two test_huggingface.py tests supposedly segfault?). We can pin the provider used in the tests to whichever your CI uses, or you can run these only on a schedule to avoid spinning up a GPU (a run can take ~5 minutes including installations).

  • Introduce SelfHostedPipeline and SelfHostedHuggingFaceLLM
  • Introduce SelfHostedEmbeddings, SelfHostedHuggingFaceEmbeddings, and SelfHostedHuggingFaceInstructEmbeddings
  • Add tutorials for Self-hosted LLMs and Embeddings
  • Implement chat-your-data tutorial with Self-hosted models - https://github.com/dongreenberg/chat-your-data

@dongreenberg dongreenberg changed the title Introduce Self Hosted models Self Hosted models Feb 10, 2023
@dongreenberg dongreenberg changed the title Self Hosted models Self-hosted models Feb 10, 2023
@dongreenberg dongreenberg changed the title Self-hosted models Self-hosted models on AWS, GCP, Azure, Lambda, or BYO IP Feb 10, 2023
Contributor

@hwchase17 hwchase17 left a comment

@dongreenberg dongreenberg changed the title Self-hosted models on AWS, GCP, Azure, Lambda, or BYO IP Self-hosted models on remote clusters, including on-demand from AWS, GCP, Azure, Lambda Feb 16, 2023
hwchase17 and others added 21 commits February 17, 2023 07:00
Co-authored-by: John Dagdelen <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Andrew White <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Peng Qu <[email protected]>
…ain-ai#979)

### Summary

Adds an `UnstructuredURLLoader` that supports loading data from a list of URLs.


### Testing

```python
from langchain.document_loaders import UnstructuredURLLoader

urls = [
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023",
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023"
]
loader = UnstructuredURLLoader(urls=urls)
raw_documents = loader.load()
```
Wonder why "with" is spelled "wiht" so many times by humans
The provided example uses the default `max_length` of `20` tokens, which
leads to the example generation getting cut off. 20 tokens is way too
short to show CoT reasoning, so I boosted it to `64`.

Without knowing HF's API well, it can be hard to figure out just where
those `model_kwargs` come from, and `max_length` is a super critical
one.
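
For context, a hedged sketch of passing the larger value through `model_kwargs` (the repo id here is illustrative, and `HUGGINGFACEHUB_API_TOKEN` must be set in the environment):

```python
from langchain.llms import HuggingFaceHub

# Raise max_length from the default 20 so chain-of-thought output isn't truncated.
llm = HuggingFaceHub(
    repo_id="google/flan-t5-xl",
    model_kwargs={"temperature": 0.5, "max_length": 64},
)
```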
Co-authored-by: zanderchase <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
In
[pyproject.toml](https://github.com/hwchase17/langchain/blob/master/pyproject.toml),
the expectation is `SQLAlchemy = "^1"`. But, the way `declarative_base`
is imported in
[cache.py](https://github.com/hwchase17/langchain/blob/master/langchain/cache.py)
will only work with SQLAlchemy >=1.4. This PR makes sure LangChain can be run in environments with SQLAlchemy <1.4.
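
A minimal sketch of a version-tolerant import (assuming the fix takes the usual try/except form; the actual diff may differ):

```python
try:
    # SQLAlchemy >= 1.4 exposes declarative_base here.
    from sqlalchemy.orm import declarative_base
except ImportError:
    # Fall back for SQLAlchemy < 1.4.
    from sqlalchemy.ext.declarative import declarative_base
```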
Co-authored-by: Stefan Keselj <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
simple typo fix: because --> between
…1011)

Updates the Unstructured example notebook with a PDF example. Includes
additional dependencies for PDF processing (and images, etc.).
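
For reference, a PDF load looks roughly like this (a sketch; the file path is a placeholder):

```python
from langchain.document_loaders import UnstructuredPDFLoader

loader = UnstructuredPDFLoader("example_data/layout-parser-paper.pdf")
docs = loader.load()
```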
Chroma is a simple-to-use, open-source, zero-config, zero-setup vectorstore.

Simply `pip install chromadb`, and you're good to go.

Out of the box, Chroma is suitable for most LangChain workloads, but it is highly flexible. I tested up to 1M embeddings on my M1 Mac without issues and with reasonably fast query times.

Look out for future releases as we integrate more Chroma features with
LangChain!
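
A quick sketch of using it from LangChain (the embedding model and texts are illustrative, and an OpenAI API key is assumed):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

texts = ["Chroma is a zero-config vectorstore.", "LangChain chains LLM calls together."]
db = Chroma.from_texts(texts, OpenAIEmbeddings())
results = db.similarity_search("What is Chroma?", k=1)
```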
seanaedmiston and others added 19 commits February 17, 2023 07:01
Alternate implementation to PR langchain-ai#960. Again, only FAISS is implemented. If accepted, I can add this to the other vectorstores or leave them as NotImplemented? Suggestions welcome...
This is a work-in-progress PR to track my progress. (A quick usage sketch follows the checklist below.)

## TODO:

- [x] Get results using the specified searx host
- [x]  Prioritize returning an  `answer`  or results otherwise
    - [ ] expose the field `infobox` when available
    - [ ] expose `score` of result to help agent's decision
- [ ] expose the `suggestions` field to agents so they could try new queries if no results are found with the original query?

- [ ] Dynamic tool description for agents?
- Searx offers many engines and a search syntax that agents can take
advantage of. It would be nice to generate a dynamic Tool description so
that it can be used many times as a tool but for different purposes.

- [x] Limit the number of results
- [ ] Implement paging
- [x] Mirror the usage of the Google Search tool
- [x] Easy selection of search engines
- [x] Documentation
    - [ ] update HowTo guide notebook on Search Tools
- [ ] Handle async 
- [ ]  Tests

###  Add examples / documentation on possible uses with
 - [ ]  getting factual answers with `!wiki` option and `infoboxes`
 - [ ]  getting `suggestions`
 - [ ]  getting `corrections`
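
A quick usage sketch (assuming the `SearxSearchWrapper` utility introduced here and a locally running Searx instance; the host URL is a placeholder):

```python
from langchain.utilities import SearxSearchWrapper

# Point the wrapper at your own Searx instance.
search = SearxSearchWrapper(searx_host="http://localhost:8888")
search.run("What is the capital of France?")
```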

---------

Co-authored-by: blob42 <spike@w530>
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Ivan Vendrov <[email protected]>
Co-authored-by: Sasmitha Manathunga <[email protected]>
This addresses langchain-ai#948.

I set the documentation max width to 2560px, but it can be adjusted; see the screenshot below.

<img width="1741" alt="Screenshot 2023-02-14 at 13 05 57"
src="https://user-images.githubusercontent.com/23406704/218749076-ea51e90a-a220-4558-b4fe-5a95b39ebf15.png">
Changed the number of types of chains to make it consistent with the rest of the docs.
…tedHuggingFaceLLM classes.

- Refactor self-hosted embeddings into separate SelfHostedPipeline and SelfHostedHuggingFaceLLM classes.
- Add self_hosted_examples.ipynb
- Add embeddings examples to embeddings.ipynb
- Add overview to runhouse.md
Tests across all of the above pass.
# Conflicts:
#	docs/modules/llms/integrations.rst
#	docs/modules/utils/combine_docs_examples/embeddings.ipynb
#	langchain/document_loaders/telegram.py
#	langchain/embeddings/openai.py
#	langchain/llms/__init__.py
#	langchain/vectorstores/qdrant.py
@dongreenberg
Contributor Author

Something strange happened when I merged in the upstream, but we're all good. Tests all pass, and I added those tutorials!

@dongreenberg
Contributor Author

Oh, I didn't realize I needed to update the poetry lock; doing that now.

@hwchase17 hwchase17 changed the base branch from master to harrison/self-hosted-runhouse February 19, 2023 16:42
@hwchase17 hwchase17 merged commit 8272fc1 into langchain-ai:harrison/self-hosted-runhouse Feb 19, 2023