Self-hosted models on remote clusters, including on-demand from AWS, GCP, Azure, Lambda #978

Merged

Conversation

Contributor

@dongreenberg dongreenberg commented Feb 10, 2023

New modules that make it easy to run embedding and LLM models on one's own cloud GPUs. Uses Runhouse to handle the cloud RPC. Supports AWS, GCP, Azure, and Lambda today (with auto-launching), plus BYO hardware via IP and SSH credentials (e.g. for on-prem or other clouds like CoreWeave, Paperspace, etc.).

APIs
The API mirrors `HuggingFaceEmbeddings` and `HuggingFaceInstructEmbeddings`, but accepts an additional `hardware` parameter:

```python
from langchain.embeddings import SelfHostedHuggingFaceEmbeddings, SelfHostedHuggingFaceInstructEmbeddings
import runhouse as rh

gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
hf = SelfHostedHuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2", hardware=gpu)

# Will run on the same GPU
hf_instruct = SelfHostedHuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large", hardware=gpu)
```

The `rh.cluster` above will launch the A100 on GCP, Azure, or Lambda, whichever is enabled and cheapest (thanks to SkyPilot). You can pin it to a specific provider with `provider='gcp'`, and also set `use_spot`, `region`, `image_id`, and `autostop_mins`. For AWS, you'd just switch the instance type to `"A10G:1"`. For a BYO cluster, you can do:

```python
gpu = rh.cluster(ips=['<ip of the cluster>'],
                 ssh_creds={'ssh_user': '...', 'ssh_private_key': '<path_to_key>'},
                 name='rh-a10x')
```
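
Back in the on-demand case, pinning the launch to a specific provider might look like this (a sketch; the region and autostop values are illustrative):

```python
gpu = rh.cluster(name="rh-a10x",
                 instance_type="A100:1",
                 provider="gcp",     # pin to GCP instead of the cheapest enabled provider
                 use_spot=True,      # use spot instances
                 region="us-west1",  # illustrative region
                 autostop_mins=30)   # stop the cluster automatically after 30 minutes
```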

Design
All we're doing here is sending a pre-defined inference function to the cluster through Runhouse, which brings up the cluster if needed, installs the dependencies, and returns a callable that sends requests to run the function over gRPC. The function takes the model_id as an input, but the model is cached so it only needs to be downloaded once. We can improve performance further pretty easily by pinning the model to GPU memory on the cluster. Let me know if that's of interest.
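
As a rough illustration (a minimal sketch, not the PR's exact code; `_MODEL_CACHE` and `_embed_documents` are hypothetical names), the pattern for the cached inference function is:

```python
from typing import List

from sentence_transformers import SentenceTransformer

# Module-level cache: the model is downloaded and loaded once per cluster process,
# then reused by every subsequent gRPC call.
_MODEL_CACHE = {}

def _embed_documents(model_id: str, texts: List[str]) -> List[List[float]]:
    if model_id not in _MODEL_CACHE:
        _MODEL_CACHE[model_id] = SentenceTransformer(model_id)
    return _MODEL_CACHE[model_id].encode(texts).tolist()
```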

Testing
Added new tests: embeddings/test_self_hosted.py (which mirrors test_huggingface.py) and llms/test_self_hosted_llm.py. All tests pass on Lambda Labs (which is surprising, because the first two test_huggingface.py tests supposedly segfault?). We can pin the provider used in the tests to whichever your CI uses, or you can run these only on a schedule to avoid spinning up a GPU (a run can take ~5 minutes including installations).

  • Introduce SelfHostedPipeline and SelfHostedHuggingFaceLLM
  • Introduce SelfHostedEmbeddings, SelfHostedHuggingFaceEmbeddings, and SelfHostedHuggingFaceInstructEmbeddings
  • Add tutorials for Self-hosted LLMs and Embeddings
  • Implement chat-your-data tutorial with Self-hosted models - https://github.com/dongreenberg/chat-your-data

@dongreenberg dongreenberg changed the title Introduce Self Hosted models Self Hosted models Feb 10, 2023
@dongreenberg dongreenberg changed the title Self Hosted models Self-hosted models Feb 10, 2023
@dongreenberg dongreenberg changed the title Self-hosted models Self-hosted models on AWS, GCP, Azure, Lambda, or BYO IP Feb 10, 2023
Contributor

@hwchase17 hwchase17 left a comment

@dongreenberg dongreenberg changed the title Self-hosted models on AWS, GCP, Azure, Lambda, or BYO IP Self-hosted models on remote clusters, including on-demand from AWS, GCP, Azure, Lambda Feb 16, 2023
hwchase17 and others added 21 commits February 17, 2023 07:00
Co-authored-by: John Dagdelen <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Andrew White <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Peng Qu <[email protected]>
…ain-ai#979)

### Summary

Adds an `UnstructuredURLLoader` that supports loading data from a list of URLs.


### Testing

```python
from langchain.document_loaders import UnstructuredURLLoader

urls = [
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023",
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023"
]
loader = UnstructuredURLLoader(urls=urls)
raw_documents = loader.load()
```
Wonder why "with" is spelled "wiht" so many times by humans
The provided example uses the default `max_length` of `20` tokens, which
leads to the example generation getting cut off. 20 tokens is way too
short to show CoT reasoning, so I boosted it to `64`.

Without knowing HF's API well, it can be hard to figure out just where
those `model_kwargs` come from, and `max_length` is a super critical
one.
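
For context, a hedged sketch of passing the larger value through `model_kwargs` (the repo id here is illustrative, and `HUGGINGFACEHUB_API_TOKEN` must be set in the environment):

```python
from langchain.llms import HuggingFaceHub

# Raise max_length from the default 20 so chain-of-thought output isn't truncated.
llm = HuggingFaceHub(
    repo_id="google/flan-t5-xl",
    model_kwargs={"temperature": 0.5, "max_length": 64},
)
```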
Co-authored-by: zanderchase <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
In
[pyproject.toml](https://github.com/hwchase17/langchain/blob/master/pyproject.toml),
the expectation is `SQLAlchemy = "^1"`. But, the way `declarative_base`
is imported in
[cache.py](https://github.com/hwchase17/langchain/blob/master/langchain/cache.py)
will only work with SQLAlchemy >=1.4. This PR makes sure LangChain can be run in environments with SQLAlchemy <1.4.
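
A minimal sketch of a version-tolerant import (assuming the fix takes the usual try/except form; the actual diff may differ):

```python
try:
    # SQLAlchemy >= 1.4 exposes declarative_base here.
    from sqlalchemy.orm import declarative_base
except ImportError:
    # Fall back for SQLAlchemy < 1.4.
    from sqlalchemy.ext.declarative import declarative_base
```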
Co-authored-by: Stefan Keselj <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
simple typo fix: because --> between
…1011)

Updates the Unstructured example notebook with a PDF example. Includes
additional dependencies for PDF processing (and images, etc.).
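
For reference, a PDF load looks roughly like this (a sketch; the file path is a placeholder):

```python
from langchain.document_loaders import UnstructuredPDFLoader

loader = UnstructuredPDFLoader("example_data/layout-parser-paper.pdf")
docs = loader.load()
```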
Chroma is a simple-to-use, open-source, zero-config, zero-setup vectorstore.

Simply `pip install chromadb`, and you're good to go.

Out of the box, Chroma is suitable for most LangChain workloads, but it is highly flexible. I tested up to 1M embeddings on my M1 Mac without issues and with reasonably fast query times.

Look out for future releases as we integrate more Chroma features with
LangChain!
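
A quick sketch of using it from LangChain (the embedding model and texts are illustrative, and an OpenAI API key is assumed):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

texts = ["Chroma is a zero-config vectorstore.", "LangChain chains LLM calls together."]
db = Chroma.from_texts(texts, OpenAIEmbeddings())
results = db.similarity_search("What is Chroma?", k=1)
```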
seanaedmiston and others added 19 commits February 17, 2023 07:01
Alternate implementation to PR langchain-ai#960. Again, only FAISS is implemented. If accepted, I can add this to the other vectorstores or leave them as NotImplemented? Suggestions welcome...
This is a work-in-progress PR to track my progress. (A quick usage sketch follows the checklist below.)

## TODO:

- [x] Get results using the specified searx host
- [x]  Prioritize returning an  `answer`  or results otherwise
    - [ ] expose the field `infobox` when available
    - [ ] expose `score` of result to help agent's decision
- [ ] expose the `suggestions` field to agents so they could try new queries if no results are found with the original query?

- [ ] Dynamic tool description for agents?
- Searx offers many engines and a search syntax that agents can take
advantage of. It would be nice to generate a dynamic Tool description so
that it can be used many times as a tool but for different purposes.

- [x] Limit the number of results
- [ ] Implement paging
- [x] Mirror the usage of the Google Search tool
- [x] Easy selection of search engines
- [x] Documentation
    - [ ] update HowTo guide notebook on Search Tools
- [ ] Handle async 
- [ ]  Tests

###  Add examples / documentation on possible uses with
 - [ ]  getting factual answers with `!wiki` option and `infoboxes`
 - [ ]  getting `suggestions`
 - [ ]  getting `corrections`
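
A quick usage sketch (assuming the `SearxSearchWrapper` utility introduced here and a locally running Searx instance; the host URL is a placeholder):

```python
from langchain.utilities import SearxSearchWrapper

# Point the wrapper at your own Searx instance.
search = SearxSearchWrapper(searx_host="http://localhost:8888")
search.run("What is the capital of France?")
```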

---------

Co-authored-by: blob42 <spike@w530>
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Ivan Vendrov <[email protected]>
Co-authored-by: Sasmitha Manathunga <[email protected]>
This addresses langchain-ai#948.

I set the documentation max width to 2560px, but it can be adjusted; see the screenshot below.

<img width="1741" alt="Screenshot 2023-02-14 at 13 05 57"
src="https://user-images.githubusercontent.com/23406704/218749076-ea51e90a-a220-4558-b4fe-5a95b39ebf15.png">
Changed the number of types of chains to make it consistent with the rest of the docs.
…tedHuggingFaceLLM classes.

- Refactor self-hosted embeddings into separate SelfHostedPipeline and SelfHostedHuggingFaceLLM classes.
- Add self_hosted_examples.ipynb
- Add embeddings examples to embeddings.ipynb
- Add overview to runhouse.md
Tests across all of the above pass.
# Conflicts:
#	docs/modules/llms/integrations.rst
#	docs/modules/utils/combine_docs_examples/embeddings.ipynb
#	langchain/document_loaders/telegram.py
#	langchain/embeddings/openai.py
#	langchain/llms/__init__.py
#	langchain/vectorstores/qdrant.py
@dongreenberg
Contributor Author

Something strange happened when I merged in the upstream, but we're all good. Tests all pass, and I added those tutorials!

@dongreenberg
Contributor Author

Oh, I didn't realize I needed to update the poetry lock; doing that now.

@hwchase17 hwchase17 changed the base branch from master to harrison/self-hosted-runhouse February 19, 2023 16:42
@hwchase17 hwchase17 merged commit 8272fc1 into langchain-ai:harrison/self-hosted-runhouse Feb 19, 2023