Harrison/self hosted runhouse #1154
Merged
…GCP, Azure, Lambda (#978)

New modules to facilitate easy use of embedding and LLM models on one's own cloud GPUs. Uses [Runhouse](https://github.com/run-house/runhouse) to facilitate cloud RPC. Supports AWS, GCP, Azure, and Lambda today (auto-launching), plus BYO hardware by IP and SSH creds (e.g. for on-prem or other clouds like CoreWeave, Paperspace, etc.).

**APIs**

The API mirrors the HuggingFaceEmbedding and HuggingFaceInstructEmbedding, but accepts an additional "hardware" parameter:

```
from langchain.embeddings import SelfHostedHuggingFaceEmbeddings, SelfHostedHuggingFaceInstructEmbeddings
import runhouse as rh

gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
hf = SelfHostedHuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2", hardware=gpu)

# Will run on the same GPU
hf_instruct = SelfHostedHuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large", hardware=gpu)
```

The `rh.cluster` above will launch the A100 on GCP, Azure, or Lambda, whichever is enabled and cheapest (thanks to SkyPilot). You can pin a specific provider with `provider='gcp'`, and also set `use_spot`, `region`, `image_id`, and `autostop_mins`. For AWS you'd just need to switch to "A10G:1". For a BYO cluster, you can do:

```
gpu = rh.cluster(ips=['<ip of the cluster>'],
                 ssh_creds={'ssh_user': '...', 'ssh_private_key': '<path_to_key>'},
                 name='rh-a10x')
```

**Design**

All we're doing here is sending a pre-defined inference function to the cluster through Runhouse, which brings up the cluster if needed, installs the dependencies, and returns a callable that sends requests to run the function over gRPC. The function takes the model_id as an input, but the model is cached so it only needs to be downloaded once. We can improve performance further pretty easily by pinning the model to GPU memory on the cluster. Let me know if that's of interest.
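The caching behavior described above can be sketched in plain Python. This is a hedged illustration of the pattern, not Runhouse's actual internals: all names (`_MODEL_CACHE`, `inference_fn`, `_load_model`) are hypothetical, and the "model load" is a stand-in for the real download from the HF hub.

```python
# Sketch of the model-caching pattern described in the Design section:
# the remote inference function receives model_id on every call, but the
# loaded model is kept in a module-level cache on the cluster so it is
# only downloaded/loaded once. All names here are illustrative.

_MODEL_CACHE = {}
LOAD_COUNTS = {}  # track loads so the caching is observable


def _load_model(model_id):
    # Stand-in for an expensive download/load (e.g. from the HF hub).
    LOAD_COUNTS[model_id] = LOAD_COUNTS.get(model_id, 0) + 1
    return f"model:{model_id}"


def inference_fn(model_id, prompt):
    # Conceptually runs on the cluster; the model is loaded on first use
    # and reused by every subsequent call with the same model_id.
    if model_id not in _MODEL_CACHE:
        _MODEL_CACHE[model_id] = _load_model(model_id)
    model = _MODEL_CACHE[model_id]
    return f"{model} -> {prompt}"


inference_fn("gpt2", "hello")
inference_fn("gpt2", "world")
print(LOAD_COUNTS["gpt2"])  # 1 -- the second call hit the cache
```

Pinning the model to GPU memory, as mentioned above, would amount to keeping the cached object resident on the device between calls rather than just in host memory.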
**Testing**

Added new tests `embeddings/test_self_hosted.py` (which mirror `test_huggingface.py`) and `llms/test_self_hosted_llm.py`. Tests all pass on Lambda Labs (which is surprising, because the first two `test_huggingface.py` tests are supposedly segfaulting?). We can pin the provider used in the test to whichever is used by your CI, or you can choose to only run these on a schedule to avoid spinning up a GPU (can take ~5 minutes including installations).

- [x] Introduce SelfHostedPipeline and SelfHostedHuggingFaceLLM
- [x] Introduce SelfHostedEmbedding, SelfHostedHuggingFaceEmbedding, and SelfHostedHuggingFaceInstructEmbedding
- [x] Add tutorials for Self-hosted LLMs and Embeddings
- [x] Implement chat-your-data tutorial with Self-hosted models - https://github.com/dongreenberg/chat-your-data

---------

Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: John Dagdelen <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Andrew White <[email protected]>
Co-authored-by: Peng Qu <[email protected]>
Co-authored-by: Matt Robinson <[email protected]>
Co-authored-by: jeff <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: zanderchase <[email protected]>
Co-authored-by: Charles Frye <[email protected]>
Co-authored-by: zanderchase <[email protected]>
Co-authored-by: Shahriar Tajbakhsh <[email protected]>
Co-authored-by: Stefan Keselj <[email protected]>
Co-authored-by: Francisco Ingham <[email protected]>
Co-authored-by: Dhruv Anand <[email protected]>
Co-authored-by: cragwolfe <[email protected]>
Co-authored-by: Anton Troynikov <[email protected]>
Co-authored-by: William FH <[email protected]>
Co-authored-by: Oliver Klingefjord <[email protected]>
Co-authored-by: blob42 <[email protected]>
Co-authored-by: blob42 <spike@w530>
Co-authored-by: Enrico Shippole <[email protected]>
Co-authored-by: Ibis Prevedello <[email protected]>
Co-authored-by: jped <[email protected]>
Co-authored-by: Justin Torre <[email protected]>
Co-authored-by: Ivan Vendrov <[email protected]>
Co-authored-by: Sasmitha Manathunga <[email protected]>
Co-authored-by: Ankush Gola <[email protected]>
Co-authored-by: Matt Robinson <[email protected]>
Co-authored-by: Jeff Huber <[email protected]>
Co-authored-by: Akshay <[email protected]>
Co-authored-by: Andrew Huang <[email protected]>
Co-authored-by: rogerserper <[email protected]>
Co-authored-by: seanaedmiston <[email protected]>
Co-authored-by: Hasegawa Yuya <[email protected]>
Co-authored-by: Ivan Vendrov <[email protected]>
Co-authored-by: Chen Wu (Wu Chen) <[email protected]>
Co-authored-by: Dennis Antela Martinez <[email protected]>
Co-authored-by: Maxime Vidal <[email protected]>
Co-authored-by: Rishabh Raizada <[email protected]>
Can I use Runhouse with a local GPU? I am asking because LangChain uses Runhouse as the backend for the self-hosted models.
zachschillaci27 pushed a commit to zachschillaci27/langchain that referenced this pull request on Mar 8, 2023.