Self-hosted models on remote clusters, including on-demand from AWS, GCP, Azure, Lambda #978
Conversation
…structEmbeddings.
… test_self_hosted_hf_pipeline.py
This is awesome! I would add an example notebook so it's more discoverable in the documentation (https://langchain.readthedocs.io/en/latest/modules/llms/integrations.html and https://langchain.readthedocs.io/en/latest/modules/utils/combine_docs_examples/embeddings.html).
Co-authored-by: John Dagdelen <[email protected]> Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Andrew White <[email protected]> Co-authored-by: Harrison Chase <[email protected]> Co-authored-by: Peng Qu <[email protected]>
…ain-ai#979)

### Summary

Adds an `UnstructuredURLLoader` that supports loading data from a list of URLs.

### Testing

```python
from langchain.document_loaders import UnstructuredURLLoader

urls = [
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023",
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023",
]
loader = UnstructuredURLLoader(urls=urls)
raw_documents = loader.load()
```
I wonder why "with" is spelled "wiht" so many times by a human.
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
The provided example uses the default `max_length` of `20` tokens, which causes the example generation to get cut off. 20 tokens is far too short to show CoT reasoning, so I boosted it to `64`. For someone who doesn't know HF's API well, it can be hard to figure out where those `model_kwargs` come from, and `max_length` is a critical one.
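For reference, a minimal sketch of that kind of fix, assuming the docs' `HuggingFaceHub` example (the `google/flan-t5-xl` repo id and the prompt are illustrative, and a `HUGGINGFACEHUB_API_TOKEN` must be set):

```python
from langchain.llms import HuggingFaceHub

# Bump max_length above the 20-token default so chain-of-thought answers
# aren't truncated mid-generation. Assumes HUGGINGFACEHUB_API_TOKEN is set.
llm = HuggingFaceHub(
    repo_id="google/flan-t5-xl",
    model_kwargs={"temperature": 1e-10, "max_length": 64},
)
print(llm("What NFL team won the Super Bowl in the year Justin Bieber was born?"))
```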
Co-authored-by: zanderchase <[email protected]> Co-authored-by: Harrison Chase <[email protected]>
In [pyproject.toml](https://github.com/hwchase17/langchain/blob/master/pyproject.toml), the expectation is `SQLAlchemy = "^1"`. However, the way `declarative_base` is imported in [cache.py](https://github.com/hwchase17/langchain/blob/master/langchain/cache.py) only works with SQLAlchemy >= 1.4. This PR makes sure LangChain can run in environments with SQLAlchemy < 1.4.
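For context, the usual compatibility shim looks roughly like this (a sketch, not necessarily the exact diff in the PR):

```python
# declarative_base moved to sqlalchemy.orm in SQLAlchemy 1.4;
# fall back to the pre-1.4 import path so both version ranges work.
try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy >= 1.4
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy < 1.4

Base = declarative_base()
```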
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Stefan Keselj <[email protected]> Co-authored-by: Harrison Chase <[email protected]>
…ngchain-ai#1000) Co-authored-by: Francisco Ingham <>
simple typo fix: because --> between
…1011) Updates the Unstructured example notebook with a PDF example. Includes additional dependencies for PDF processing (and images, etc).
Chroma is a simple-to-use, open-source, zero-config, zero-setup vectorstore. Simply `pip install chromadb` and you're good to go. Out of the box, Chroma is suitable for most LangChain workloads, but it is highly flexible. I tested up to 1M embeddings on my M1 Mac without issues and with reasonably fast query times. Look out for future releases as we integrate more Chroma features with LangChain!
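For example, a minimal sketch of the LangChain usage (assuming OpenAI embeddings and the docs' `state_of_the_union.txt` sample file):

```python
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Load a document, split it into chunks, and index the chunks in Chroma.
documents = TextLoader("state_of_the_union.txt").load()
docs = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(documents)
db = Chroma.from_documents(docs, OpenAIEmbeddings())

# Query the in-memory collection.
results = db.similarity_search("What did the president say about Ketanji Brown Jackson?")
print(results[0].page_content)
```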
Co-authored-by: William FH <[email protected]>
Alternate implementation to PR langchain-ai#960. Again, only FAISS is implemented. If accepted, I can add this to other vectorstores, or leave them as NotImplemented? Suggestions welcome...
…ze calculation. (langchain-ai#991) I modified the logic of the batch size calculation for embeddings according to this cookbook: https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
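Roughly, the cookbook's approach looks like this (a generic sketch, not LangChain's exact code; `embed_fn` stands in for the OpenAI embedding call):

```python
import numpy as np

def chunked_tokens(tokens, chunk_length):
    # Split a long token sequence into context-window-sized chunks.
    return [tokens[i : i + chunk_length] for i in range(0, len(tokens), chunk_length)]

def len_safe_embedding(tokens, embed_fn, ctx_length=8191):
    # Embed each chunk separately, then combine with a length-weighted
    # average and re-normalize to unit length.
    chunks = chunked_tokens(tokens, ctx_length)
    embeddings = [embed_fn(chunk) for chunk in chunks]
    weights = [len(chunk) for chunk in chunks]
    avg = np.average(embeddings, axis=0, weights=weights)
    return (avg / np.linalg.norm(avg)).tolist()
```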
This is a work-in-progress PR to track my progress.

## TODO:
- [x] Get results using the specified searx host
- [x] Prioritize returning an `answer` or results otherwise
- [ ] Expose the field `infobox` when available
- [ ] Expose `score` of result to help agent's decision
- [ ] Expose the `suggestions` field to agents so they could try new queries if no results are found with the original query?
- [ ] Dynamic tool description for agents? Searx offers many engines and a search syntax that agents can take advantage of. It would be nice to generate a dynamic Tool description so that it can be used many times as a tool but for different purposes.
- [x] Limit number of results
- [ ] Implement paging
- [x] Mirror the usage of the Google Search tool
- [x] Easy selection of search engines
- [x] Documentation
- [ ] Update HowTo guide notebook on Search Tools
- [ ] Handle async
- [ ] Tests

### Add examples / documentation on possible uses with:
- [ ] Getting factual answers with `!wiki` option and `infoboxes`
- [ ] Getting `suggestions`
- [ ] Getting `corrections`

---------

Co-authored-by: blob42 <spike@w530>
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Ivan Vendrov <[email protected]> Co-authored-by: Sasmitha Manathunga <[email protected]>
Co-authored-by: Chen Wu (吴尘) <[email protected]>
This addresses langchain-ai#948. I set the documentation max width to 2560px, but it can be adjusted; see the screenshot below. <img width="1741" alt="Screenshot 2023-02-14 at 13 05 57" src="https://user-images.githubusercontent.com/23406704/218749076-ea51e90a-a220-4558-b4fe-5a95b39ebf15.png">
Co-authored-by: Maxime Vidal <[email protected]>
Co-authored-by: Francisco Ingham <[email protected]>
Changed the number of types of chains to make it consistent with the rest of the docs.
… test_self_hosted_hf_pipeline.py
…tedHuggingFaceLLM classes.
- Refactor self-hosted Embeddings into separate SelfHostedPipeline, SelfHostedHuggingFaceLLM classes.
- Add self_hosted_examples.ipynb
- Add embeddings examples to embeddings.ipynb
- Add overview to runhouse.md

Tests across all of the above pass.
# Conflicts:
#	docs/modules/llms/integrations.rst
#	docs/modules/utils/combine_docs_examples/embeddings.ipynb
#	langchain/document_loaders/telegram.py
#	langchain/embeddings/openai.py
#	langchain/llms/__init__.py
#	langchain/vectorstores/qdrant.py
Something strange happened when I merged in the upstream, but we're all good. Tests all pass and I added those tutorials!
Oh, I didn't realize I needed to update the poetry lock, doing that now.
…angeness with awscli and boto.
New modules that make it easy to use embedding and LLM models on one's own cloud GPUs. Uses Runhouse to handle the cloud RPC. Supports AWS, GCP, Azure, and Lambda today (auto-launching), plus BYO hardware by IP and SSH creds (e.g. for on-prem or other clouds like Coreweave, Paperspace, etc.).
APIs
The API mirrors HuggingFaceEmbeddings and HuggingFaceInstructEmbeddings, but accepts an additional `hardware` parameter:
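A minimal sketch (the cluster name and query are illustrative):

```python
import runhouse as rh
from langchain.embeddings import SelfHostedHuggingFaceEmbeddings

# On-demand A100 on GCP, Azure, or Lambda, whichever is enabled and cheapest.
gpu = rh.cluster(name="rh-a100", instance_type="A100:1", use_spot=False)

embeddings = SelfHostedHuggingFaceEmbeddings(hardware=gpu)
query_result = embeddings.embed_query("What is self-hosted model inference?")
```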
The `rh.cluster` above will launch the A100 on GCP, Azure, or Lambda, whichever is enabled and cheapest (thanks to SkyPilot). You can specify a particular provider with `provider='gcp'`, as well as `use_spot`, `region`, `image_id`, and `autostop_mins`. For AWS, you'd just need to switch to "A10G:1". For a BYO cluster, you can pass the cluster's IPs and SSH credentials instead; see the sketch below.
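A minimal sketch of the BYO case (the IP, SSH credentials, and cluster name below are placeholders):

```python
import runhouse as rh
from langchain.embeddings import SelfHostedHuggingFaceEmbeddings

# Point Runhouse at existing hardware by IP and SSH creds
# (e.g. on-prem, or clouds like Coreweave or Paperspace).
gpu = rh.cluster(
    ips=["<cluster_ip>"],
    ssh_creds={"ssh_user": "ubuntu", "ssh_private_key": "~/.ssh/id_rsa"},
    name="my-gpu-cluster",
)

embeddings = SelfHostedHuggingFaceEmbeddings(hardware=gpu)
```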
Design

All we're doing here is sending a pre-defined inference function to the cluster through Runhouse, which brings up the cluster if needed, installs the dependencies, and returns a callable that sends requests to run the function over gRPC. The function takes the `model_id` as an input, but the model is cached so it only needs to be downloaded once. We can improve performance further pretty easily by pinning the model to GPU memory on the cluster. Let me know if that's of interest.
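Concretely, from the user's side the LLM wrapper looks roughly like this (a minimal sketch; the model and requirements list are illustrative):

```python
import runhouse as rh
from langchain.llms import SelfHostedHuggingFaceLLM

gpu = rh.cluster(name="rh-a100", instance_type="A100:1", use_spot=False)

# model_id is passed to the remote inference function; the model itself is
# downloaded and cached on the cluster the first time it's used.
llm = SelfHostedHuggingFaceLLM(
    model_id="gpt2",
    hardware=gpu,
    model_reqs=["pip:./", "transformers", "torch"],
)
print(llm("What is the capital of France?"))
```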
Testing
Added new tests in embeddings/test_self_hosted.py (which mirror test_huggingface.py) and llms/test_self_hosted_llm.py. All tests pass on Lambda Labs (which is surprising, because the first two test_huggingface.py tests are supposedly segfaulting?). We can pin the provider used in the tests to whichever one your CI uses, or you can choose to only run these on a schedule to avoid spinning up a GPU (which can take ~5 minutes including installations).