feat: add VoyageAI embeddings #3069
Merged
Changes from 4 commits
Commits
16 commits
06b69b1 Voyage AI embeddings (fzowl)
423ff1e Corrected the test (fzowl)
9e0a4cf tou 2 you (fzowl)
7f91dca Merge pull request #1 from voyage-ai/voyageai (Liuhong99)
5a2e12f requirements for embed-voyageai.txt regenerated with python 3.9 (fzowl)
1769a01 Merge pull request #2 from voyage-ai/voyageai (fzowl)
f4f5a27 Remove embedding.rst changes (fzowl)
ce4b456 Removing embedding.rst (fzowl)
5fc189c Merge pull request #3 from voyage-ai/voyageai (fzowl)
964f65c Merge branch 'main' into main (fzowl)
4de00aa Removing embedding.rst (fzowl)
b7e2aae Merge branch 'main' into main (fzowl)
0175b2d Merge branch 'main' into main (fzowl)
1dd08fe Last(?) correction (fzowl)
7e0ef46 Merge branch 'main' into main (fzowl)
32f1afa Merge branch 'main' into main (fzowl)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
import os | ||
|
||
from unstructured.documents.elements import Text | ||
from unstructured.embed.voyageai import VoyageAIEmbeddingConfig, VoyageAIEmbeddingEncoder | ||
|
||
# To use Voyage AI, pass your Voyage AI API key (obtained from https://dash.voyageai.com/) | ||
# as the ``api_key`` parameter. | ||
# | ||
# The ``model_name`` parameter is mandatory; see the available models | ||
# at https://docs.voyageai.com/docs/embeddings | ||
|
||
embedding_encoder = VoyageAIEmbeddingEncoder( | ||
    config=VoyageAIEmbeddingConfig( | ||
        api_key=os.environ["VOYAGE_API_KEY"], | ||
        model_name="voyage-law-2" | ||
    ) | ||
) | ||
elements = embedding_encoder.embed_documents( | ||
    elements=[Text("This is sentence 1"), Text("This is sentence 2")], | ||
) | ||
|
||
query = "This is the query" | ||
query_embedding = embedding_encoder.embed_query(query=query) | ||
|
||
for e in elements: | ||
    print(e, e.embeddings) | ||
print(query, query_embedding) | ||
print(embedding_encoder.is_unit_vector, embedding_encoder.num_of_dimensions) |
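
The last line of the example prints `is_unit_vector` and `num_of_dimensions`. As a rough illustration of what those two properties describe, here is a minimal pure-Python sketch. The stand-alone helpers `is_unit_vector` and `cosine_similarity` below are hypothetical and written only for this illustration; they are not the encoder's implementation, and the vectors are made-up values rather than real Voyage AI output.

```python
import math
from typing import Sequence


def is_unit_vector(vector: Sequence[float], tol: float = 1e-6) -> bool:
    """Return True when the Euclidean (L2) norm of `vector` is 1 within `tol`."""
    norm = math.sqrt(sum(x * x for x in vector))
    return math.isclose(norm, 1.0, abs_tol=tol)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; for unit vectors this reduces to the dot product."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 2-dimensional vectors standing in for real embeddings.
query_vec = [0.6, 0.8]
doc_vec = [1.0, 0.0]
num_of_dimensions = len(query_vec)
similarity = cosine_similarity(query_vec, doc_vec)
```

When a model returns unit-length embeddings, ranking documents by dot product and by cosine similarity gives the same order, which is one reason a flag like `is_unit_vector` is useful to expose.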
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
-c ../deps/constraints.txt | ||
-c ../base.txt | ||
langchain | ||
langchain-voyageai |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,140 @@ | ||
# | ||
# This file is autogenerated by pip-compile with Python 3.11 | ||
# by the following command: | ||
# | ||
# pip-compile ./ingest/embed-voyageai.in | ||
# | ||
aiohttp==3.9.5 | ||
# via | ||
# langchain | ||
# langchain-community | ||
# voyageai | ||
aiolimiter==1.1.0 | ||
# via voyageai | ||
aiosignal==1.3.1 | ||
# via aiohttp | ||
annotated-types==0.6.0 | ||
# via pydantic | ||
attrs==23.2.0 | ||
# via aiohttp | ||
certifi==2024.2.2 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# -c ./ingest/../deps/constraints.txt | ||
# requests | ||
charset-normalizer==3.3.2 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# requests | ||
dataclasses-json==0.6.6 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# langchain | ||
# langchain-community | ||
frozenlist==1.4.1 | ||
# via | ||
# aiohttp | ||
# aiosignal | ||
idna==3.7 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# requests | ||
# yarl | ||
jsonpatch==1.33 | ||
# via langchain-core | ||
jsonpointer==2.4 | ||
# via jsonpatch | ||
langchain==0.1.20 | ||
# via -r ./ingest/embed-voyageai.in | ||
langchain-community==0.0.38 | ||
# via langchain | ||
langchain-core==0.1.52 | ||
# via | ||
# langchain | ||
# langchain-community | ||
# langchain-text-splitters | ||
# langchain-voyageai | ||
langchain-text-splitters==0.0.1 | ||
# via langchain | ||
langchain-voyageai==0.1.1 | ||
# via -r ./ingest/embed-voyageai.in | ||
langsmith==0.1.57 | ||
# via | ||
# langchain | ||
# langchain-community | ||
# langchain-core | ||
marshmallow==3.21.2 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# dataclasses-json | ||
multidict==6.0.5 | ||
# via | ||
# aiohttp | ||
# yarl | ||
mypy-extensions==1.0.0 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# typing-inspect | ||
numpy==1.26.4 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# langchain | ||
# langchain-community | ||
# voyageai | ||
orjson==3.10.3 | ||
# via langsmith | ||
packaging==23.2 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# -c ./ingest/../deps/constraints.txt | ||
# langchain-core | ||
# marshmallow | ||
pydantic==2.7.1 | ||
# via | ||
# langchain | ||
# langchain-core | ||
# langsmith | ||
pydantic-core==2.18.2 | ||
# via pydantic | ||
pyyaml==6.0.1 | ||
# via | ||
# langchain | ||
# langchain-community | ||
# langchain-core | ||
requests==2.31.0 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# langchain | ||
# langchain-community | ||
# langsmith | ||
# voyageai | ||
sqlalchemy==2.0.30 | ||
# via | ||
# langchain | ||
# langchain-community | ||
tenacity==8.3.0 | ||
# via | ||
# langchain | ||
# langchain-community | ||
# langchain-core | ||
# voyageai | ||
typing-extensions==4.11.0 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# pydantic | ||
# pydantic-core | ||
# sqlalchemy | ||
# typing-inspect | ||
typing-inspect==0.9.0 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# dataclasses-json | ||
urllib3==1.26.18 | ||
# via | ||
# -c ./ingest/../base.txt | ||
# -c ./ingest/../deps/constraints.txt | ||
# requests | ||
voyageai==0.2.2 | ||
# via langchain-voyageai | ||
yarl==1.9.4 | ||
# via aiohttp |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
from unstructured.documents.elements import Text | ||
from unstructured.embed.voyageai import VoyageAIEmbeddingConfig, VoyageAIEmbeddingEncoder | ||
|
||
|
||
def test_embed_documents_does_not_break_element_to_dict(mocker): | ||
    # Mocked client with the desired behavior for embed_documents | ||
    mock_client = mocker.MagicMock() | ||
    mock_client.embed_documents.return_value = [1, 2] | ||
|
||
    # Mock create_client to return our mock_client | ||
    mocker.patch.object(VoyageAIEmbeddingEncoder, "create_client", return_value=mock_client) | ||
|
||
    encoder = VoyageAIEmbeddingEncoder(config=VoyageAIEmbeddingConfig(api_key="api_key", model_name="voyage-law-2")) | ||
    elements = encoder.embed_documents( | ||
        elements=[Text("This is sentence 1"), Text("This is sentence 2")], | ||
    ) | ||
    assert len(elements) == 2 | ||
    assert elements[0].to_dict()["text"] == "This is sentence 1" | ||
    assert elements[1].to_dict()["text"] == "This is sentence 2" |
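
The test above patches `create_client` so that no network call is made. The self-contained sketch below illustrates the same pattern with stand-in classes. `FakeElement` and `SketchEncoder` are hypothetical names invented for this illustration, and the way embeddings are attached to elements is an assumption, not the code in `unstructured.embed.voyageai`.

```python
from dataclasses import dataclass
from typing import Any, List, Optional
from unittest.mock import MagicMock


@dataclass
class FakeElement:
    """Hypothetical stand-in for an unstructured element."""

    text: str
    embeddings: Optional[Any] = None

    def to_dict(self) -> dict:
        return {"text": self.text, "embeddings": self.embeddings}


class SketchEncoder:
    """Hypothetical encoder: delegates to a client, attaches one embedding per element."""

    def __init__(self, client: Any) -> None:
        self.client = client

    def embed_documents(self, elements: List[FakeElement]) -> List[FakeElement]:
        # Ask the client for one embedding per element text.
        embeddings = self.client.embed_documents([e.text for e in elements])
        if len(embeddings) != len(elements):
            raise ValueError("client returned a different number of embeddings")
        for element, embedding in zip(elements, embeddings):
            element.embeddings = embedding
        return elements


# A MagicMock plays the role of the real API client.
stub_client = MagicMock()
stub_client.embed_documents.return_value = [1, 2]

encoder = SketchEncoder(client=stub_client)
result = encoder.embed_documents(
    [FakeElement("This is sentence 1"), FakeElement("This is sentence 2")]
)
```

Because the client is a mock, the placeholder values `[1, 2]` stand in for real embedding vectors; the point is only that attaching them leaves `to_dict()["text"]` unchanged.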
Our documentation has moved to https://docs.unstructured.io. We can do a separate PR into https://github.com/Unstructured-IO/docs/ with these updates. cc @cmscmadd @MKhalusova