feat: add VoyageAI embeddings #3069

Merged
merged 16 commits into from
May 24, 2024

@fzowl fzowl commented May 21, 2024

Adding VoyageAI embeddings
Voyage AI’s embedding models and rerankers are state-of-the-art in retrieval accuracy.

@MthwRobinson MthwRobinson changed the title from "Adding VoyageAI embeddings" to "feat: add VoyageAI embeddings" May 22, 2024
@MthwRobinson MthwRobinson left a comment

Thanks @fzowl! PR looks great, just a couple of minor comments.

print(embedding_encoder.is_unit_vector(), embedding_encoder.num_of_dimensions())

``VoyageAIEmbeddingEncoder``

Our documentation has moved to https://docs.unstructured.io. We can do a separate PR into https://github.com/Unstructured-IO/docs/ with these updates. cc @cmscmadd @MKhalusova
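
For context on the reviewed snippet above: the print(...) line reports whether the encoder's embeddings have unit length and how many dimensions they have. The block below is a minimal, standard-library sketch of what those two checks mean. The helper names `normalize`, `looks_like_unit_vector`, and `dimension_count` are hypothetical stand-ins chosen for illustration; they are not the `unstructured` implementation, and no Voyage AI call is made.

```python
import math
from typing import List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale vec so that its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]


def looks_like_unit_vector(vec: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True when the L2 norm of vec is 1 within tol (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return math.isclose(norm, 1.0, abs_tol=tol)


def dimension_count(vec: Sequence[float]) -> int:
    """Return the number of components in vec (hypothetical helper)."""
    return len(vec)


# A made-up two-dimensional "embedding", used only to exercise the helpers.
example = normalize([3.0, 4.0])
print(looks_like_unit_vector(example), dimension_count(example))
```

Unit-length embeddings matter because, for such vectors, the dot product and cosine similarity coincide, so a vector store can use either.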

requirements/ingest/embed-voyageai.txt (outdated, resolved)
@MthwRobinson

@fzowl - Could you also add a summary of the PR to the PR description?

fzowl and others added 2 commits May 22, 2024 20:03
@fzowl

fzowl commented May 22, 2024

@MthwRobinson Added a short summary, can you please take a look?

@fzowl fzowl requested a review from MthwRobinson May 22, 2024 18:08
@MthwRobinson

Summary looks good. Could you remove the .rst file you created for documentation and add docs in https://github.com/Unstructured-IO/docs? The docs directory in this repo is deprecated now in favor of the new one.

@fzowl

fzowl commented May 22, 2024

@MthwRobinson I removed the .rst file and opened a PR for the docs: Unstructured-IO/docs#43

@MKhalusova

We can merge the docs after this PR is merged.

@MthwRobinson MthwRobinson left a comment

LGTM if tests pass!

@MthwRobinson

@fzowl - Looks like there are some merge conflicts to clean up, but otherwise good to go

@MthwRobinson MthwRobinson enabled auto-merge May 23, 2024 16:55
@fzowl

fzowl commented May 23, 2024

@MthwRobinson I resolved the conflicts, but please check the CHANGELOG.md file

@MthwRobinson

CHANGELOG entry looks good. For the test failures, running make tidy and make pip-compile should take care of them.

auto-merge was automatically disabled May 23, 2024 19:01

Head branch was pushed to by a user without write access

@fzowl

fzowl commented May 23, 2024

@MthwRobinson OK, I ran make tidy and make pip-compile :-)

@MthwRobinson MthwRobinson enabled auto-merge May 23, 2024 20:41
@MthwRobinson

One character off. Soooo close!

./examples/embed/example_voyageai.py:6:101: E501 line too long (101 > 100 characters)
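
The E501 message above packs in the file path, the line number (6), the column where the limit is exceeded (101), and the measured versus allowed length. As a rough illustration of that check, the sketch below flags over-long lines and formats a message of the same shape. The names `find_long_lines` and `format_e501` and the file name `example.py` are hypothetical; the sketch simply counts characters with `len`, which is a simplification and not necessarily how the project's linter measures a line.

```python
from typing import Iterator, Tuple

MAX_LINE_LENGTH = 100  # limit implied by the "(101 > 100 characters)" message


def find_long_lines(
    source: str, max_length: int = MAX_LINE_LENGTH
) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, column, length) for each line longer than max_length."""
    for line_number, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_length:
            # The reported column is the first character past the limit.
            yield line_number, max_length + 1, len(line)


def format_e501(
    path: str, line_number: int, column: int, length: int,
    max_length: int = MAX_LINE_LENGTH,
) -> str:
    """Format a message shaped like the linter output quoted in the thread."""
    return (
        f"{path}:{line_number}:{column}: "
        f"E501 line too long ({length} > {max_length} characters)"
    )


# Five short lines followed by one line that is exactly one character too long.
sample = "\n".join(["short line"] * 5 + ["x" * 101])
for line_number, column, length in find_long_lines(sample):
    print(format_e501("example.py", line_number, column, length))
```

A line of exactly 100 characters passes, which is why trimming a single character was enough to clear the failure.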

auto-merge was automatically disabled May 23, 2024 21:40

Head branch was pushed to by a user without write access

@fzowl

fzowl commented May 23, 2024

@MthwRobinson Corrected it.

@fzowl

fzowl commented May 24, 2024

@MthwRobinson What is the next step here? I see that auto-merge is enabled now, but merging is still blocked and 3 workflows are awaiting approval. Is there anything else I should do, or is anything still open?

@MthwRobinson MthwRobinson changed the base branch from main to feat/voyage May 24, 2024 20:51
@MthwRobinson MthwRobinson disabled auto-merge May 24, 2024 20:51
@MthwRobinson MthwRobinson merged commit f975441 into Unstructured-IO:feat/voyage May 24, 2024
@MthwRobinson

@fzowl - Merged into a feature branch and will try to fix those dependency issues in #3099

@fzowl

fzowl commented May 24, 2024

@MthwRobinson Thank you very much for all your help!

github-merge-queue bot pushed a commit that referenced this pull request May 24, 2024
Original PR was #3069. Merged into a feature branch to fix dependency
and linting issues. Application code changes from the original PR were
already reviewed and approved.

------------
Original PR description:
Adding VoyageAI embeddings 
Voyage AI’s embedding models and rerankers are state-of-the-art in
retrieval accuracy.

---------

Co-authored-by: fzowl
Co-authored-by: Liuhong99