feat: embeddings overhaul #120
…at(embeddings)/derive-macro
…emeory-vector-store refactor: remove DocumentEmbeddings from in memory vector store
…e-feature docs(embeddings): finalize embeddings overhaul feature
* refactor: Big refactor
* refactor: refactor Embed trait, fix all imports, rename files, fix macro
* fix(embed trait): fix errors while testing
* fix(lancedb): examples
* docs: fix hyperlink
* fmt: cargo fmt
* PR; make requested changes
* fix: change visibility of struct field
* fix: failing tests

---------

Co-authored-by: Christophe <[email protected]>
This code is very well written, especially with how internal code comments clearly define everything and how elegant and simplified the internals have become. When I start to get to the point of nitpicking, I feel like that indicates the PR is in a great spot!
- I'm still not a fan of how `rig-core-derive` is a subcrate of `rig-core`. I understand that naming recommendations led to this decision, but because we are a monorepo, having `rig-core-derive` inside the `rig-core` crate sort of hides the crate. I don't think this should block the PR, though, since it might be a larger, more annoying move to make in the last hour of the PR.
- Some of the code documentation could use `[]` so that the comments link directly to the source.
  - Very much a nitpick, as it can be tedious to link every single object mention (like `EmbedError` -> `[EmbedError]`).
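To illustrate the `[]` suggestion above, here is a minimal sketch of rustdoc intra-doc links. The `EmbedError` struct and the `non_empty_text` function below are hypothetical stand-ins written for this example; they are not the crate's real definitions.

```rust
use std::fmt;

/// Error returned when a value cannot be turned into embeddable text.
///
/// Hypothetical stand-in for illustration; not the crate's real `EmbedError`.
#[derive(Debug)]
pub struct EmbedError(pub String);

impl fmt::Display for EmbedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "embed error: {}", self.0)
    }
}

impl std::error::Error for EmbedError {}

/// Returns the trimmed text, or an [`EmbedError`] if nothing is left.
///
/// Wrapping the backticked name in brackets, as in [`EmbedError`], makes
/// rustdoc render a link to the item. A plain `EmbedError` is only rendered
/// as inline code, with no link.
pub fn non_empty_text(input: &str) -> Result<String, EmbedError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        Err(EmbedError("no text to embed".to_string()))
    } else {
        Ok(trimmed.to_string())
    }
}
```

The only difference between the linked and unlinked forms is the surrounding brackets, so the change is mechanical, if tedious to apply to every mention.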
Overall, I think this is fantastic. Once we get our documentation up and running, it would be very helpful to have docs on a) how to migrate to the new embeddings setup, b) how the derive macro is leveraged, and c) general usage.
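As a rough idea of what such usage docs might show, the sketch below hand-writes an `Embed`-style trait and implements it for a document type. Everything here is an assumption made for illustration: `TextCollector`, `WordDefinition`, `texts_to_embed`, and the trait signature are hypothetical and may not match the crate's real API, and the code the derive macro actually generates is not shown.

```rust
/// Collects the strings that should be sent to an embedding model.
/// Hypothetical stand-in, not the crate's actual type.
#[derive(Debug, Default)]
pub struct TextCollector {
    pub texts: Vec<String>,
}

impl TextCollector {
    /// Queue one piece of text for embedding.
    pub fn push(&mut self, text: impl Into<String>) {
        self.texts.push(text.into());
    }
}

/// Hypothetical error type for values that cannot be embedded.
#[derive(Debug)]
pub struct EmbedError(pub String);

/// Assumed shape of an `Embed`-style trait: a type decides which of its
/// fields become text to embed.
pub trait Embed {
    fn embed(&self, collector: &mut TextCollector) -> Result<(), EmbedError>;
}

/// Blanket impl so a reference can be embedded without cloning the value.
impl<T: Embed + ?Sized> Embed for &T {
    fn embed(&self, collector: &mut TextCollector) -> Result<(), EmbedError> {
        (**self).embed(collector)
    }
}

/// Example document: only `definitions` is embedded; `id` and `word`
/// are treated as metadata.
pub struct WordDefinition {
    pub id: String,
    pub word: String,
    pub definitions: Vec<String>,
}

impl Embed for WordDefinition {
    fn embed(&self, collector: &mut TextCollector) -> Result<(), EmbedError> {
        if self.definitions.is_empty() {
            return Err(EmbedError(format!("`{}` has no definitions", self.word)));
        }
        for definition in &self.definitions {
            collector.push(definition.as_str());
        }
        Ok(())
    }
}

/// Gather the texts a document wants embedded.
pub fn texts_to_embed<T: Embed>(doc: T) -> Result<Vec<String>, EmbedError> {
    let mut collector = TextCollector::default();
    doc.embed(&mut collector)?;
    Ok(collector.texts)
}
```

Calling `texts_to_embed(&doc)` on a `WordDefinition` with two definitions returns those two strings, and a document with no definitions returns an error. A derive macro would presumably save users from writing the `impl Embed` block by hand, but that is a guess about its role rather than a description of its output.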
It's a little confusing to refer to embedded content as both documents and texts
* docs: Make examples+docstrings a bit more realistic
* feat: Add Embed implementation for &impl Embed
* test: Reorganize tests
* misc: Add `derive` feature to `all` feature flag
* test: Fix dead code warning
* test: Improve embed macro tests
* test: Add additional embed macro test
* docs: Add logging output to rag example
* docs: Fix looging output in tools example
* feat: Improve token usage log messages
* test: Small changes to embedbing builder tests
* style: cargo fmt
* fix: Clippy + docstrings
* docs: Fix docstring
* test: Fix test
* fix: exclude embedding properties from top_n node query * refactor: more ergonomic index creation * docs(neo4j): update examples * fix: unused import in example * feat(provider): xAI (grok) integration (#106) * feat(xai): initial xai (grok) implementation * fix(xai): renamings + tests * style(xai): Update rig-core/src/providers/xai/client.rs Co-authored-by: Mathieu Bélanger <[email protected]> * style(xai): adds various comments and README improvements * fix(xai): add some print statements to the grok example * docs(xai): fix readme --------- Co-authored-by: Mathieu Bélanger <[email protected]> * fix(rig-mongodb): remove embeddings from `top_n` lookup (#115) * fix(mongodb): remove embeddings from `top_n` lookup * fix(mongodb): filter embeddings within agg pipeline * style(mongodb): clippy moment * fix(mongodb): dynamically get embedded fields from mongodb * fix(mongodb): apply fixes from comments * style(mongodb): fmt * docs(readme): add perplexity logo to integrations (#112) * docs(readme): add perplexity logo to integrations * fix: perplexity logo size * fix(readme): perplexity logo size * feat: embeddings API overhaul (#120) * feat: setup derive macro * test: test out writing embeddable macro * test: continue testing custom macro implementation * feat: macro generate trait bounds * refactor: split up macro into multiple files * refactor: move macro derive crate inside rig-core * feat: replace embedding logic with new embeddable trait and macro * refactor: refactor rag examples, delete document embedding struct * feat: remove document embedding from in memory store * refactor: remove DocumentEmbeddings from in memory vector store * refactor(examples): combine vector store with vector store index * docs: add and update docstrings * fix (examples): fix bugs in examples * style: cargo fmt * revert: revert vector store to main * docs: update emebddings builder docstrings * refactor: derive macro * tests: add unit tests on in memory store * fic(ci): asterix on pull 
request sto accomodate for epic branches * fix(ci): double asterix * feat: add error type on embeddable trait * refactor: move embeddings to its own module and seperate embeddable * refactor: split up macro into more files, fix all imports * fix: revert logging change * feat: handle tools with embeddingsbuilder * bug(macro): fix error when embed tags missing * style: cargo fmt * fix(tests): clippy * docs&revert: revert embeddable trait error type, add docstrings * style: cargo clippy * clippy(lancedb): fix unused function error * fix(test): remove useless assert false statement * cleanup: split up branch into 2 branches for readability * cleanup: revert certain changes during branch split * docs: revert doc string * fix: add embedding_docs to embeddable tool * refactor: use OneOrMany in Embbedable trait, make derive macro crate feature flag * tests: add some more tests * clippy: cargo clippy * docs: add docstring to oneormany * fix(macro): update error handling * refactor: reexport EmbeddingsBuilder in rig and update imports * feat: implement IntoIterator and Iterator for OneOrMany * refactor: rename from methods * tests: fix failing tests * refactor&fix: make PR review changes * fix: fix tests failing * test: add test on OneOrMany * style: cargo fmt * docs&fix: fix doc strings, implement iter_mut for OneOrMany * fix: update borrow and owning of macro * clippy: add back print statements * fix: fix issues caused by merge of derive macro branch * fix: fix cargo toml of lancedb and mongodb * refactor: use thiserror for OneOtMany::EmptyListError * feat: add OneOrMany to in memory vector store * style: cargo fmt * fix: update embeddingsbuilder import path * tests: add tests for embeddingsbuilder * clippy: add is empty method * fix: add feature flag to examples in mongodb and lancedb crates * fix: move lancedb fixtures into it's own file * fix: add dummy main function in fextures.rs for compiler * fix: revert fixture file, remove fixtures from cargo toml examples * fix: 
update fixture import in lancedb examples * refactor: rename D to T in embeddingsbuilder generics * refactor: remove clone * PR: update builder, docstrings, and std::markers tags * style: replace add with push * fix: fix mongodb example * fix: update lancedb and mongodb doc example * fix: typo * docs: add and fix docstrings and examples * docs: add more doc tests * feat: rename Embeddable trait to ExtractEmbeddingFields * feat: rename macro files, cargo fmt * PR; update docstrings, update `add_documents_with_id` function * doc: fix doc linting * misc: fmt * test: fix test * refactor(embeddings): embed trait definition (#89) * refactor: Big refactor * refactor: refactor Embed trait, fix all imports, rename files, fix macro * fix(embed trait): fix errors while testing * fix(lancedb): examples * docs: fix hyperlink * fmt: cargo fmt * PR; make requested changes * fix: change visibility of struct field * fix: failing tests --------- Co-authored-by: Christophe <[email protected]> * fix/docs: fix erros from merge, cleanup embeddings docstrings * fix: cargo clippy in examples * Feat: small improvements + fixes + tests (#128) * docs: Make examples+docstrings a bit more realistic * feat: Add Embed implementation for &impl Embed * test: Reorganize tests * misc: Add `derive` feature to `all` feature flag * test: Fix dead code warning * test: Improve embed macro tests * test: Add additional embed macro test * docs: Add logging output to rag example * docs: Fix looging output in tools example * feat: Improve token usage log messages * test: Small changes to embedbing builder tests * style: cargo fmt * fix: Clippy + docstrings * docs: Fix docstring * test: Fix test * style: Small renaming for consistency * docs: Improve docstrings * style: fmt * fix: `TextEmbedder::embed` visibility * docs: Simplified the `EmbeddingsBuilder` docstring example to focus on the builder * style: cargo fmt * docs: Small edit to lancedb examples --------- Co-authored-by: cvauclair <[email protected]> * 
misc: Add `rig-derive` missing manifest fields (#129) * feat: Improve `InMemoryVectorStore` API (#130) * feat: Improve `InMemoryVectorStore` API * style: clippy+fmt * test: fix test * fix: remove unused module (#132) * fix: exclude embedding properties from top_n node query * refactor: more ergonomic index creation * docs(neo4j): update examples * fix: unused import in example * fix(example): remove embedding field from Deserialization type --------- Co-authored-by: Mochan <[email protected]> Co-authored-by: Garance Buricatu <[email protected]> Co-authored-by: cvauclair <[email protected]>
Complete overhaul of embeddings.
See issue #52.
The biggest changes can be found in:
- `rig-core/src/embeddings`: note the tests and doctests written to validate the functionality.
- `rig-core/rig-core-derive`

Other significant changes touch all of the vector-search examples in the sibling crates. All of the examples have been recompiled and run.