fix(neo4j): remove embeddings from top_n lookup #118

Merged
mateobelanger merged 18 commits into main on Dec 2, 2024

Conversation

mateobelanger
Member

@mateobelanger mateobelanger commented Nov 19, 2024

Description of the Fix

Neo4jVectorIndex::top_n::<Value>() no longer returns the embedding vector property. When a Rig index object is created, a DB query retrieves the associated Neo4j index and the name of the property that stores its embedding vector.
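
A minimal sketch of how a vector search query could leave the embedding out of each returned node, assuming the embedding property name was already read from the index definition. The helper name `build_top_n_query`, its parameters, and the exact query text are hypothetical and are not taken from the merged code. The Cypher map projection used here sets the embedding key to `null` instead of returning the vector.

```rust
/// Builds a Cypher vector search query that does not return the embedding vector.
/// Hypothetical helper: the name, signature, and query text are illustrative only.
fn build_top_n_query(index_name: &str, embedding_property: &str, n: usize) -> String {
    // `{{` and `}}` are escaped braces; the map projection copies every node
    // property (`.*`) and then overrides the embedding property with null.
    format!(
        "CALL db.index.vector.queryNodes('{}', {}, $prompt_embedding) \
         YIELD node, score \
         RETURN score, node {{.*, {}: null}} AS node",
        index_name, n, embedding_property
    )
}
```

Calling `build_top_n_query("moviePlots", "embedding", 3)` yields a single-line query whose `RETURN` clause ends with `node {.*, embedding: null} AS node`. The index and property names in that call are placeholders.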

Note

Incidentally, I also improved the flow for creating indexes from an empty DB. The client is now in charge of both creating the DB index and returning the Rig index object that matches an existing DB index.

@mateobelanger mateobelanger self-assigned this Nov 19, 2024
@mateobelanger mateobelanger changed the title fix(neo4j): top_n() return embeddings data fix(neo4j): remove embeddings from top_n lookup Nov 19, 2024
Contributor

@0xMochan 0xMochan left a comment

Left some comments!

vector_dimensions: i64,
#[serde(rename = "vector.similarity_function")]
vector_similarity_function: String,
}
Contributor

Why define these structs within the method here? Doesn't this make exterior usage of these awkward?

Member Author

These structures would be of no use elsewhere (for now). Defining them here makes the deserialization more readable while keeping the module less cluttered. I don't know if there is a better way to do this, though.
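
The trade-off discussed in this thread can be sketched with a function-local struct. This is an illustration only: the function `parse_index_line`, the `IndexRow` struct, and the `key=value` input format are hypothetical, and plain string parsing stands in for the serde deserialization used in the PR.

```rust
/// Parses a line such as "name=moviePlots;dimensions=1536" into (name, dimensions).
/// Hypothetical example: the input format and all names are not from the PR.
fn parse_index_line(line: &str) -> Option<(String, i64)> {
    // Declared inside the function: usable here, invisible to the rest of the module.
    struct IndexRow {
        name: String,
        dimensions: i64,
    }

    let mut name = None;
    let mut dimensions = None;
    for pair in line.split(';') {
        let (key, value) = pair.split_once('=')?;
        match key.trim() {
            "name" => name = Some(value.trim().to_string()),
            "dimensions" => dimensions = value.trim().parse::<i64>().ok(),
            _ => {} // ignore unknown keys
        }
    }

    // Both fields are required; a missing one makes the whole parse return None.
    let row = IndexRow { name: name?, dimensions: dimensions? };
    Some((row.name, row.dimensions))
}
```

Because `IndexRow` cannot be named outside the function, it cannot appear in the return type, which is why the sketch returns a tuple. Moving the struct to module level removes that limit at the cost of one more item in the module.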

"Index `{}` not found in database. Available indexes: {:?}",
index_name, indexes
),
),
Contributor

This error construction seems complicated; it seems cleaner to just pass a format!(...).into() directly to the DatastoreError.
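
A minimal sketch of that suggestion, assuming the `DatastoreError` variant wraps a boxed error. The enum below is a hypothetical stand-in with only that variant, and `find_index` is an illustrative helper, not a function from the crate. The standard library converts a `String` into `Box<dyn Error + Send + Sync>`, so `.into()` is enough.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for the crate's error enum; only the variant discussed here.
#[derive(Debug)]
enum VectorStoreError {
    DatastoreError(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for VectorStoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VectorStoreError::DatastoreError(e) => write!(f, "Datastore error: {}", e),
        }
    }
}

impl Error for VectorStoreError {}

/// Returns the position of `index_name` in `indexes`, or a DatastoreError.
fn find_index(index_name: &str, indexes: &[String]) -> Result<usize, VectorStoreError> {
    indexes
        .iter()
        .position(|name| name.as_str() == index_name)
        .ok_or_else(|| {
            // The formatted String is boxed into the error type by `.into()`.
            VectorStoreError::DatastoreError(
                format!(
                    "Index `{}` not found in database. Available indexes: {:?}",
                    index_name, indexes
                )
                .into(),
            )
        })
}
```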

0xMochan and others added 4 commits November 20, 2024 10:26
* feat(xai): initial xai (grok) implementation

* fix(xai): renamings + tests

* style(xai): Update rig-core/src/providers/xai/client.rs

Co-authored-by: Mathieu Bélanger <[email protected]>

* style(xai): adds various comments and README improvements

* fix(xai): add some print statements to the grok example

* docs(xai): fix readme

---------

Co-authored-by: Mathieu Bélanger <[email protected]>
* fix(mongodb): remove embeddings from `top_n` lookup

* fix(mongodb): filter embeddings within agg pipeline

* style(mongodb): clippy moment

* fix(mongodb): dynamically get embedded fields from mongodb

* fix(mongodb): apply fixes from comments

* style(mongodb): fmt
* docs(readme): add perplexity logo to integrations
* fix: perplexity logo size
Contributor

This example currently fails because of line 119: .top_n::<Document>("What is a glarb?", 1).

Document expects a field embedding of type Vec<f32>. However, the PR prunes this field from the Neo4j response.

The solution is simply to change Document to a type without an embedding field.
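
The failure mode can be sketched without a database. In this illustration a `HashMap` stands in for a returned node whose embedding was pruned, and hand-written parsing stands in for serde deserialization. The struct names, the `title` field, and the helper functions are all hypothetical and do not come from the example file.

```rust
use std::collections::HashMap;

/// A returned node after the embedding property has been pruned (hypothetical shape).
type Row = HashMap<String, String>;

/// Mirrors the failing case: the target type requires an `embedding` field.
#[derive(Debug)]
struct DocumentWithEmbedding {
    title: String,
    embedding: Vec<f32>,
}

/// The suggested fix: the same data with no `embedding` field.
#[derive(Debug, PartialEq)]
struct DocumentWithoutEmbedding {
    title: String,
}

/// Looks up a required field, reporting its name when it is absent.
fn required<'a>(row: &'a Row, key: &str) -> Result<&'a String, String> {
    row.get(key).ok_or_else(|| format!("missing field `{}`", key))
}

fn parse_with_embedding(row: &Row) -> Result<DocumentWithEmbedding, String> {
    let title = required(row, "title")?.clone();
    // Comma-separated floats stand in for a real vector encoding.
    let embedding = required(row, "embedding")?
        .split(',')
        .map(|x| x.trim().parse::<f32>().map_err(|e| e.to_string()))
        .collect::<Result<Vec<f32>, String>>()?;
    Ok(DocumentWithEmbedding { title, embedding })
}

fn parse_without_embedding(row: &Row) -> Result<DocumentWithoutEmbedding, String> {
    Ok(DocumentWithoutEmbedding { title: required(row, "title")?.clone() })
}
```

With a row that holds only `title`, `parse_with_embedding` returns an error naming the missing field, while `parse_without_embedding` succeeds.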

marieaurore123 and others added 10 commits November 29, 2024 11:27
* feat: setup derive macro

* test: test out writing embeddable macro

* test: continue testing custom macro implementation

* feat: macro generate trait bounds

* refactor: split up macro into multiple files

* refactor: move macro derive crate inside rig-core

* feat: replace embedding logic with new embeddable trait and macro

* refactor: refactor rag examples, delete document embedding struct

* feat: remove document embedding from in memory store

* refactor: remove DocumentEmbeddings from in memory vector store

* refactor(examples): combine vector store with vector store index

* docs: add and update docstrings

* fix (examples): fix bugs in examples

* style: cargo fmt

* revert: revert vector store to main

* docs: update emebddings builder docstrings

* refactor: derive macro

* tests: add unit tests on in memory store

* fic(ci): asterix on pull request sto accomodate for epic branches

* fix(ci): double asterix

* feat: add error type on embeddable trait

* refactor: move embeddings to its own module and seperate embeddable

* refactor: split up macro into more files, fix all imports

* fix: revert logging change

* feat: handle tools with embeddingsbuilder

* bug(macro): fix error when embed tags missing

* style: cargo fmt

* fix(tests): clippy

* docs&revert: revert embeddable trait error type, add docstrings

* style: cargo clippy

* clippy(lancedb): fix unused function error

* fix(test): remove useless assert false statement

* cleanup: split up branch into 2 branches for readability

* cleanup: revert certain changes during branch split

* docs: revert doc string

* fix: add embedding_docs to embeddable tool

* refactor: use OneOrMany in Embbedable trait, make derive macro crate feature flag

* tests: add some more tests

* clippy: cargo clippy

* docs: add docstring to oneormany

* fix(macro): update error handling

* refactor: reexport EmbeddingsBuilder in rig and update imports

* feat: implement IntoIterator and Iterator for OneOrMany

* refactor: rename from methods

* tests: fix failing tests

* refactor&fix: make PR review changes

* fix: fix tests failing

* test: add test on OneOrMany

* style: cargo fmt

* docs&fix: fix doc strings, implement iter_mut for OneOrMany

* fix: update borrow and owning of macro

* clippy: add back print statements

* fix: fix issues caused by merge of derive macro branch

* fix: fix cargo toml of lancedb and mongodb

* refactor: use thiserror for OneOtMany::EmptyListError

* feat: add OneOrMany to in memory vector store

* style: cargo fmt

* fix: update embeddingsbuilder import path

* tests: add tests for embeddingsbuilder

* clippy: add is empty method

* fix: add feature flag to examples in mongodb and lancedb crates

* fix: move lancedb fixtures into it's own file

* fix: add dummy main function in fextures.rs for compiler

* fix: revert fixture file, remove fixtures from cargo toml examples

* fix: update fixture import in lancedb examples

* refactor: rename D to T in embeddingsbuilder generics

* refactor: remove clone

* PR: update builder, docstrings, and std::markers tags

* style: replace add with push

* fix: fix mongodb example

* fix: update lancedb and mongodb doc example

* fix: typo

* docs: add and fix docstrings and examples

* docs: add more doc tests

* feat: rename Embeddable trait to ExtractEmbeddingFields

* feat: rename macro files, cargo fmt

* PR; update docstrings, update `add_documents_with_id` function

* doc: fix doc linting

* misc: fmt

* test: fix test

* refactor(embeddings): embed trait definition (#89)

* refactor: Big refactor

* refactor: refactor Embed trait, fix all imports, rename files, fix macro

* fix(embed trait): fix errors while testing

* fix(lancedb): examples

* docs: fix hyperlink

* fmt: cargo fmt

* PR; make requested changes

* fix: change visibility of struct field

* fix: failing tests

---------

Co-authored-by: Christophe <[email protected]>

* fix/docs: fix erros from merge, cleanup embeddings docstrings

* fix: cargo clippy in examples

* Feat: small improvements + fixes + tests (#128)

* docs: Make examples+docstrings a bit more realistic

* feat: Add Embed implementation for &impl Embed

* test: Reorganize tests

* misc: Add `derive` feature to `all` feature flag

* test: Fix dead code warning

* test: Improve embed macro tests

* test: Add additional embed macro test

* docs: Add logging output to rag example

* docs: Fix looging output in tools example

* feat: Improve token usage log messages

* test: Small changes to embedbing builder tests

* style: cargo fmt

* fix: Clippy + docstrings

* docs: Fix docstring

* test: Fix test

* style: Small renaming for consistency

* docs: Improve docstrings

* style: fmt

* fix: `TextEmbedder::embed` visibility

* docs: Simplified the `EmbeddingsBuilder` docstring example to focus on the builder

* style: cargo fmt

* docs: Small edit to lancedb examples

---------

Co-authored-by: cvauclair <[email protected]>
* feat: Improve `InMemoryVectorStore` API

* style: clippy+fmt

* test: fix test
@mateobelanger mateobelanger linked an issue Dec 2, 2024 that may be closed by this pull request
@mateobelanger mateobelanger merged commit 62c7bc5 into main Dec 2, 2024
4 checks passed
This was referenced Dec 2, 2024
Development

Successfully merging this pull request may close these issues.

bug(neo4j): top_n function returns embeddings data
4 participants