Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Update VectorStoreIndex trait #42

Merged

Conversation

marieaurore123
Copy link
Contributor

@marieaurore123 marieaurore123 commented Sep 30, 2024

Github issue: #41

  • Refactor VectorStoreIndex trait

Notes:

  • kept DocumentEmbeddings as the InMemoryVectorStore value type. Did this because if I change it to a more general type, I would need to change the VectorStore trait which should be done in a separate branch.
  • method top_n_ids of VectorStoreIndex contains a generic type: <T: for<'a> Deserialize<'a>. Technically, it we don't need type T to get the ids of the top n documents. However, I found it simpler to get the ids of the top n by calling the existing method top_n which needs to be called like this top_n::<T>() and filtering the ids only. This also allows the implementation of the method top_n_ids to be to same for all trait implementations (ie. implementation defined in trait def)

Please let me know what you think about the second point! Thanks:)

@marieaurore123 marieaurore123 changed the title feat: Update VectorStoreIndex trait refactor: Update VectorStoreIndex trait Sep 30, 2024
Copy link
Contributor

@0xMochan 0xMochan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a lot of code got simplified 👍

rig-core/src/vector_store/mod.rs Outdated Show resolved Hide resolved
rig-core/src/vector_store/mod.rs Outdated Show resolved Hide resolved
rig-core/src/vector_store/mod.rs Outdated Show resolved Hide resolved
rig-core/src/agent.rs Outdated Show resolved Hide resolved
rig-core/src/vector_store/mod.rs Outdated Show resolved Hide resolved
@cvauclair
Copy link
Contributor

@garance-buricatu for the same of record keeping, here are my opinions on your points:

kept DocumentEmbeddings as the InMemoryVectorStore value type. Did this because if I change it to a more general type, I would need to change the VectorStore trait which should be done in a separate branch.

That's fine for now, we'll remove the DocumentEmbeddings type in a subsequent PR (or maybe keep it only for the internal representation of the InMemoryVectorStore)

method top_n_ids of VectorStoreIndex contains a generic type: <T: for<'a> Deserialize<'a>. Technically, it we don't need type T to get the ids of the top n documents. However, I found it simpler to get the ids of the top n by calling the existing method top_n which needs to be called like this top_n::() and filtering the ids only. This also allows the implementation of the method top_n_ids to be to same for all trait implementations (ie. implementation defined in trait def)

IMO we should remove the provided implementation since 1) some DBs might have a more optimized way to query the ids only (e.g.: SELECT id FROM ... instead of SELECT * FROM ...); and 2) we don't know the name of the primary key used by the documents in advance.

@marieaurore123 marieaurore123 merged commit 04469df into main Oct 1, 2024
2 checks passed
@github-actions github-actions bot mentioned this pull request Oct 1, 2024
@github-actions github-actions bot mentioned this pull request Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants