-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Move vectorize to Astra DB Component #3766
feat: Move vectorize to Astra DB Component #3766
Conversation
This pull request is automatically being deployed by Amplify Hosting (learn more). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for the backwards compatibility
Please add a components test
@@ -110,12 +157,6 @@ class AstraVectorStoreComponent(LCVectorStoreComponent): | |||
info="Optional list of metadata fields to include in the indexing.", | |||
advanced=True, | |||
), | |||
HandleInput( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users that are using this input will update langflow, refresh the component and it will be broken.
I think we need to keep it backwards compatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nicoloboschi definitely see your point here - but, i did maintain the embeddings support... the broken compatibility would be if someone had a flow that was using the AstraVectorize component as input to this input, right? in this PR, i remove that component entirely since its built in now... are you suggesting we should keep the separate component as well, for backwards compatibility purposes?
I think in other cases the backwards compatibility is maintained...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that that was mentioned yeah, keep the separate component around for backwards compatibility. On the other hand...how many people have a flow using vectorize right now? If we can significantly reduce confusion for future vectorize users by removing this separate component, it may be worth breaking backwards-compatibility in this case.
(disclaimer: I, of course, would never advocate for breaking backwards-compatibility)
890c5ec
to
111a2f9
Compare
111a2f9
to
05d9476
Compare
@@ -59,6 +93,19 @@ class AstraVectorStoreComponent(LCVectorStoreComponent): | |||
info="Optional namespace within Astra DB to use for the collection.", | |||
advanced=True, | |||
), | |||
DropdownInput( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we default to one or the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! For compatibility, i suppose defaulting to Embeddings makes sense so we dont break existing flows that were using it. (we are currently breaking Vectorize-based flows as mentioned...)
Side note, you may be wondering why its a dropdown rather than a boolean input / toggle... i tried to use the switch, but for whatever reason when disabling the flag, it didnt trigger the call to update_build_config - something i want to bring up with people on the langflow team...
vector_store = self.build_vector_store() | ||
def search_documents(self, vector_store=None) -> list[Data]: | ||
if not vector_store: | ||
vector_store = self.build_vector_store() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this change - the check_cached decorator handles this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the only reason i made this change was to more easily allow for a test to be created. The problem with the tests is that i wasnt sure how i could build the component with parameters that werent in the initial configuration - in the UI, it'll dynamically update the components based on the value of the dropdown, but is there a way to programmatically perform that same computation? i.e., something like
component.update_build_configuration(embedding_service = "Astra Vectorize")
So i allowed an optional inclusion of the vector store object for purposes of the test, but that would never be used in the happy path in the component. Let me know though if you see a better way to do that
"collection_embedding_api_key": self.z_03_provider_api_key or kwargs.get("z_03_provider_api_key"), | ||
} | ||
|
||
def build_vector_store(self, vectorize_options=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious about the parameter addition here. Is it only used for testing purposes? In the main path, there's no scenario where this wouldn't be None,
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct! I left a comment up above for a similar reason why it was done, but if there's a better / easier way i missed let me know cuz i didnt like it either. The goal was basically to allow for the tests to execute successfully with pytest (and for what its worth, the 5 test_astra_vectorize tests do in fact work for me) but i would love if rather than having these optional parameters it simulated more like how the UI executes the code. The dynamic inputs is the challenge...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm yeah it's very possible we don't yet have any tests that use dynamic inputs, and thus haven't had to support a way to do that. I'll pull this and play around a bit this afternoon just to try as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - I think it's fine to get this merged and then we can play around with the changes made to support the test
This pull request removes the AstraVectorize component, and instead includes options for vectorize directly in the Astra DB Component, with a dynamic UI depending on selections
astradb_vectorize_component_09112024.mp4