feat: Move vectorize to Astra DB Component #3766

Merged

Conversation

erichare
Collaborator

@erichare erichare commented Sep 11, 2024

This pull request removes the separate AstraVectorize component and adds the vectorize options directly to the Astra DB Component. The UI updates dynamically depending on the options selected.

astradb_vectorize_component_09112024.mp4

This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-3766.dmtpw4p5recq1.amplifyapp.com

@erichare erichare self-assigned this Sep 11, 2024
Contributor

@nicoloboschi nicoloboschi left a comment

LGTM except for the backwards compatibility
Please add a components test

@@ -110,12 +157,6 @@ class AstraVectorStoreComponent(LCVectorStoreComponent):
info="Optional list of metadata fields to include in the indexing.",
advanced=True,
),
HandleInput(
Contributor

Users who rely on this input will update Langflow, refresh the component, and find it broken.
I think we need to keep it backwards compatible.

Collaborator Author

@nicoloboschi I definitely see your point here, but I did maintain the embeddings support. The broken compatibility would be if someone had a flow that used the AstraVectorize component as input to this input, right? In this PR, I remove that component entirely since it's built in now. Are you suggesting we should keep the separate component as well, for backwards compatibility purposes?

I think in other cases the backwards compatibility is maintained...

Collaborator

I think that was mentioned, yeah: keep the separate component around for backwards compatibility. On the other hand, how many people have a flow using vectorize right now? If we can significantly reduce confusion for future vectorize users by removing this separate component, it may be worth breaking backwards compatibility in this case.

(disclaimer: I, of course, would never advocate for breaking backwards-compatibility)
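
One way to read the "keep the separate component around" option is as a thin deprecated shim that forwards to the logic the merged component now owns. The sketch below only illustrates that idea under stated assumptions: `LegacyVectorizeComponent` and `build_vectorize_options` are hypothetical names, not Langflow classes or functions, and the option keys are placeholders.

```python
import warnings


def build_vectorize_options(provider, model_name, api_key=None):
    """Hypothetical stand-in for the options the merged component assembles."""
    options = {"provider": provider, "model_name": model_name}
    if api_key:
        options["api_key"] = api_key
    return options


class LegacyVectorizeComponent:
    """Hypothetical shim kept only so that existing flows keep loading."""

    def __init__(self, provider, model_name, api_key=None):
        self.provider = provider
        self.model_name = model_name
        self.api_key = api_key

    def build_options(self):
        # Warn on use, then delegate so there is a single source of truth.
        warnings.warn(
            "The separate vectorize component is deprecated; "
            "configure vectorize on the Astra DB component instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return build_vectorize_options(self.provider, self.model_name, self.api_key)
```

A shim like this trades a little extra surface area for a migration window; whether that is worth it depends on how many existing flows actually use the old component, which is the open question in this thread.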

@erichare erichare changed the title [LFEN-1] Move vectorize to Astra DB Component Move vectorize to Astra DB Component Sep 11, 2024
@erichare erichare force-pushed the feat/astra-db-component-vectorize branch from 890c5ec to 111a2f9 Compare September 11, 2024 19:09
@erichare erichare force-pushed the feat/astra-db-component-vectorize branch from 111a2f9 to 05d9476 Compare September 11, 2024 19:11
@erichare erichare changed the title Move vectorize to Astra DB Component feat: Move vectorize to Astra DB Component Sep 11, 2024
@github-actions github-actions bot added the enhancement New feature or request label Sep 11, 2024
@erichare erichare marked this pull request as ready for review September 11, 2024 21:34
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Sep 11, 2024
@@ -59,6 +93,19 @@ class AstraVectorStoreComponent(LCVectorStoreComponent):
info="Optional namespace within Astra DB to use for the collection.",
advanced=True,
),
DropdownInput(
Collaborator

should we default to one or the other?

Collaborator Author

Good point! For compatibility, I suppose defaulting to Embeddings makes sense so we don't break existing flows that were using it. (We are currently breaking Vectorize-based flows, as mentioned...)

Side note: you may be wondering why it's a dropdown rather than a boolean input / toggle. I tried to use the switch, but for whatever reason, disabling the flag didn't trigger the call to update_build_config. That's something I want to bring up with people on the Langflow team...
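
To make the dropdown behaviour discussed here concrete, the following is a minimal, framework-free sketch of a selection that defaults to the embeddings path and swaps the visible fields when the value changes. It uses plain dictionaries rather than Langflow's input classes; the function signature, the option labels, and the field names are assumptions for illustration and are not taken from the PR's code.

```python
from copy import deepcopy

# Assumed option labels; the default keeps existing embeddings-based flows working.
EMBEDDING_CHOICES = ["Embedding Model", "Astra Vectorize"]
DEFAULT_CHOICE = "Embedding Model"

# Hypothetical names for the fields that only apply to the vectorize path.
VECTORIZE_FIELDS = ("provider", "model_name", "provider_api_key")


def base_build_config():
    """Initial configuration: dropdown set to the default, embedding input shown."""
    return {
        "embedding_service": {"options": EMBEDDING_CHOICES, "value": DEFAULT_CHOICE},
        "embedding": {"show": True},
    }


def update_build_config(build_config, field_value, field_name):
    """Return a new config reflecting the dropdown selection."""
    if field_name != "embedding_service":
        return build_config
    if field_value not in EMBEDDING_CHOICES:
        raise ValueError(f"unknown embedding service: {field_value!r}")

    config = deepcopy(build_config)  # leave the caller's config untouched
    config["embedding_service"]["value"] = field_value
    if field_value == "Astra Vectorize":
        # Vectorize computes embeddings server-side, so hide the embedding input.
        config.pop("embedding", None)
        for name in VECTORIZE_FIELDS:
            config[name] = {"show": True, "value": ""}
    else:
        for name in VECTORIZE_FIELDS:
            config.pop(name, None)
        config["embedding"] = {"show": True}
    return config
```

Because both branches are handled by the same function, switching back to the default restores the original configuration. A boolean toggle would need its "off" transition to reach this function as well, which is the part reported as not firing above.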

vector_store = self.build_vector_store()
def search_documents(self, vector_store=None) -> list[Data]:
    if not vector_store:
        vector_store = self.build_vector_store()
Collaborator

I don't think we need this change - the check_cached decorator handles this

Collaborator Author

So, the only reason I made this change was to make it easier to create a test. The problem with the tests is that I wasn't sure how to build the component with parameters that weren't in the initial configuration. In the UI, it'll dynamically update the component based on the value of the dropdown, but is there a way to perform that same computation programmatically? I.e., something like

component.update_build_configuration(embedding_service = "Astra Vectorize")

So I allowed the vector store object to be passed in optionally for purposes of the test, but that would never be used in the happy path in the component. Let me know, though, if you see a better way to do that.
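
The optional-argument approach described in this comment can be sketched in isolation as follows. `SearchComponent` and `FakeVectorStore` are hypothetical stand-ins, not the PR's classes; the point is only that a test can inject a store while the normal path still builds one.

```python
class FakeVectorStore:
    """Hypothetical in-memory store used only by tests."""

    def __init__(self, docs):
        self.docs = list(docs)

    def search(self, query):
        # Case-insensitive substring match stands in for a similarity search.
        return [doc for doc in self.docs if query.lower() in doc.lower()]


class SearchComponent:
    """Hypothetical component with an injectable vector store."""

    def __init__(self, query):
        self.query = query
        self.build_calls = 0

    def build_vector_store(self):
        # The real component would connect to the database here.
        self.build_calls += 1
        return FakeVectorStore(["default document"])

    def search_documents(self, vector_store=None):
        # Happy path: nothing is injected, so the component builds its own store.
        if vector_store is None:
            vector_store = self.build_vector_store()
        return vector_store.search(self.query)
```

The sketch checks `is None` rather than truthiness, so an injected store that happens to evaluate as falsy (for example, one defining `__len__` and currently empty) is still used instead of being silently replaced.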

"collection_embedding_api_key": self.z_03_provider_api_key or kwargs.get("z_03_provider_api_key"),
}

def build_vector_store(self, vectorize_options=None):
Collaborator

Curious about the parameter addition here. Is it only used for testing purposes? In the main path, there's no scenario where this wouldn't be None, right?

Collaborator Author

That's correct! I left a comment above explaining a similar change, but if there's a better / easier way that I missed, let me know, because I didn't like it either. The goal was basically to allow the tests to execute successfully with pytest (and for what it's worth, the 5 test_astra_vectorize tests do in fact work for me), but I would love it if, rather than having these optional parameters, the tests simulated more closely how the UI executes the code. The dynamic inputs are the challenge...

Collaborator

Hm yeah it's very possible we don't yet have any tests that use dynamic inputs, and thus haven't had to support a way to do that. I'll pull this and play around a bit this afternoon just to try as well
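
As a rough idea of what test support for dynamic inputs might look like, the sketch below replays selections in order through the same update hook the UI would call, instead of adding optional parameters to the component. Everything here is hypothetical: `DynamicComponent`, `apply_selections`, and the field names are illustrative and do not correspond to an existing Langflow test utility.

```python
class DynamicComponent:
    """Hypothetical component whose visible inputs depend on a dropdown."""

    def __init__(self):
        self.build_config = {"embedding_service": {"value": "Embedding Model"}}

    def update_build_config(self, build_config, field_value, field_name):
        build_config.setdefault(field_name, {})["value"] = field_value
        if field_name == "embedding_service":
            if field_value == "Astra Vectorize":
                # Selecting vectorize reveals the provider input.
                build_config.setdefault("provider", {"value": None})
            else:
                build_config.pop("provider", None)
        return build_config


def apply_selections(component, **selections):
    """Replay UI selections in order, as a test helper might."""
    for field_name, field_value in selections.items():
        if field_name not in component.build_config:
            # Mirrors the UI: a hidden input cannot be set.
            raise KeyError(f"{field_name!r} is not a visible input")
        component.build_config = component.update_build_config(
            component.build_config, field_value, field_name
        )
    return component


# Usage: choose the service first, then fill in the field it reveals.
component = apply_selections(
    DynamicComponent(),
    embedding_service="Astra Vectorize",
    provider="example-provider",
)
```

Keyword arguments keep their call order in Python 3.7+, so the helper can rely on the dropdown being applied before the fields it reveals.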

Collaborator

@jordanrfrazier jordanrfrazier left a comment

Looks good - I think it's fine to get this merged and then we can play around with the changes made to support the test

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Sep 13, 2024
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 13, 2024
@ogabrielluiz ogabrielluiz merged commit f6d93fc into langflow-ai:main Sep 19, 2024
37 of 38 checks passed
@erichare erichare deleted the feat/astra-db-component-vectorize branch September 25, 2024 18:35
Labels
enhancement New feature or request lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.
4 participants