AstraDB component Ingest all items twice #2552

Closed
thomasbachet opened this issue Jul 5, 2024 · 2 comments · Fixed by #2573

Labels
bug Something isn't working

Comments

@thomasbachet

thomasbachet commented Jul 5, 2024

When I run the flow to ingest data, the ingestion happens twice, so each item is stored twice in the AstraDB collection.

Browser and Version

  • Langflow version 1.0.6 running locally in a virtual environment

To Reproduce

  • use the pre-built flow 'Vector Store RAG'
  • use the existing flow to ingest any content

Is anyone else experiencing the same issue?

@dosubot dosubot bot added the bug Something isn't working label Jul 5, 2024

dosubot bot commented Jul 5, 2024

Hey @thomasbachet! I'm here to help you with any issues you're experiencing. I can assist with bugs, answer questions, and guide you on becoming a contributor. Let's work together to resolve any issues you may have.

The issue of the AstraDB component ingesting items twice could be related to the caching mechanism within the AstraDBVectorStoreComponent. The component's configuration includes fields with "cache": true, which might cause the same data to be ingested twice [1].

Additionally, the concurrency settings in the AstraDBVectorStoreComponent could potentially lead to duplicate entries if the same data is processed simultaneously by different threads or processes. The component handles concurrency through parameters like bulk_insert_batch_concurrency, bulk_insert_overwrite_concurrency, and bulk_delete_concurrency, but it does not explicitly address deduplication logic [2].

To mitigate this issue, you might want to review the caching settings and ensure that there is a mechanism to check for existing documents before adding new ones.
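As a rough illustration of the "check for existing documents" idea, here is a minimal sketch that makes ingestion idempotent by keying each document on a hash of its content. It does not use the Langflow or AstraDB APIs: the `store` dict is a stand-in for the collection, and `content_id` and `ingest` are hypothetical helper names. The same approach may carry over to a real vector store if it accepts caller-supplied document IDs.

```python
import hashlib


def content_id(text: str) -> str:
    """Derive a stable ID from the document text (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ingest(store: dict[str, str], texts: list[str]) -> int:
    """Insert texts keyed by content hash; return how many were new.

    `store` is an in-memory stand-in for the collection, not AstraDB.
    """
    added = 0
    for text in texts:
        doc_id = content_id(text)
        if doc_id not in store:  # skip documents that are already present
            store[doc_id] = text
            added += 1
    return added


store: dict[str, str] = {}
first = ingest(store, ["alpha", "beta"])
second = ingest(store, ["alpha", "beta"])  # simulates the flow running twice
print(first, second, len(store))  # prints: 2 0 2
```

With content-derived IDs, a second run of the same flow adds nothing, so an accidental double ingestion leaves the collection unchanged.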


@nicoloboschi
Contributor

Hey @thomasbachet, thanks for reporting. Yes, it's a bug. I've fixed it in PR #2573.
In the meantime, you can replace the entire AstraDB component with this content: https://raw.githubusercontent.com/langflow-ai/langflow/98beda5532b9029e016d966cfc81bca1722d4c5c/src/backend/base/langflow/components/vectorstores/AstraDB.py
