Parent Document Not Updating During Re-indexing with Parent Document Retriever #29801

Open
5 tasks done
bneel-work opened this issue Feb 14, 2025 · 0 comments
Labels
Ɑ: vector store Related to vector store module

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain.retrievers import ParentDocumentRetriever
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter

# ... (Document loading logic) ...
docs = ... # Your list of parent documents

child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
vectorstore = Chroma(collection_name="full_documents", embedding_function=OpenAIEmbeddings())
store = InMemoryStore()
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

# Initial Ingestion (using custom index function):
result = index(
    someloader,  # Your data loader
    record_manager,  # Your record manager
    PGVStore,  # Your vector store instance (Chroma in this case)
    cleanup="full", # Or your cleanup strategy
    source_id_key="source", # Your source ID key
)


# --- RE-INDEXING ---
updated_docs = ... # Load updated documents

# PROBLEM: Calling the index function again does NOT update the parent documents correctly, and the record_manager does not return the updated UIDs.
# result = index(
#     someloader,  # Your data loader
#     record_manager,  # Your record manager
#     PGVStore,  # Your vector store instance (Chroma in this case)
#     cleanup="full", # Or your cleanup strategy
#     source_id_key="source", # Your source ID key
# )

Error Message and Stack Trace (if applicable)

No response

Description

While building a GenAI application that uses LangChain to ingest embeddings into a vector database, I ran into a problem with the Parent Document Retriever during re-indexing. Child documents are re-indexed successfully, but the parent document remains outdated. In addition, the Record Manager does not return the updated UIDs of the child documents, so the parent cannot be updated manually either.

To Reproduce

  1. Set up a LangChain application with a vector database (e.g., Chroma, FAISS, etc.) and a Parent Document Retriever.
  2. Ingest initial embeddings of parent and child documents.
  3. Modify some child documents.
  4. Trigger a re-indexing process. This could be through a specific re-indexing function or by deleting/re-adding the child documents.
  5. Observe the parent document. It remains unchanged, reflecting the original content, even though the child documents have been updated in the vector database.
  6. Check the Record Manager for updated UIDs of the re-indexed child documents. It appears these UIDs are not being correctly updated, preventing the parent document from being identified for updating.
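The steps above can be illustrated with a small toy model. This is not LangChain code: `ToyParentRetriever`, its two dictionaries, and `reindex_children_only` are hypothetical stand-ins for a docstore, a vector store, and a re-indexing routine that only knows about the vector store. It is a sketch of the reported symptom under those assumptions, not a diagnosis of the library.

```python
# Toy model (NOT LangChain code): two independent stores, as in a parent-document setup.
from dataclasses import dataclass, field


@dataclass
class ToyParentRetriever:
    docstore: dict = field(default_factory=dict)     # parent_id -> full parent text
    vectorstore: dict = field(default_factory=dict)  # child_id -> (parent_id, chunk text)
    chunk_size: int = 20

    def _split(self, text):
        # Fixed-width chunks stand in for a real text splitter.
        return [text[i:i + self.chunk_size] for i in range(0, len(text), self.chunk_size)]

    def add_documents(self, parent_id, text):
        """Initial ingestion: writes the parent AND its child chunks."""
        self.docstore[parent_id] = text
        for n, chunk in enumerate(self._split(text)):
            self.vectorstore[f"{parent_id}:{n}"] = (parent_id, chunk)

    def reindex_children_only(self, parent_id, new_text):
        """Re-indexing that touches the vector store but never the docstore."""
        stale = [cid for cid, (pid, _) in self.vectorstore.items() if pid == parent_id]
        for cid in stale:
            del self.vectorstore[cid]
        for n, chunk in enumerate(self._split(new_text)):
            self.vectorstore[f"{parent_id}:{n}"] = (parent_id, chunk)

    def retrieve(self, query):
        """Match against child chunks, then return the parents they point to."""
        hits = {pid for pid, chunk in self.vectorstore.values() if query in chunk}
        return [self.docstore[pid] for pid in sorted(hits)]


retriever = ToyParentRetriever()
retriever.add_documents("doc-1", "The limit is 10 requests per minute.")
retriever.reindex_children_only("doc-1", "The limit is 50 requests per minute.")

# The query matches the NEW child chunk, yet the returned parent is the OLD text.
stale_result = retriever.retrieve("50")
print(stale_result)
```

In this model the search hits an updated child but hands back the original parent, which matches the behavior described in step 5.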

Expected behavior

After re-indexing the child documents, the parent document should also be updated to reflect the changes in its children. The Record Manager should return the updated UIDs of the child documents, allowing the Parent Document Retriever to identify and update the corresponding parent document.
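One way to picture the expected behavior is a re-indexing step that replaces the parent and its children together and reports the affected IDs. The sketch below is again a toy model with plain dictionaries; `parent_id_for`, `split`, and `reindex` are hypothetical helpers, and deriving the parent ID from a hash of the source key is an assumption made for illustration, not something LangChain is known to do here.

```python
import hashlib


def parent_id_for(source: str) -> str:
    """Deterministic parent ID derived from the source key (hypothetical scheme)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:16]


def split(text: str, chunk_size: int = 20) -> list:
    # Fixed-width chunks stand in for a real text splitter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def reindex(docstore: dict, vectorstore: dict, source: str, text: str) -> dict:
    """Replace the parent and all of its children together; report what changed."""
    pid = parent_id_for(source)

    # Drop every child that still points at this parent.
    stale_ids = [cid for cid, (parent, _) in vectorstore.items() if parent == pid]
    for cid in stale_ids:
        del vectorstore[cid]

    # Write the new children and remember their IDs.
    new_ids = []
    for n, chunk in enumerate(split(text)):
        cid = f"{pid}:{n}"
        vectorstore[cid] = (pid, chunk)
        new_ids.append(cid)

    # Overwrite the parent as well: the step the report describes as missing.
    docstore[pid] = text
    return {"parent_id": pid, "deleted": stale_ids, "added": new_ids}


docstore, vectorstore = {}, {}
reindex(docstore, vectorstore, "faq.md", "The limit is 10 requests per minute.")
report = reindex(docstore, vectorstore, "faq.md", "The limit is 50 requests per minute.")
print(report["parent_id"], len(report["added"]))
```

Because the parent ID depends only on the source key, a second run overwrites the same docstore entry instead of leaving the old one behind, and the returned report exposes the child IDs that a caller would need for any manual follow-up.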

System Info

System information:
Unable to provide detailed system information due to environment restrictions.

@dosubot dosubot bot added the Ɑ: vector store Related to vector store module label Feb 14, 2025