
Error in Pinecone batch selection logic #906

Closed
jamescalam opened this issue Feb 6, 2023 · 0 comments

jamescalam commented Feb 6, 2023

The current implementation of the Pinecone vector DB wrapper finds the batches using:

```python
# set end position of batch
i_end = min(i + batch_size, len(texts))
```

link
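To make the pattern concrete, here is a small self-contained sketch (with made-up toy values, not the actual LangChain code) of how `i_end` clamps the final batch so it never runs past the end of `texts`:

```python
# Hypothetical toy data to illustrate how i_end clamps the final batch
texts = ["a", "b", "c", "d", "e"]
batch_size = 2

batches = []
for i in range(0, len(texts), batch_size):
    # set end position of batch (same pattern as the snippet above)
    i_end = min(i + batch_size, len(texts))
    batches.append(texts[i:i_end])

print(batches)  # the final batch is shorter: [['a', 'b'], ['c', 'd'], ['e']]
```

With five items and `batch_size = 2`, the loop produces two full batches and one short final batch.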

But the following lines then mix `[i : i + batch_size]` and `[i:i_end]` when creating the batches:

```python
# get batch of texts and ids
lines_batch = texts[i : i + batch_size]
# create ids if not provided
if ids:
    ids_batch = ids[i : i + batch_size]
else:
    ids_batch = [str(uuid.uuid4()) for n in range(i, i_end)]
```

Fortunately, a `zip` call a few lines down truncates the potentially longer chunks, preventing an error from being raised. Still, I don't think `[i : i + batch_size]` should be kept; it's confusing and not explicit.
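A short sketch (again with hypothetical toy values) of why the mixed slices happen to agree: Python slicing silently clamps past the end of a list, and `zip` truncates to its shortest argument, which is what masks the inconsistency:

```python
# Toy example: the final, short batch of a 3-item list with batch_size = 2
texts = ["a", "b", "c"]
batch_size = 2
i = 2  # start index of the final batch
i_end = min(i + batch_size, len(texts))  # clamps to 3

# Slicing past the end is silently clamped, so both spellings match here:
assert texts[i : i + batch_size] == texts[i:i_end] == ["c"]

# zip truncates to the shortest iterable, so even a genuinely longer
# chunk would be cut down rather than raising an error:
ids_batch = ["id-2"]
pairs = list(zip(ids_batch, texts[i : i + batch_size]))
print(pairs)  # [('id-2', 'c')]
```

Because both behaviors are silent, the mismatch never surfaces as an error, which is exactly why the explicit `[i:i_end]` form is preferable.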

Raised a PR here #907

hwchase17 pushed a commit that referenced this issue Feb 6, 2023
Fix for issue #906 

Switches `[i : i + batch_size]` to `[i : i_end]` in Pinecone
`from_texts` method
zachschillaci27 pushed a commit to zachschillaci27/langchain that referenced this issue Mar 8, 2023
Fix for issue langchain-ai#906 

Switches `[i : i + batch_size]` to `[i : i_end]` in Pinecone
`from_texts` method