Embeddings index method is inefficient with memory #3

Closed
davidmezzetti opened this issue Aug 4, 2020 · 0 comments

Labels: enhancement (New feature or request)

Comments

@davidmezzetti
Member

The current index method makes several calls that force copies of the working embeddings index array to be created. For larger data sources, these copies can cause out-of-memory errors. Make the following improvements:

  • When streaming embeddings back from disk, create an empty NumPy array sized to the full embeddings index array and fill it in place. Appending NumPy arrays to a list forces a copy when the final NumPy embeddings array is created from that list.

  • Modify the removePC method to operate directly on the input array instead of returning a copy.

  • Modify the normalize method to operate directly on the input array instead of returning a copy.
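
The three changes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `stream`, `remove_pc`, and `normalize`, the file layout of one pickled batch per `pickle.dump` call, the row `count` being known up front, and the `chunksize` parameter are all hypothetical. For the principal component, the sketch swaps in an eigendecomposition of X^T X (a small d x d matrix), which gives the same first right singular vector as an SVD of X (up to sign) without allocating an n x d factor.

```python
import os
import pickle
import tempfile

import numpy as np


def stream(path, count, dimensions, dtype=np.float32):
    """Loads batches pickled one after another into ONE preallocated array.

    Hypothetical layout: each batch was written with a separate pickle.dump
    call, and the total row count was tracked while writing.
    """
    # Allocate the final array once; each batch is copied straight into it
    embeddings = np.empty((count, dimensions), dtype=dtype)

    with open(path, "rb") as handle:
        offset = 0
        while offset < count:
            batch = pickle.load(handle)
            embeddings[offset : offset + batch.shape[0]] = batch
            offset += batch.shape[0]

    return embeddings


def remove_pc(embeddings, chunksize=1024):
    """Removes the first principal component from `embeddings` in place."""
    # Top right singular vector of X = eigenvector of X^T X with the largest
    # eigenvalue (eigh returns eigenvalues in ascending order)
    _, vectors = np.linalg.eigh(embeddings.T @ embeddings)
    pc = vectors[:, -1].astype(embeddings.dtype)

    # Projection of each row onto the component, shape (n,)
    projection = embeddings @ pc

    # Subtract in chunks so the temporary is chunksize x d, not n x d
    for start in range(0, embeddings.shape[0], chunksize):
        end = start + chunksize
        embeddings[start:end] -= np.outer(projection[start:end], pc)


def normalize(embeddings):
    """Scales each row of `embeddings` to unit L2 norm in place."""
    # Row norms computed without materializing a squared n x d copy
    norms = np.sqrt(np.einsum("ij,ij->i", embeddings, embeddings))
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    embeddings /= norms[:, np.newaxis]


# Usage: write three batches to disk, stream them back, then post-process
rng = np.random.default_rng(0)
batches = [rng.standard_normal((500, 8)).astype(np.float32) for _ in range(3)]

with tempfile.NamedTemporaryFile(delete=False) as handle:
    for batch in batches:
        pickle.dump(batch, handle)
    path = handle.name

embeddings = stream(path, 1500, 8)
os.remove(path)

remove_pc(embeddings)
normalize(embeddings)
```

Subtracting the projection in chunks keeps the largest temporary at `chunksize` x d instead of a second full n x d array, and `np.einsum` computes the row norms without squaring a full copy of the data. Both functions return `None`, which makes the in-place contract explicit to callers.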

@davidmezzetti added the enhancement (New feature or request) label Aug 4, 2020
@davidmezzetti self-assigned this Dec 2, 2021
@davidmezzetti added this to the v1.1.0 milestone Dec 2, 2021