Chunksize should be derived from MEMORY env variable #123

Open
3 tasks
gessulat opened this issue Aug 1, 2024 · 0 comments


gessulat commented Aug 1, 2024

This issue was prompted by this discussion regarding MSAID's streaming branch.

For streaming, several chunk sizes are defined, and they are currently hard-coded. It would be desirable to make them adjustable. Ideally, the chunk sizes would be derived automatically from the user's memory constraints.

TODO:

  • Implement a function that, given a small chunk of the PSM data, estimates the memory required for a given chunk size. This probably needs to be estimated dynamically, because the number of feature columns in the Percolator format varies.
  • Implement a mechanism that reads the user's memory limit from an environment variable (e.g. MEMORY).
  • At runtime, check the user's memory limit or fall back to a default, then estimate the maximum chunk sizes and set them.
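
A rough sketch of how the three steps above could fit together, to make the proposal concrete. Everything in it is an assumption for illustration: the function names (`parse_memory`, `memory_budget`, `estimate_chunk_size`), the accepted `MEMORY` formats (plain bytes or a K/M/G/T suffix), the 4 GiB fallback, the 0.5 safety factor, and the minimum chunk size are all hypothetical and not taken from the streaming branch. The sample's in-memory size is passed in by the caller, so the sketch does not depend on how the PSM chunk is loaded.

```python
import os
import re

# Hypothetical defaults; none of these values come from the issue itself.
DEFAULT_MEMORY_BYTES = 4 * 1024**3  # assumed fallback budget when MEMORY is unset
SAFETY_FACTOR = 0.5                 # assumed share of the budget one chunk may use
MIN_CHUNK_SIZE = 1_000              # assumed floor so chunks never become tiny

_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_memory(value):
    """Parse strings such as '512M', '4GB' or '1073741824' into bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)I?B?\s*", value.upper())
    if match is None:
        raise ValueError(f"cannot parse memory value: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def memory_budget(env=None, var="MEMORY"):
    """Return the user's memory budget in bytes, or the default if unset."""
    env = os.environ if env is None else env
    raw = env.get(var)
    if not raw:
        return DEFAULT_MEMORY_BYTES
    return parse_memory(raw)


def estimate_chunk_size(sample_bytes, sample_rows, budget_bytes,
                        safety=SAFETY_FACTOR, minimum=MIN_CHUNK_SIZE):
    """Estimate how many rows fit in the budget from a measured sample.

    sample_bytes is the in-memory size of a small sample of the PSM data
    (with pandas, for example, df.memory_usage(deep=True).sum()), and
    sample_rows is the number of rows in that sample. Measuring a real
    sample accounts for the variable number of feature columns.
    """
    if sample_rows <= 0 or sample_bytes <= 0:
        raise ValueError("sample must contain at least one row and one byte")
    bytes_per_row = sample_bytes / sample_rows
    # Note: the floor can exceed the budget if the budget is very small.
    return max(minimum, int(budget_bytes * safety / bytes_per_row))


if __name__ == "__main__":
    budget = memory_budget()
    # Pretend a 1,000-row sample occupied 2 MB in memory.
    print(estimate_chunk_size(2_000_000, 1_000, budget))
```

In this sketch an unparsable `MEMORY` value raises an error instead of silently falling back to the default; whether that is the right behaviour is open for discussion.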