
Conversation

@thomasw21
Member

This is linked to the problem of increasing memory usage when preprocessing a dataset. I believe the issue is that imap produces samples much faster than the single-threaded write consumes them, so samples accumulate in memory and usage keeps growing. In this PR, we suggest using a global semaphore that limits the number of samples held in memory, i.e., we wait for the consumer to process X samples before allowing the generator to produce more.

Same PR, just rebased on the correct branch.

@TevenLeScao
Collaborator

Hey, as discussed, this is significantly slower than just running with fewer workers. We'll stick with that for now so as not to complicate the code.

