
Parallel data loading #2649

Open · jaheba opened this issue Feb 14, 2023 · 1 comment
Labels: enhancement (New feature or request)

Comments

jaheba (Contributor) commented Feb 14, 2023

We've removed parallel data loading, since multiprocessing caused some issues, made the code convoluted, and didn't yield much of a performance benefit. Instead, we focus on improving throughput.

Still, we want to re-introduce parallel loading of data in the future. Getting this right is harder in Python because of the GIL, which more or less forces us to use multiprocessing. That approach has significant overhead, and one needs to be careful when setting up a multiprocessing pipeline.
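As a rough illustration (not GluonTS code), here is a minimal sketch of the kind of multiprocessing pipeline this would involve: worker processes iterate over dataset shards and push batches onto a shared queue, which the training loop then consumes. All names (`worker`, `load_batches`, `dataset_shards`) are hypothetical.

```python
import multiprocessing as mp
from itertools import islice


def worker(shard, queue, num_batches):
    # Each worker process iterates over its shard of the dataset and
    # pushes (already transformed) batches onto a shared queue.
    for batch in islice(iter(shard), num_batches):
        queue.put(batch)
    queue.put(None)  # sentinel: this worker is done


def load_batches(dataset_shards, num_batches_per_worker=100):
    # Bounded queue provides back-pressure so workers don't run far
    # ahead of the consumer.
    queue = mp.Queue(maxsize=8)
    procs = [
        mp.Process(target=worker, args=(shard, queue, num_batches_per_worker))
        for shard in dataset_shards
    ]
    for p in procs:
        p.start()

    finished = 0
    while finished < len(procs):
        item = queue.get()
        if item is None:
            finished += 1
        else:
            yield item

    for p in procs:
        p.join()


# Note: on spawn-based platforms (e.g. macOS, Windows) this must be
# guarded by `if __name__ == "__main__":`, and shards/batches must be
# picklable -- part of the overhead mentioned above.
```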

jaheba added the enhancement label on Feb 14, 2023
admivsn commented Apr 19, 2024

Is this a blocker to the implementation of multi-GPU training?

I've described an issue where I'm seeing no speed improvement with strategy="ddp" on a multi-GPU instance compared to a single-GPU instance.
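For reference, a minimal sketch of the kind of setup meant here, assuming a plain PyTorch Lightning Trainer (`model` and `train_loader` are placeholders, not GluonTS objects):

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,          # multi-GPU instance
    strategy="ddp",     # DistributedDataParallel across the GPUs
    max_epochs=10,
)
# trainer.fit(model, train_dataloaders=train_loader)
```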

If so, it would be good to have this documented as a limitation of this package, unless there is a workaround you know of?

Thanks in advance
