Init'ing Dataloader calls get_train_dataloader #922
ummm good point. I think we need to simplify the get_XXX_dataloader() calls. The lazy loading decorator does make it harder for people to debug their data loading issues. @neggert do you remember the original reason we added the decorator? Maybe it's time to remove it and simplify this logic?
@ethanwharris @jakubczakon @MattPainter01
I always assumed the decorator was to stop multiple instantiation: there was some old bug where data loading threads would hang around after each epoch because new data loaders were created and the old threads just carried on. Having said that, I can't find the issue anywhere. If there's some way we can remove the decorator but still only create the dataloader once, then that would be a big usability improvement :)
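For context, a decorator like that can be as simple as memoizing the first call so the method body (and the DataLoader workers it spawns) only ever runs once. This is a minimal sketch of the pattern being discussed, not the actual pytorch-lightning implementation; the names `data_loader` and `MyModel` are illustrative:

```python
import functools

def data_loader(fn):
    """Sketch of a lazy, memoizing dataloader decorator.

    The wrapped method is executed only on the first call; every later
    call returns the cached DataLoader, so worker threads are not
    re-created each epoch.
    """
    attr_name = '_cached_' + fn.__name__

    @functools.wraps(fn)
    def wrapper(self):
        if not hasattr(self, attr_name):
            setattr(self, attr_name, fn(self))
        return getattr(self, attr_name)

    return wrapper


class MyModel:
    @data_loader
    def train_dataloader(self):
        # expensive: builds the Dataset and spawns DataLoader workers
        ...
```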
Agreed, that was the original reason. Basically we could refactor to make sure we only call it at the time the epoch begins. I think we needed it before to determine the length and for some other reasons.
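One way to get that behavior without the decorator is for the trainer itself to cache the result and only ask the model for its dataloader right before training starts. A rough sketch of that idea follows; the class and method names are illustrative, not the real Trainer API:

```python
class Trainer:
    """Illustrative sketch, not the real pytorch-lightning Trainer."""

    def __init__(self, max_epochs=1):
        self.max_epochs = max_epochs
        self._train_dataloader = None  # cached so the model hook runs once

    def _get_train_dataloader(self, model):
        # Ask the model for its dataloader lazily, only once, right
        # before the first epoch instead of at Trainer init time.
        if self._train_dataloader is None:
            self._train_dataloader = model.train_dataloader()
        return self._train_dataloader

    def fit(self, model):
        for epoch in range(self.max_epochs):
            dataloader = self._get_train_dataloader(model)
            for batch in dataloader:
                ...  # training_step, optimizer step, etc. would run here
```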
I'm stuck on a couple of issues here that I can't unwind.

Main Issue: I don't really understand the semantics of the decorator.

Side Issue: It's very difficult to determine ordering. I had been calling …

A related issue is that loading the dataset 8 times on TPU blows up the limited amount of RAM that Colab allows. It would be nice to avoid this issue.
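One way to keep per-process memory down in that 8-process TPU setting is a dataset that leaves the samples on disk and reads them on demand, e.g. through a memory map, instead of holding a full in-RAM copy per process. A hedged sketch of that idea (the file path, shape, and class name are placeholders, and this is not something pytorch-lightning does for you):

```python
import numpy as np
from torch.utils.data import Dataset

class MmapDataset(Dataset):
    """Sketch: keep samples on disk via np.memmap so each of the 8 TPU
    processes maps the same file instead of holding its own in-RAM copy."""

    def __init__(self, path, shape, dtype=np.float32):
        # Nothing is read into memory here; pages are faulted in on demand.
        self.data = np.memmap(path, mode='r', dtype=dtype, shape=shape)

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        # Copy a single sample out of the memory map.
        return np.array(self.data[idx])
```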
It seems like the code that initializes the dataloader calls get_train_dataloader directly:
https://github.com/PyTorchLightning/pytorch-lightning/blob/c00a8a10dd32fa43a659a09b53e2dea3739c6d4e/pytorch_lightning/trainer/data_loading.py#L69
This means that all the effort spent wrapping get_dataloader to synchronize through barriers for multi-GPU / TPU is not used on this first call (and results in a crash).
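To illustrate what that wrapping is meant to accomplish: the usual pattern is to let only rank 0 do the expensive download/preparation and make every other process wait at a barrier before building its DataLoader. If the trainer calls the raw method during init, before the process group exists, none of that synchronization can happen. A minimal sketch of the pattern, assuming torch.distributed is used for the process group (the function name is illustrative, not the actual pytorch-lightning code):

```python
import torch.distributed as dist

def get_train_dataloader_synced(model):
    """Sketch: fetch the training dataloader with barriers so only rank 0
    downloads/prepares the data and the other ranks wait for it."""
    if dist.is_available() and dist.is_initialized():
        if dist.get_rank() != 0:
            dist.barrier()   # non-zero ranks wait for rank 0 to finish
        dataloader = model.train_dataloader()
        if dist.get_rank() == 0:
            dist.barrier()   # release the other ranks once rank 0 is done
    else:
        # Called too early (e.g. during trainer init): no process group yet,
        # so none of the synchronization above can take effect.
        dataloader = model.train_dataloader()
    return dataloader
```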