Gluon DataLoader incorrectly terminates the process pool in 1.4 #15025
Comments
Hey, this is the MXNet Label Bot. |
@zhreshold Can you take a look? |
A workaround is to keep the DataLoader and the iterator in the same scope. The worker pool is managed by the DataLoader for the sake of resource conservation, in case users create hundreds of iterators from the same DataLoader. |
Got it! I see why the worker pool init was moved inside DataLoader. Ref counting is a good option but it might be better if we let python handle scope of objects rather than managing it on our side. |
@chandana1332 Yep, this is actually better. Would you like to contribute a minor patch for it? |
Yeah, I can. Do you have instructions on what/how to test, etc? |
Test cases are located in https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_gluon_data.py and you can add the failure case you just posted. |
Okay, I can add a test there. What is the release date for 1.5? |
not sure yet |
@chandana1332 While we want to release 1.5.0 as soon as possible, we still have a couple of PRs left before cutting the release candidate. https://lists.apache.org/thread.html/b8719632a1d23da619349e91610223c090071c02658d5153fa8f0757@%3Cdev.mxnet.apache.org%3E In the meantime if you'd like to post the PR soon, @roywei is overseeing the release and helping coordinate. Just let us know when the change will be ready. |
Yeah, I should have it ready soon. Is there a date by when I need to put the PR in? |
Hi @chandana1332, sorry I just saw this. As you can see in the dev list discussion above and the release tracker, we are aiming to tag 1.5.0 today (06/07/2019). Do you need this change in 1.5.0? Once you create a PR and it is merged, it will be available through nightly pip packages ( |
Hey Wei, I should have the patch ready by tomorrow. Not sure what the turn around time for releasing a pull request is but from my side, I'm okay if it gets patched to 1.5.1. Thank you for checking in. |
Issue: apache#15025 Fix: Broadened the scope of worker pool to iterators. Passed a reference of dataloader to the multi worker iterator
Description
Gluon DataLoader terminates the process pool early while _MultiWorkerIter is operating on the pool.
Cause: https://github.com/apache/incubator-mxnet/pull/13537/files
As seen in the patch, the process pool is terminated when the DataLoader is garbage collected, but the pool is still in use by _MultiWorkerIter, which can outlive the DataLoader.
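To make the lifetime problem concrete, here is a minimal pure-Python sketch of the pattern described above. It does not use MXNet: `FakePool`, `Loader`, `WorkerIter`, `make_iter`, and the `keep_alive` flag are all hypothetical stand-ins, and the real DataLoader and _MultiWorkerIter code differs in detail. It only illustrates why an iterator that outlives its loader sees a terminated pool, and how passing a reference to the loader into the iterator avoids that.

```python
import gc


class FakePool:
    """Hypothetical stand-in for a process pool; only records termination."""

    def __init__(self):
        self.terminated = False

    def terminate(self):
        self.terminated = True


class WorkerIter:
    """Hypothetical iterator that needs the pool for every batch."""

    def __init__(self, pool, num_batches, owner=None):
        self._pool = pool
        self._num_batches = num_batches
        self._sent = 0
        # Holding the owner keeps the loader (and so its pool) alive.
        self._owner = owner

    def __iter__(self):
        return self

    def __next__(self):
        if self._pool.terminated:
            raise RuntimeError("pool terminated while iterating")
        if self._sent >= self._num_batches:
            raise StopIteration
        self._sent += 1
        return self._sent - 1


class Loader:
    """Hypothetical loader that owns the pool and terminates it on collection."""

    def __init__(self, keep_alive, num_batches=3):
        self._pool = FakePool()
        self._keep_alive = keep_alive
        self._num_batches = num_batches

    def __iter__(self):
        owner = self if self._keep_alive else None
        return WorkerIter(self._pool, self._num_batches, owner=owner)

    def __del__(self):
        self._pool.terminate()


def make_iter(keep_alive):
    # The loader is a temporary here, so only the iterator escapes this scope.
    it = iter(Loader(keep_alive))
    gc.collect()  # CPython frees the temporary at once; this covers other runtimes
    return it
```

With `make_iter(False)` the first `next()` call raises, because the temporary loader has already been collected and has terminated the pool. With `make_iter(True)` the iterator holds the loader, so iteration completes normally.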
Environment info (Required)
I'm using Python 3.7
Error Message:
Minimum reproducible example
num_workers > 0
What have you tried to solve it?