This repository was archived by the owner on Nov 17, 2023. It is now read-only.
Gluon DataLoader cannot release the processes in the pool #13521
Comments
@YutingZhang Seems like it's caused by Jupyter, since it may cache the sessions?
@YutingZhang Okay, I found the problem is present on Linux but not on Mac.
@zhreshold Great. Thanks!
TaoLv added a commit that referenced this issue on Dec 6, 2018
…icense file" (#13558) * Revert "Chi_square_check for discrete distribution fix (#13543)" This reverts commit cf6e8cb. * Revert "Updated docs for randint operator (#13541)" This reverts commit e0ff3c3. * Revert "Simplifications and some fun stuff for the MNIST Gluon tutorial (#13094)" This reverts commit 8bbac82. * Revert "Fix #13521 (#13537)" This reverts commit f6b4665. * Revert "Add a retry to qemu_provision (#13551)" This reverts commit f6f8401. * Revert "[MXNET-769] Use MXNET_HOME in a tempdir in windows to prevent access denied due t… (#13531)" This reverts commit bd8e0f8. * Revert "[MXNET-1249] Fix Object Detector Performance with GPU (#13522)" This reverts commit 1c8972c. * Revert "Fixing a 404 in the ubuntu setup doc (#13542)" This reverts commit cb0db29. * Revert "Bumped minor version from 1.4.0 to 1.5.0 on master, updated License file (#13478)" This reverts commit 40db619.
@zhreshold Confirmed this as a Python bug: https://bugs.python.org/issue34172
@YutingZhang Good to know, thanks
zhaoyao73 pushed a commit to zhaoyao73/incubator-mxnet that referenced this issue on Dec 13, 2018
* fix pool release * fix
zhaoyao73 pushed a commit to zhaoyao73/incubator-mxnet that referenced this issue on Dec 13, 2018
…icense file" (apache#13558) * Revert "Chi_square_check for discrete distribution fix (apache#13543)" This reverts commit cf6e8cb. * Revert "Updated docs for randint operator (apache#13541)" This reverts commit e0ff3c3. * Revert "Simplifications and some fun stuff for the MNIST Gluon tutorial (apache#13094)" This reverts commit 8bbac82. * Revert "Fix apache#13521 (apache#13537)" This reverts commit f6b4665. * Revert "Add a retry to qemu_provision (apache#13551)" This reverts commit f6f8401. * Revert "[MXNET-769] Use MXNET_HOME in a tempdir in windows to prevent access denied due t… (apache#13531)" This reverts commit bd8e0f8. * Revert "[MXNET-1249] Fix Object Detector Performance with GPU (apache#13522)" This reverts commit 1c8972c. * Revert "Fixing a 404 in the ubuntu setup doc (apache#13542)" This reverts commit cb0db29. * Revert "Bumped minor version from 1.4.0 to 1.5.0 on master, updated License file (apache#13478)" This reverts commit 40db619.
zhaoyao73 added a commit to zhaoyao73/incubator-mxnet that referenced this issue on Dec 13, 2018
* upstream/master: (54 commits) Add notes about debug with libstdc++ symbols (apache#13533) add cpp example inception to nightly test (apache#13534) Fix exception handling api doc (apache#13519) fix link for gluon model zoo (apache#13583) ONNX import/export: Size (apache#13112) Update MXNetTutorialTemplate.ipynb (apache#13568) fix the situation where idx didn't align with rec (apache#13550) Fix use-before-assignment in convert_dot (apache#13511) License update (apache#13565) Update version to v1.5.0 including clojure package (apache#13566) Fix flaky test test_random:test_randint_generator (apache#13498) Add workspace cleaning after job finished (apache#13490) Adding test for softmaxoutput (apache#13116) apache#13441 [Clojure] Add Spec Validations for the Random namespace (apache#13523) Revert "Bumped minor version from 1.4.0 to 1.5.0 on master, updated License file" (apache#13558) Chi_square_check for discrete distribution fix (apache#13543) Updated docs for randint operator (apache#13541) Simplifications and some fun stuff for the MNIST Gluon tutorial (apache#13094) Fix apache#13521 (apache#13537) Add a retry to qemu_provision (apache#13551) ...
https://github.com/apache/incubator-mxnet/blob/f2dcd7c7b8676b55d912997fc3f9c62c55915307/python/mxnet/gluon/data/dataloader.py#L532-L533
Logically, when a DataLoader is recycled, its _worker_pool should be recycled as well, and the pool's terminate() function should be called immediately. However, this does not happen ... Each time I kill a DataLoader, it leaves the worker processes dangling. I guess it is a bug in Python's multiprocessing.Pool. Anyway, I think we can patch it by explicitly calling _worker_pool.terminate().
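To make the proposed patch concrete, here is a minimal sketch of the idea using only the standard library. It is not the actual MXNet code: PoolOwner, its close() method, and the context parameter are hypothetical names made up for illustration, standing in for a DataLoader that holds a _worker_pool.

```python
import multiprocessing


class PoolOwner:
    """Hypothetical stand-in for a DataLoader that owns a worker pool."""

    def __init__(self, num_workers=2, context=None):
        # `context` is an assumption for this sketch: any multiprocessing
        # context (or the module itself) that provides Pool().
        ctx = context if context is not None else multiprocessing
        self._worker_pool = ctx.Pool(num_workers)

    def close(self):
        """Stop the worker processes now instead of waiting for the GC."""
        pool = getattr(self, "_worker_pool", None)
        if pool is not None:
            pool.terminate()  # stop the workers immediately
            pool.join()       # wait until they have exited
            self._worker_pool = None

    def __del__(self):
        # Explicit cleanup, so releasing the owner also releases the pool.
        self.close()
```

With something like this, deleting the owner (or calling close() directly) should terminate the workers rather than relying on the pool's own finalizer, which is the part that appears to misbehave here.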
Minimum code to reproduce the errors.
I recorded a video demo for this bug: https://drive.google.com/open?id=1q4CmU_F1vAtxoZ_KUmrIEfVRk3RsQfv8
Environment: today's MXNet build from pip, Python 3.6, on p3