Hang when using torch.multiprocessing after having called rr.init

Minimal repro:
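The report originally surfaced with Hugging Face torch datasets and num_workers > 0; the sketch below distills that to plain torch.multiprocessing. It is illustrative rather than the exact script from the report, and assumes the rerun Python SDK on a Linux host where fork is the default start method.

```python
import gc

import rerun as rr  # assumes the rerun Python SDK
import torch.multiprocessing as mp


def worker() -> None:
    # Forcing a collection in the forked child is what makes a leaked
    # PyRecordingStream run its __del__ (and therefore its flush) there.
    gc.collect()
    print("worker done")


if __name__ == "__main__":
    rr.init("repro_app")  # allocates a global recording stream in the parent

    # On Linux the default start method is fork, so the child inherits the
    # parent's Python objects but none of its background threads.
    p = mp.Process(target=worker)
    p.start()
    p.join()  # blocks: the child hangs once its gc triggers the leaked stream's flush
```

The parent itself does nothing unusual here; the hang happens in the forked child once its garbage collector runs, which in turn blocks the parent's join().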
The hang only happens if:
- We are running on Linux, where the default multiprocessing start method is fork.
- Something forces a garbage collection to happen in the subprocess (see the sketch below).
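Both preconditions can be made explicit in code; a small sketch (the function names are illustrative, not rerun or torch API):

```python
import gc
import multiprocessing


def fork_is_default() -> bool:
    # Condition 1: the start method is "fork" (the Linux default), so a
    # child inherits the parent's Python objects but not its threads.
    return multiprocessing.get_start_method() == "fork"


def force_collection_in_child() -> None:
    # Condition 2: a garbage collection runs inside the forked child, which
    # invokes __del__ on any recording stream that leaked across the fork.
    gc.collect()
```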
Explanation
Although we already have a fork handler that cleans up our global and thread-local recording streams, it's apparently still possible for an allocated PyRecordingStream to leak into the subprocess via fork (at least based on how PyTorch multiprocessing works).
During __del__, we make one last call to a non-blocking flush. While this was previously fine, we have since added an internal blocking batcher flush to our non-blocking sink flush, and that blocking flush still hangs for the same reason: the batcher's processing thread no longer exists in the forked child.
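A rough sketch of this mechanism, using hypothetical stand-ins (FakeRecordingStream, _GLOBAL_STREAM, _clear_global_stream) rather than rerun's actual internals:

```python
import os
import queue
import threading


class FakeRecordingStream:
    """Stand-in for a stream whose flushes are serviced by a batcher thread."""

    def __init__(self) -> None:
        self._requests = queue.Queue()
        self._batcher = threading.Thread(target=self._process, daemon=True)
        self._batcher.start()

    def _process(self) -> None:
        # The batcher thread acknowledges flush requests. After a fork, this
        # thread simply does not exist in the child.
        while True:
            done = self._requests.get()
            done.set()

    def flush(self) -> None:
        # A "blocking batcher flush": hand the batcher an event and wait for it.
        done = threading.Event()
        self._requests.put(done)
        done.wait()  # in a forked child, nobody ever sets the event -> hang

    def __del__(self) -> None:
        # The "one last flush" on destruction, e.g. when the child's gc runs.
        self.flush()


_GLOBAL_STREAM = FakeRecordingStream()


def _clear_global_stream() -> None:
    # Fork handler: drop the global reference in the child. Any *other*
    # reference that leaked across the fork still hits __del__ -> flush().
    global _GLOBAL_STREAM
    _GLOBAL_STREAM = None


os.register_at_fork(after_in_child=_clear_global_stream)
```

In the parent, flush() returns promptly because the batcher thread acknowledges the request; in a forked child that thread is gone, so a leaked object's __del__ blocks forever on the event, which matches the hang described above.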