Fix possible hang when using torch.multiprocessing #6271
Merged
What
Follow-up to #6223.

Although we already have a fork-handler that cleans up our global/thread-local recording streams (such as the one created by `rr.init`), it's apparently still possible for an allocated `PyRecordingStream` to leak into the subprocess via fork (at least based on how PyTorch multiprocessing works).
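For illustration only, a minimal repro sketch (the script, app id, and worker body are hypothetical, not taken from the PR): the parent allocates a recording stream, torch.multiprocessing forks workers that inherit it, and before this fix a worker could hang at interpreter exit while that inherited stream flushed.

```python
# Hypothetical repro sketch: a recording stream created in the parent leaks
# into forked workers; before this fix the child could hang on exit while
# flushing, because the batcher thread does not survive the fork.
import rerun as rr
import torch.multiprocessing as mp


def worker(rank: int) -> None:
    # The worker never touches rerun, but it inherited the parent's
    # PyRecordingStream through fork. At interpreter shutdown, __del__
    # issues one last flush on that inherited stream.
    print(f"worker {rank} done")


if __name__ == "__main__":
    rr.init("torch_fork_example")  # hypothetical app id; allocates the global recording stream
    mp.start_processes(worker, nprocs=2, start_method="fork")
```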
During `__del__`, we make one last call to a non-blocking flush. Previously this was fine, but we have since added an internal blocking batcher flush to our non-blocking sink flush, and that blocking flush still hangs for the same reason: the batcher processing thread is gone in the forked child.
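To make the failure mode concrete without touching rerun internals, here is a standard-library-only sketch of the same situation: fork() copies only the calling thread, so a background consumer thread is simply absent in the child and anything that blocks waiting on it never returns.

```python
# Standalone illustration (not rerun code) of why a blocking flush hangs after
# fork: POSIX fork() clones only the calling thread, so the background thread
# that drains the queue simply does not exist in the child.
import os
import queue
import threading
import time

q = queue.Queue(maxsize=1)


def consumer() -> None:
    while True:
        q.get()
        q.task_done()


threading.Thread(target=consumer, daemon=True).start()

q.put(1)   # fine in the parent: the consumer thread drains it
q.join()
time.sleep(0.1)  # let the consumer settle back into its blocking get()

pid = os.fork()  # requires a POSIX platform
if pid == 0:
    # Child process: only the main thread was copied; the consumer is gone.
    q.put(1)  # fills the queue; nothing will ever drain it
    try:
        q.put(2, timeout=2)  # a "blocking flush" here would wait forever
    except queue.Full:
        print("child: this is where the flush would have hung")
    os._exit(0)
else:
    os.waitpid(pid, 0)
```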
The fixes, sketched below:
- Check `is_forked_child` in all the methods that issue `inner.batcher.flush_blocking()`.
- Check `is_forked_child` from `__del__` and don't call flush at all, to avoid a gratuitous warning printout.
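The real change is in the SDK's native (Rust) bindings; the following is only a Python-flavored sketch of the guard pattern the two bullets describe, with `is_forked_child`, `_Batcher`, and `RecordingStream` as stand-ins rather than the actual rerun APIs.

```python
# Illustrative sketch of the guard pattern only; names and structure are
# stand-ins for the real implementation in the rerun SDK's native bindings.
import os

_parent_pid = os.getpid()


def is_forked_child() -> bool:
    # True when running in a process forked from the one that created the
    # recording stream and its batcher thread (stand-in implementation).
    return os.getpid() != _parent_pid


class _Batcher:
    def flush_blocking(self) -> None:
        # In the real SDK this waits on the batcher processing thread,
        # which does not exist in a forked child -- hence the guard below.
        ...


class RecordingStream:
    def __init__(self) -> None:
        self._batcher = _Batcher()

    def flush(self, blocking: bool = False) -> None:
        if is_forked_child():
            # Fix 1: never issue the batcher flush from a forked child; even
            # the "non-blocking" sink flush does a blocking batcher flush
            # internally, and the thread that would service it is gone.
            print("warning: flush called in a forked child; ignoring")
            return
        self._batcher.flush_blocking()

    def __del__(self) -> None:
        # Fix 2: skip the final flush entirely in a forked child so we don't
        # hang and don't print a gratuitous warning -- this stream merely
        # leaked into the child via fork.
        if is_forked_child():
            return
        self.flush(blocking=False)
```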
Checklist
- `main` build: rerun.io/viewer
- `nightly` build: rerun.io/viewer

To run all checks from `main`, comment on the PR with `@rerun-bot full-check`.