# Multiprocessing with start-method "fork" results in hang on shutdown #1921
**Labels:** 🪳 bug (Something isn't working) · 🐧 linux (Linux-specific problems) · 🐍 Python API (Python logging API) · user-request (This is a pressing issue for one of our users)
## Comments
I can sadly confirm that even all the changes in #2061 did not fix that.

We have a user that is affected by this; they can't use […]
jleibs added a commit that referenced this issue on Jul 12, 2023:
Closes #1921

### What

The crux of the problem is the following:

> The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the child, ...

The major consequence of this is that our global `RecordingStream` context is duplicated into the child memory space, but none of the threads (batcher, tcp-sender, dropper, etc.) are duplicated. When we go to call `connect()` inside the forked process, we try to replace the global recording stream, which subsequently tries to call drop on the forked copy of `RecordingStreamInner`. However, without any existing threads to process the flush, things just hang inside that flush call.

We take a few actions to alleviate this problem:

1. Introduce a new SDK function, `cleanup_if_forked`, which compares the process-ids on existing globals and forgets them as necessary.
2. In Python, use `os.register_at_fork` to proactively call `cleanup_if_forked` in any forked child process.
3. Also add a call to `cleanup_if_forked` inside of `init()`, in case we're forking through a more exotic mechanism.
4. Check for the forked state anywhere we potentially flush, to avoid deadlocks and produce a visible user error.

Additionally, it turns out that forked processes bypass the normal Python `atexit` handlers, which means we don't get proper shutdown/flush behavior when forked processes terminate. To help users work around this, we introduce a `@shutdown_at_exit` decorator which can be used to decorate functions launched via multiprocessing.

### Testing

On Linux:

```
$ python examples/python/multiprocessing/main.py
```

Observe that the demo exits cleanly and all data shows up in the viewer.

### Checklist

* [x] I have read and agree to the [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/2676) (if applicable)

- [PR Build Summary](https://build.rerun.io/pr/2676)
- [Docs preview](https://rerun.io/preview/pr%3Ajleibs%2Fcleanup_if_forked/docs)
- [Examples preview](https://rerun.io/preview/pr%3Ajleibs%2Fcleanup_if_forked/examples)
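For illustration, here is a minimal Python sketch of the pid-check pattern the PR describes: remember which process created the global state, and use `os.register_at_fork` so forked children forget (rather than flush) state whose worker threads they never inherited. The names `_creator_pid`, `_recording_stream`, and `_forget_recording_stream` are hypothetical stand-ins; the real `cleanup_if_forked` is implemented inside the SDK itself.

```python
import os

_creator_pid = os.getpid()    # pid of the process that owns the global state
_recording_stream = object()  # hypothetical stand-in for the global stream

def _forget_recording_stream() -> None:
    # Discard the inherited stream *without* flushing: the batcher/sender
    # threads were not copied by fork(), so a flush would block forever.
    global _recording_stream
    _recording_stream = None

def cleanup_if_forked() -> None:
    global _creator_pid
    if os.getpid() != _creator_pid:  # a different pid means we are a forked child
        _forget_recording_stream()
        _creator_pid = os.getpid()

# Have CPython run the check in every child created via fork(),
# right after the fork and before user code resumes.
os.register_at_fork(after_in_child=cleanup_if_forked)
```

The `@shutdown_at_exit` decorator mentioned above addresses the complementary problem: because forked children never run `atexit` handlers, the worker function itself must trigger the flush when it returns.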
## Original report

Can be reproduced with the multiprocessing demo.
The current workaround is to always force the "spawn" start method.
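A minimal sketch of that workaround (the `worker` function is a placeholder):

```python
import multiprocessing

def worker(task_id: int) -> None:
    print(f"running task {task_id}")

if __name__ == "__main__":
    # "spawn" starts each child as a fresh interpreter instead of fork()ing
    # the parent, so no half-copied threads or global state are inherited.
    multiprocessing.set_start_method("spawn")
    proc = multiprocessing.Process(target=worker, args=(0,))
    proc.start()
    proc.join()
```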
However, needing to do this will definitely bite Linux users, since "fork" is the default start method there.
See: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods