Multiprocessing with start-method "fork" results in hang on shutdown #1921

jleibs · 2023-04-19T12:40:57Z

Can be reproduced with the multiprocessing demo.

Current workaround is to always force "spawn":

multiprocessing.set_start_method("spawn")

However, needing to do this will definitely bite Linux users since "fork" is the default.

See: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
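A minimal sketch of the workaround, assuming it is applied once at program start before any `Process` or `Pool` is created. The helper names (`force_spawn`, `worker`, `main`) are hypothetical and not part of the demo, and `force=True` is an added assumption so the call does not raise if a start method was already chosen:

```python
import multiprocessing


def force_spawn():
    """Select the "spawn" start method and return the method now in effect."""
    # force=True (assumption): override a start method that was already set
    # instead of raising RuntimeError.
    multiprocessing.set_start_method("spawn", force=True)
    return multiprocessing.get_start_method()


def worker(n):
    # Must live at module level so a spawned child can import it.
    return n * n


def main():
    force_spawn()
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(worker, range(4))


# In a script, call main() under the usual guard, which "spawn" requires:
#     if __name__ == "__main__":
#         print(main())
```

With "spawn" the child starts a fresh interpreter and re-imports the main module, so it never inherits the parent's globals. That is why this sidesteps the hang, at the cost of slower process startup.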

teh-cmc · 2023-05-09T13:05:29Z

I can sadly confirm that even all the changes in #2061 did not fix that.

emilk · 2023-06-14T14:55:06Z

We have a user that is affected by this; they can't use spawn, so they can't update Rerun right now.

Closes #1921

### What

The crux of the problem is the following:

> The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the child, ...

The major consequence of this is that our global `RecordingStream` context is duplicated into the child memory space, but none of the threads (batcher, tcp-sender, dropper, etc.) are duplicated. When we call `connect()` inside the forked process, we try to replace the global recording stream, which subsequently tries to call drop on the forked copy of `RecordingStreamInner`. However, without any existing threads to process the flush, things just hang inside that flush call.

We take a few actions to alleviate this problem:

1. Introduce a new SDK function, `cleanup_if_forked`, which compares the process IDs on existing globals and forgets them as necessary.
1. In Python, use `os.register_at_fork` to proactively call `cleanup_if_forked` in any forked child processes.
1. Also add a call to `cleanup_if_forked` inside of `init()` in case we're forking through a more exotic mechanism.
1. Check for the forked state anywhere we potentially flush, to avoid deadlocks and produce a visible user error.

Additionally, it turns out that forked processes bypass the normal Python `atexit` handler, which means we don't get proper shutdown/flush behavior when the forked processes terminate. To help users work around this, we introduce a `@shutdown_at_exit` decorator which can be used to decorate functions launched via multiprocessing.

### Testing

On Linux:

```
$ python examples/python/multiprocessing/main.py
```

Observe that the demo exits cleanly and all data shows in the viewer.
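The PID-comparison idea described in this PR can be illustrated with a small pure-Python sketch. This is not the Rerun implementation: `FakeStream`, `get_stream`, and the bodies of `init`, `cleanup_if_forked`, and `shutdown_at_exit` below are hypothetical stand-ins that only mirror the names used in the description. The only real API relied on is the standard library's `os.register_at_fork`.

```python
import functools
import os


class FakeStream:
    """Hypothetical stand-in for a recording stream; not the Rerun API."""

    def __init__(self):
        self.owner_pid = os.getpid()  # remember which process created us
        self.pending = []

    def log(self, msg):
        self.pending.append(msg)

    def flush(self):
        # A real stream would hand data to a background thread here. That
        # thread does not exist in a forked child, so fail loudly instead
        # of blocking forever.
        if self.owner_pid != os.getpid():
            raise RuntimeError("stream was inherited across fork(); cannot flush")
        flushed, self.pending = self.pending, []
        return flushed


_global_stream = None


def get_stream():
    return _global_stream


def cleanup_if_forked():
    """Forget (without flushing) a global stream created by another process."""
    global _global_stream
    if _global_stream is not None and _global_stream.owner_pid != os.getpid():
        _global_stream = None


def init():
    """Create a fresh global stream, dropping any copy inherited via fork."""
    global _global_stream
    cleanup_if_forked()
    _global_stream = FakeStream()
    return _global_stream


# Run the cleanup automatically in every child created through os.fork().
# register_at_fork is not available on every platform, hence the guard.
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=cleanup_if_forked)


def shutdown_at_exit(func):
    """Flush this process's stream when func returns, without relying on atexit."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        finally:
            stream = get_stream()
            if stream is not None:
                stream.flush()

    return wrapper
```

In this sketch a worker function launched via multiprocessing would call `init()` itself and be wrapped in `@shutdown_at_exit`, so its data is flushed when the function returns, even if the interpreter's `atexit` handlers never run in the child.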

jleibs added 🪳 bug Something isn't working 🐧 linux Linux-specific problems labels Apr 19, 2023

jleibs added a commit that referenced this issue Apr 19, 2023

Always spawn instead of fork (See: #1921)

9ec8de1

jleibs mentioned this issue Apr 19, 2023

Always spawn instead of fork (See: #1921) #1922

Merged

2 tasks

emilk added the 🐍 Python API Python logging API label Apr 19, 2023

emilk added the user-request This is a pressing issue for one of our users label Jun 27, 2023

jleibs mentioned this issue Jul 11, 2023

Cleanup internal data-structures when process has been forked #2676

Merged

3 tasks

jleibs self-assigned this Jul 11, 2023

jleibs closed this as completed in #2676 Jul 12, 2023

Wumpf mentioned this issue Jul 20, 2023

Data logged from a forked child process does not show up in the viewer. #2767

Closed
