-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Fix omp fork race conditions and assert issue #17039
Conversation
I actually also did change (removed from this PR in order to keep it simple) that stops forcing the creation of the engine during static init and the fork handlers, if it is desired. This has a couple of effects including reducing the overhead of resources and threads created simply by the library being loaded, which then, in turn, causes a lot of thread churn (Stop(), Start()) when forking. Even if hey never use mxnet, we spin up a lot of stuff unnecessarily. It actually gets lot faster on startp when this is done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! LGTM!
The change sounds useful. Do you plan to commit the change? |
If I have time soon |
* Upgrade 3rdparty/openmp to release_90 version (#17012) Fixes #10856 * Fix omp assert issue (#17039) * Disable OpenMP offloading support for 3rdparty/openmp (#17098) * Disable OpenMP offloading support for 3rdparty/openmp OpenMP offloading was introduced some time during the past two years and is enabled by default. With upgrading 3rdparty/openmp in #17012 it was made part of the MXNet CMake build. But we don't use OpenMP offloading and the Cuda target in the llvm OpenMP Offloading build is broken in our setting. * Update CMake on CI * Fix reshape interoperability test (#17155) * fix reshape interoperability test * fix for scipy import Co-authored-by: Chris Olivier <[email protected]> Co-authored-by: Hao Jin <[email protected]>
Description
Fix for: #14979
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Comments
When forking, sometimes mxnet can hook its fork handler before openmp's, depending upon static startup order as well as variables such as whether OMP_NUM_THREADS is set. This is mitigated by forcing an OMP call before mxnet hooks a fork handler as well as first-thing after forking as the child. This avoids two scenarios:
A considerable amount of multithreading is done by mxnext both during static init as ell as in atfork() handlers, and we may wish to change this over time. Due to the non-deterministic startup order of static init (and thus sometimes fork handling), it can cause unpredictable results.
Build omp in release mode to stop harmless assert. Comment notes that assert is harmless and represents a variable that fails to get set to null in their atfork_child handler (they set other similar variables to to null). The variable is set to NULL in the next line of code.