[rl] add import torch to provisioner bootstrap to avoid concurrent dlopen race in monarch sub-processes #3220

Merged

shuhuayu merged 1 commit into pytorch:main (May 5, 2026)

Conversation
tianyu-l approved these changes May 5, 2026
When Monarch spawns sub-processes and multiple actor threads begin unpickling messages concurrently, they all try to import torch at the same time, causing a race in the `dlopen` of `torch._C.so`. This surfaces as the misleading error `torch._C is not a package`, even though the import works fine when done sequentially. The fix is to `import torch` in the Provisioner's bootstrap function, which runs once per sub-process before any threading starts, ensuring `torch._C.so` is fully loaded before concurrent unpickling begins. It's unclear whether this is a PyTorch bug (concurrent `dlopen` should be thread-safe) or a Monarch bug (imports during unpickling should be serialized), so we've added a TODO to remove the workaround once the upstream fix lands.
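
For context, a minimal sketch of the workaround; the `bootstrap` name and surrounding structure here are illustrative, not the exact Monarch/provisioner API:

```python
# Hypothetical sketch of the provisioner bootstrap workaround.
# Assumption: bootstrap() runs exactly once per spawned sub-process,
# before any actor threads are created.

def bootstrap() -> None:
    # Eagerly import torch so torch._C.so is dlopen'ed on this single
    # thread. Without this, actor threads unpickling messages
    # concurrently each trigger `import torch`, and the racing dlopen
    # calls can fail with the misleading "torch._C is not a package".
    # TODO: remove once the upstream fix (PyTorch or Monarch) lands.
    import torch  # noqa: F401

    # ... remaining per-process setup; actor threads may start after this ...
```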