This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Description
This PR addresses a problem noticed while reviewing issue #16572 and should fix that issue as well, as will be verified by @haojin2 (thanks!).
Recent PR #16391 introduced a cudaEvent to solve a race condition in the cuDNN implementation of RNNOp under some conditions. If the MXNet framework was compiled with CUDA/cuDNN support, this cudaEvent would be created in all scenarios, including for non-GPU RNNOps and on systems with no GPU present. However, the cudaEventCreateWithFlags() call cannot be made on a system with no GPU.
This PR makes the cudaEventCreateWithFlags() call lazy: the event is created only when it is first used (and therefore necessarily on a system with a GPU). Further, the thread that creates the event will have its GPU context set properly for any later calls to cudaEventRecord(). In a multi-GPU setting, the main Python thread likely had the context set improperly for later use of the event on an arbitrary GPU, which would explain the reported issue.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments