This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Workaround problem with fusion in CUDA 9 #17028
Merged
Description
Fixes #17020
The problem comes from a bug in how NVRTC in CUDA 9 handles the `default-device` flag. That flag is supposed to mark all functions in the file as `__device__` functions while leaving functions that are decorated differently (such as kernels decorated with `__global__`) alone. This is the behavior in CUDA 10+. In CUDA 9, however, the `__device__` attribute is applied to every function, including kernels, which is incompatible with the `__launch_bounds__()` attribute that we use for kernels.

This PR removes the usage of the `default-device` flag for NVRTC compilation and instead manually decorates all the required functions as `__device__`.
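To illustrate the approach described above, here is a minimal sketch of what runtime-compiled source and its option list could look like once the flag is dropped. All names (`kFusedSource`, `NvrtcOptions`, `UsesDefaultDevice`, `scale`, `scale_kernel`) and the launch bound of 256 are hypothetical and are not taken from this PR's diff. The flag spellings checked below are the NVRTC aliases as I understand them, so treat them as an assumption.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical source handed to NVRTC. The helper is explicitly marked
// __device__, so nothing relies on a compiler flag to decide its execution
// space. The kernel keeps its own __global__ and __launch_bounds__ decorations.
static const char kFusedSource[] = R"code(
__device__ inline float scale(float x, float a) { return a * x; }

extern "C" __global__ void __launch_bounds__(256)
scale_kernel(float* out, const float* in, float a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = scale(in[i], a);
}
)code";

// Hypothetical option list for NVRTC compilation. Note the absence of any
// default-device option; the architecture and standard are example values.
inline std::vector<std::string> NvrtcOptions() {
  return {"--gpu-architecture=compute_70", "--std=c++11"};
}

// Returns true if the option list still requests the default-device behavior
// (assumed spellings of the NVRTC flag and its long form).
inline bool UsesDefaultDevice(const std::vector<std::string>& opts) {
  return std::any_of(opts.begin(), opts.end(), [](const std::string& o) {
    return o == "-default-device" || o == "--default-device" ||
           o == "--device-as-default-execution-space";
  });
}
```

In a real build, the source string and options would be passed on to `nvrtcCreateProgram` and `nvrtcCompileProgram`. Because the decoration lives in the source itself, the same string should compile the same way under CUDA 9 and CUDA 10+.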