-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[r2.18-rocm-enhanced] Use faster atomics on ROCM; Enable kernel argument preloading #2784
base: r2.18-rocm-enhanced
Are you sure you want to change the base?
Conversation
@@ -783,7 +822,7 @@ CREATE_CUDA_DEVICE_FUNCTION_ALIAS(GpuAtomicAdd, CudaAtomicAdd); | |||
// GpuAtomicSub | |||
template <typename T, typename U> | |||
__device__ detail::ToTypeIfConvertible<U, T> GpuAtomicSub(T* ptr, U value) { | |||
return atomicSub(ptr, value); | |||
return atomicSub(detail::AddressSpaceHint<1>(ptr), value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may I ask why atomicAdd(detail::AddressSpaceHint<3>(dst), val);
is using shared memory while atomicSub(detail::AddressSpaceHint<1>(ptr), value);
is using global memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AddressSpaceHint<3> is only for GpuAtomicAddShared that is used by rocm specific kernels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there is no GpuAtomicSubShared
, which would use addr space 3 if it existed. I'm not sure why it doesn't exist since sub just calls add anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why AddressSpaceHint<3> is not used by atomicSub?
atomicSub is rocm specific kernel as well I think? https://rocm.docs.amd.com/projects/HIP/en/docs-5.7.0/reference/kernel_language.html#atomic-functions
int atomicSub(int* address, int val)
21438d5
to
891c0d2
Compare
891c0d2
to
9174f23
Compare
No description provided.