Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[r2.18-rocm-enhanced] Use faster atomics on ROCM; Enable kernel argument preloading #2784

Open
wants to merge 2 commits into
base: r2.18-rocm-enhanced
Choose a base branch
from

Conversation

draganmladjenovic
Copy link

No description provided.

@@ -783,7 +822,7 @@ CREATE_CUDA_DEVICE_FUNCTION_ALIAS(GpuAtomicAdd, CudaAtomicAdd);
// GpuAtomicSub
template <typename T, typename U>
__device__ detail::ToTypeIfConvertible<U, T> GpuAtomicSub(T* ptr, U value) {
return atomicSub(ptr, value);
return atomicSub(detail::AddressSpaceHint<1>(ptr), value);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may I ask why atomicAdd(detail::AddressSpaceHint<3>(dst), val); is using shared memory while atomicSub(detail::AddressSpaceHint<1>(ptr), value); is using global memory?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AddressSpaceHint<3> is only for GpuAtomicAddShared that is used by rocm specific kernels.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is no GpuAtomicSubShared, which would use addr space 3 if it existed. I'm not sure why it doesn't exist since sub just calls add anyway.

Copy link

@i-chaochen i-chaochen Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why AddressSpaceHint<3> is not used by atomicSub?

atomicSub is rocm specific kernel as well I think? https://rocm.docs.amd.com/projects/HIP/en/docs-5.7.0/reference/kernel_language.html#atomic-functions

int atomicSub(int* address, int val)

@jayfurmanek jayfurmanek changed the title R2.18 rocm enhanced bd [r2.18-rocm-enhanced] Use faster atomics on ROCM; Enable kernel argument preloading Dec 16, 2024
@amd-jianli12 amd-jianli12 force-pushed the r2.18-rocm-enhanced-bd branch from 21438d5 to 891c0d2 Compare January 8, 2025 01:16
@amd-jianli12 amd-jianli12 force-pushed the r2.18-rocm-enhanced-bd branch from 891c0d2 to 9174f23 Compare January 8, 2025 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants