
Optimize device_select, device_partition for MI3xx multi-stream performance #1012

Closed

stanleytsang-amd wants to merge 4 commits into develop from fix-rocprim-partition-gfx942-rocm7.0


@stanleytsang-amd (Contributor)

Credit goes to @Naraenda and @NB4444 for writing the original fix for 6.4.

Under certain conditions, device_select and device_partition can experience slowdowns on MI3xx GPUs when running on multiple streams. This fix mitigates the slowdown by using atomic counters instead of flat block IDs to assign work. To preserve performance when the user knows in advance that multiple streams are not being used, or on non-MI3xx-based architectures, this optimization is opt-in only and is enabled via a template argument.
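A minimal host-side sketch of the two work-assignment strategies follows. It assumes, as is common for single-pass select/partition kernels, that tiles are processed with a look-back scheme in which tile k depends on results from tiles 0..k-1. If the tile index is the flat block ID and the hardware starts blocks out of order (which plausibly happens more often when several streams share the GPU), a running block can end up waiting on a predecessor that has not been scheduled yet. Drawing the tile index from an atomic counter instead means index k is only handed out after indices 0..k-1 were claimed by blocks that have already started. GPU blocks are simulated here with `std::thread`; `acquire_tile_id`, `assign_tiles`, and the `UseAtomicTileIds` template parameter are hypothetical names, not rocPRIM's actual API.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: each std::thread stands in for one GPU block.
// UseAtomicTileIds mimics an opt-in template switch between the two strategies.
template <bool UseAtomicTileIds>
std::size_t acquire_tile_id(std::size_t launch_id, std::atomic<std::size_t>& tile_counter)
{
    if constexpr (UseAtomicTileIds)
    {
        // fetch_add returns the previous value, so every caller receives a
        // unique tile index, handed out in the order callers reach this point.
        return tile_counter.fetch_add(1, std::memory_order_relaxed);
    }
    else
    {
        // Static assignment: the tile index is simply the block's launch ID.
        return launch_id;
    }
}

// Launches num_blocks simulated blocks and records which tile each one claimed.
template <bool UseAtomicTileIds>
std::vector<std::size_t> assign_tiles(std::size_t num_blocks)
{
    std::atomic<std::size_t> tile_counter{0};
    std::vector<std::size_t> tile_of_block(num_blocks);
    std::vector<std::thread> blocks;
    blocks.reserve(num_blocks);
    for (std::size_t b = 0; b < num_blocks; ++b)
    {
        // Each thread writes only its own slot, so there is no data race.
        blocks.emplace_back([&, b] {
            tile_of_block[b] = acquire_tile_id<UseAtomicTileIds>(b, tile_counter);
        });
    }
    for (auto& t : blocks)
    {
        t.join();
    }
    return tile_of_block;
}
```

With `assign_tiles<false>` every block gets the tile matching its launch ID; with `assign_tiles<true>` the result is a permutation of 0..n-1 whose order depends on which threads start first. The extra atomic operation per block is presumably the overhead that the opt-in template argument lets single-stream users avoid.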

@stanleytsang-amd (Contributor, Author)

Closing in favour of #1041

stanleytsang-amd added a commit that referenced this pull request Aug 2, 2025
…rmance (#1041)

Closed #1012 because the branch name was in the wrong format.

Credit goes to @Naraenda and @NB4444 for writing the original fix for
6.4.

Under certain conditions, device_select and device_partition can
experience slowdowns on MI3xx GPUs when running on multiple streams.
This fix mitigates the slowdown by using atomic counters instead of
flat block IDs to assign work. To preserve performance when the user
knows in advance that multiple streams are not being used, or on
non-MI3xx-based architectures, this optimization is opt-in only and is
enabled via a template argument.
stanleytsang-amd deleted the fix-rocprim-partition-gfx942-rocm7.0 branch on August 2, 2025, 01:55
xiaohuguo2023 pushed a commit to xiaohuguo2023/rocm-libraries that referenced this pull request Aug 3, 2025
…uilds (ROCm#1012)

Single arch compile enablement

[ROCm/rocSPARSE commit: 72e6161]
Swathi9494 pushed a commit that referenced this pull request Aug 5, 2025
…rmance (#1041)
assistant-librarian Bot pushed a commit that referenced this pull request Aug 21, 2025
This PR adds a missing header file that is required to compile rocsolver
in debug mode.
ammallya pushed a commit that referenced this pull request Sep 18, 2025
[ROCm/rocSOLVER commit: 1b07f97]
ammallya pushed a commit that referenced this pull request Sep 26, 2025
(cherry picked from commit 1b07f97)
ammallya pushed a commit that referenced this pull request Sep 26, 2025
[ROCm/rocSOLVER commit: 99cb486]