
fix: resolve out-of-memory in tests on rx9070 #727

Merged: rocm-ci merged 3 commits into ROCm:release/rocm-rel-6.4 from StreamHPC:test-failure-on-linux-rx9070 on May 22, 2025

@Naraenda (Member) commented May 2, 2025

Work by @NB4444. This PR adds checks that skip tests whose allocations would exceed the available device memory.

Closes #720

@Naraenda changed the base branch from develop to release/rocm-rel-6.4 on May 8, 2025 08:06
@Saiyang-Zhang force-pushed the test-failure-on-linux-rx9070 branch from 7b63ef0 to 53c763a on May 8, 2025 12:56
@Saiyang-Zhang (Contributor) commented May 8, 2025

The fix is based on the new device_ptr utility, which provides a clean interface for managing and checking device memory, so the related headers were included as well. This utility was not planned for 6.4, but it is already part of newer versions and will eventually be included anyway, so we added it here.

@Naraenda marked this pull request as ready for review on May 8, 2025 14:20
@stanleytsang-amd (Collaborator) left a comment


@Naraenda there are build errors in this PR. I think you forgot to include the utils_device_ptr.hpp header in the unit test cpp files?

/longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/rocPRIM/test/rocprim/test_device_merge.cpp:173:13: error: use of undeclared identifier 'common'
  173 |             common::device_ptr<key_type> d_keys_input1;
      |             ^
/longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/rocPRIM/test/rocprim/test_device_merge.cpp:173:32: error: unexpected type name 'key_type': expected expression
  173 |             common::device_ptr<key_type> d_keys_input1;
      |                                ^
/longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/rocPRIM/test/rocprim/test_device_merge.cpp:173:42: error: use of undeclared identifier 'd_keys_input1'; did you mean 'keys_input1'?
  173 |             common::device_ptr<key_type> d_keys_input1;

@Naraenda (Member, Author) commented

> [...] there are build errors in this PR.

@Saiyang-Zhang is currently taking this over and investigating.

@Saiyang-Zhang force-pushed the test-failure-on-linux-rx9070 branch from 0b63a4a to 88d4f40 on May 19, 2025 06:43
@Saiyang-Zhang force-pushed the test-failure-on-linux-rx9070 branch from 88d4f40 to d7f2ad3 on May 19, 2025 07:18
@Saiyang-Zhang (Contributor) commented May 20, 2025

The device_ptr objects were not used correctly. All occurrences are now fixed, and the PR should be ready for review.

@jayhawk-commits linked an issue on May 22, 2025 that may be closed by this pull request
@stanleytsang-amd self-requested a review on May 22, 2025 20:00
@rocm-ci left a comment


Auto Code-Review by Jenkins

@rocm-ci merged commit 9b1495e into ROCm:release/rocm-rel-6.4 on May 22, 2025
2 checks passed
stanleytsang-amd pushed a commit to ROCm/rocm-libraries that referenced this pull request Jun 13, 2025
This change did not seem to make it in this PR
ROCm/rocPRIM#727.
assistant-librarian Bot pushed a commit that referenced this pull request Jun 13, 2025
Fix test failure with lower memory cards

This change did not seem to make it in this PR
#727.
jayhawk-commits pushed a commit to ROCm/rocm-libraries that referenced this pull request Jun 17, 2025
This change did not seem to make it in this PR
ROCm/rocPRIM#727.

Development

Successfully merging this pull request may close these issues.

Test failures on Linux RX9070

5 participants