Enable architectures for libhipcxx (fixes #2504)#2946
Conversation
fdafc59 to
7b58281
Compare
47ffdb5 to
8609a4a
Compare
9f02139 to
38a6728
Compare
1a60720 to
268ad54
Compare
|
Todo: We need to disable the changes in amdgpu_family_matrix.py before merging. The failing gfx90a tests seem to be a node issue as other packages have similar issues (and we test with gfx90a on our dev CI which is passing all of our tests) |
There was a problem hiding this comment.
These changes need to be reverted before the merge but are okay for testing.
| hipSPARSELt # https://github.com/ROCm/TheRock/issues/2042 | ||
| composable_kernel # https://github.com/ROCm/TheRock/issues/1245 | ||
| rocWMMA # https://github.com/ROCm/TheRock/issues/1944 | ||
| libhipcxx # https://github.com/ROCm/TheRock/issues/2504 |
There was a problem hiding this comment.
[20](https://github.com/ROCm/TheRock/actions/runs/21752139280/job/63742765914?pr=2946#step:10:3025)
5: Failed Tests (25):
5: libhip++ :: cuda/float/bfloat_cos_comparison.pass.cpp
5: libhip++ :: cuda/float/bfloat_exp_comparison.pass.cpp
5: libhip++ :: cuda/float/bfloat_log_comparison.pass.cpp
5: libhip++ :: cuda/float/bfloat_sin_comparison.pass.cpp
5: libhip++ :: cuda/float/bfloat_sqrt_comparison.pass.cpp
5: libhip++ :: cuda/float/half_cos_comparison.pass.cpp
5: libhip++ :: cuda/float/half_exp_comparison.pass.cpp
5: libhip++ :: cuda/float/half_log_comparison.pass.cpp
5: libhip++ :: cuda/float/half_sin_comparison.pass.cpp
5: libhip++ :: cuda/float/half_sqrt_comparison.pass.cpp
5: libhip++ :: heterogeneous/atomic/atomic_cuda_float.pass.cpp
5: libhip++ :: heterogeneous/atomic/atomic_cuda_generic.pass.cpp
5: libhip++ :: heterogeneous/atomic/atomic_cuda_signed.pass.cpp
5: libhip++ :: heterogeneous/atomic/atomic_cuda_unsigned.pass.cpp
5: libhip++ :: heterogeneous/atomic/atomic_std_float.pass.cpp
5: libhip++ :: heterogeneous/atomic/atomic_std_generic.pass.cpp
5: libhip++ :: heterogeneous/atomic/atomic_std_signed.pass.cpp
5: libhip++ :: heterogeneous/atomic/atomic_std_unsigned.pass.cpp
5: libhip++ :: heterogeneous/atomic/flag.pass.cpp
5: libhip++ :: heterogeneous/atomic/reference_cuda.pass.cpp
5: libhip++ :: heterogeneous/atomic/reference_std.pass.cpp
5: libhip++ :: heterogeneous/optional.pass.cpp
5: libhip++ :: heterogeneous/pair.pass.cpp
5: libhip++ :: heterogeneous/tuple.pass.cpp
5: libhip++ :: heterogeneous/variant.pass.cpp
There was a problem hiding this comment.
@geomin12 @amd-justchen can you help looking into the potential node-issue? I think we should wait for a green signal and not fully rely on the local testing done by @obersteiner and team. We should make sure tests passed at least once before merging :)
There was a problem hiding this comment.
@geomin, @amd-justchen @marbre: In the logs I see errors like out of memory exceptions that I also see for example for thrust. We do not use a lot of memory in our tests (we do not have any large numeric tests but instead have only very small tests). An example for the rocThrust error https://github.com/ROCm/TheRock/actions/runs/21752139280/job/63742765864?pr=2946#step:10:19181
There was a problem hiding this comment.
Those tests are passing now.
| hipSPARSELt # https://github.com/ROCm/TheRock/issues/2042 | ||
| composable_kernel # https://github.com/ROCm/TheRock/issues/1245 | ||
| rocWMMA # https://github.com/ROCm/TheRock/issues/1944 | ||
| libhipcxx # https://github.com/ROCm/TheRock/issues/2504 |
There was a problem hiding this comment.
There are test failures for the so far tested gfx90a, see below. I assume this should be fixed before allowing to run on other gfx90X architectures?
| # Label is linux-gfx110X-gpu-rocm, fetch-gfx-targets should be ["gfx1100"] | ||
| "test-runs-on": "", | ||
| "family": "gfx110X-all", | ||
| "fetch-gfx-targets": [], |
There was a problem hiding this comment.
i'm glad the tests work :)
but let's please revert this back to original state, we lack the machines to run all tests
i would post the test logs in the PR description as proof of working, then potentially add skip-ci label as this is already proven working and no need to take CI resources
There was a problem hiding this comment.
I have posted the link to the old test into the description + dropped the commit that changed the file + added the skip-ci label.
aaff0fb to
dbae602
Compare
geomin12
left a comment
There was a problem hiding this comment.
lgtm, thanks for this update
|
I'll need to fix I removed the labels, so skip-ci will work |
marbre
left a comment
There was a problem hiding this comment.
All currently linked checks seem to have been canceled (or CI runs might be still queued due do the force push)? Code-change wise this looks good but I don't have any signal from CI for your changes. Therefore leaving it to @geomin12 and you to judge.
Dismissing as outlined in the last review comment.
Motivation
So far libhipcxx only enables a very limited number of architectures. With this PR we want to widen this support.
Test Plan
Ideally we want to test all added architectures.
Test Logs
Libhipcxx is passing all Linux architectures that we could test via TheRock:
https://github.com/ROCm/TheRock/actions/runs/22229769520?pr=2946
Submission Checklist