Add support for gfx900 cards#3564
Conversation
geomin12
left a comment
There was a problem hiding this comment.
code looks good! looks like it built too which is awesome, i think the failing sanity check is due to an infra test machine, which we will look into
one thing though: please add a good description in the PR :) and tag that gfx900 issue as well
geomin12
left a comment
There was a problem hiding this comment.
lgtm, the sanity check is an infra issue and unrelated
thanks for this work :)
i would also recommend getting a build folk reviewing as well
HereThereBeDragons
left a comment
There was a problem hiding this comment.
lgtm other than the comment.
and do you have any gfx900 to test if it actually runs?
| """ | ||
|
|
||
| ############################################################################################# | ||
| # NOTE: when doing changes here, also check that they are done in new_amdgpu_family_matrix.py |
There was a problem hiding this comment.
please note this message :)
i recommend adding there a new section "all" to it and not just rename "dcgpu"
There was a problem hiding this comment.
Added a new section in new_amdgpu_family_matrix.py.
I don't have a gfx900 on hand, but perhaps the issue reporters @oldschoola and @GreenShadows could check if they encounter any issues with the build?
There was a problem hiding this comment.
since gfx906 is a strict superset of gfx900 you could just run the sanity checks on a gfx906 machine with the override and be very certain that its will be fine on gfx900 too, linux kernel level bugs excepted.
|
you need to rework your pr due to this: #2869 |
|
@HereThereBeDragons reworked this PR in light of #2869 landing. EDIT: Also reverted my changes to |
HereThereBeDragons
left a comment
There was a problem hiding this comment.
lgtm
but please run a manual ci run to build gfx900 before merging.
|
The roadmap should be also updated no? |
|
Thanks @oldschoola. Opened a PR for that mentioned above. |
## Motivation gfx900 support was recently introduced into the system via #3564. However, AOTriton currently does not support gfx900, leading to PyTorch build failures when this architecture is detected. eg failure log: https://github.com/ROCm/TheRock/actions/runs/23181751945/job/67356031342 Previously, PyTorch builds were failing due to missing package requirements (see: #3988). Those requirements were later uploaded manually: https://github.com/ROCm/TheRock/tree/main/build_tools/third_party/s3_management#adding-a-new-package-dependency To ensure stable builds, this PR adds gfx900 to the AOTRITON_UNSUPPORTED_ARCHS list within the PyTorch build script, preventing AOTriton from attempting compilation on unsupported hardware. ## Test Plan Linux Rocm Run: https://github.com/ROCm/TheRock/actions/runs/23189631595 pytorch: https://github.com/ROCm/TheRock/actions/runs/23198946201 ## Test Result: gfx900 pytorch builds passed. ## Submission Checklist - [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
## Motivation Update `ROADMAP.md` to reflect recently added support. ## Technical Details `gfx103X-all` builds passing for Linux/Windows: #3763 (Pytorch failing until ROCm/rocm-libraries#5141 lands) `gfx900` builds passing: #3564 `gfx90c` builds awaiting ROCm/rocm-libraries#5282 to go green ## Test Plan `gfx90c` builds to be tested (#3818) ## Test Result N/A ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
|
@lucbruni-amd why we still need to support gfx900/gfx906? https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html |
The GFX906 works fine. I'm surprised by these anti-consumer and conflicting suggestions. |
Motivation
Resolves #2737
Technical Details
Adds a
therock_add_amdgpu_targetcall forgfx900in thegfx90Xfamily block. The target is registered like the othergfx90Xtargets (gfx906,gfx908,gfx90a), so it is included inTHEROCK_AMDGPU_TARGETSand the corresponding family lists, and the same per-project exclusions are respected when building forgfx900.Test Plan
Specify
gfx90X-allinstead ofgfx90X-dcgpuin nightly build workflows to include this arch in the resulting build. Label this PR accordingly (the GitHub label gets processed and is truncated to hit the entry here (as per ci_behavior_manipulation.md). Should observe passing tests, and failures only when unrelated.Test Result
https://github.com/ROCm/TheRock/actions/runs/22457555142/job/65043306378?pr=3564
Submission Checklist
[x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.