Add support for gfx900 cards#3564

Merged
lucbruni-amd merged 9 commits into main from lb/gfx900-support on Mar 11, 2026

Conversation

@lucbruni-amd
Contributor

@lucbruni-amd lucbruni-amd commented Feb 23, 2026

Motivation

Resolves #2737

Technical Details

Adds a therock_add_amdgpu_target call for gfx900 in the gfx90X family block. The target is registered like the other gfx90X targets (gfx906, gfx908, gfx90a), so it is included in THEROCK_AMDGPU_TARGETS and the corresponding family lists, and the same per-project exclusions are respected when building for gfx900.
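As a rough illustration of what registering a target into family lists means, the sketch below models the lookup in Python. The table layout and the `targets_for_family` helper are hypothetical, and the family memberships shown (in particular which targets belong to gfx90X-dcgpu) are assumptions for illustration only; the real definitions live in TheRock's CMake files.

```python
# Hypothetical model of per-target family registration; not TheRock's actual code.
# The family memberships below are assumptions made for illustration only.
TARGET_FAMILIES = {
    "gfx900": ["gfx90X-all"],
    "gfx906": ["gfx90X-all"],
    "gfx908": ["gfx90X-all", "gfx90X-dcgpu"],
    "gfx90a": ["gfx90X-all", "gfx90X-dcgpu"],
}


def targets_for_family(family: str) -> list[str]:
    """Return every registered target that lists `family`, sorted by name."""
    return sorted(t for t, fams in TARGET_FAMILIES.items() if family in fams)


if __name__ == "__main__":
    print(targets_for_family("gfx90X-all"))
    print(targets_for_family("gfx90X-dcgpu"))
```

Under these assumed memberships, requesting gfx90X-all picks up gfx900 while gfx90X-dcgpu does not, which is the distinction the test plan below relies on.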

Test Plan

Specify gfx90X-all instead of gfx90X-dcgpu in nightly build workflows to include this arch in the resulting build. Label this PR accordingly (the GitHub label gets processed and is truncated to hit the entry here, as per ci_behavior_manipulation.md). Tests should pass, with failures only where they are unrelated to this change.

Test Result

https://github.com/ROCm/TheRock/actions/runs/22457555142/job/65043306378?pr=3564

Submission Checklist

[x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Contributor

@geomin12 geomin12 left a comment

code looks good! looks like it built too which is awesome, i think the failing sanity check is due to an infra test machine, which we will look into

one thing though: please add a good description in the PR :) and tag that gfx900 issue as well

@lucbruni-amd lucbruni-amd marked this pull request as ready for review February 24, 2026 15:40
Contributor

@geomin12 geomin12 left a comment

lgtm, the sanity check is an infra issue and unrelated

thanks for this work :)

i would also recommend getting one of the build folks to review as well

Contributor

@HereThereBeDragons HereThereBeDragons left a comment

lgtm other than the comment.

and do you have any gfx900 to test if it actually runs?

"""

#############################################################################################
# NOTE: when doing changes here, also check that they are done in new_amdgpu_family_matrix.py
Contributor

please note this message :)
i recommend adding a new "all" section there rather than just renaming "dcgpu"

Contributor Author

Added a new section in new_amdgpu_family_matrix.py.

I don't have a gfx900 on hand, but perhaps the issue reporters @oldschoola and @GreenShadows could check if they encounter any issues with the build?

@IMbackK IMbackK Mar 9, 2026

since gfx906 is a strict superset of gfx900, you could just run the sanity checks on a gfx906 machine with the override and be very certain that it will be fine on gfx900 too, linux kernel level bugs excepted.
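A minimal sketch of that idea, assuming "the override" refers to the ROCm `HSA_OVERRIDE_GFX_VERSION` environment variable and that gfx900 corresponds to the value `9.0.0`. The `env_with_gfx_override` helper is hypothetical, and the command that runs the sanity checks is not named in this thread, so it is left out.

```python
import os


def env_with_gfx_override(version: str = "9.0.0") -> dict[str, str]:
    """Copy the current environment and add a GFX version override.

    Assumption: HSA_OVERRIDE_GFX_VERSION is the override meant above, and
    "9.0.0" is the value that makes the runtime treat the device as gfx900.
    """
    env = dict(os.environ)
    env["HSA_OVERRIDE_GFX_VERSION"] = version
    return env


if __name__ == "__main__":
    env = env_with_gfx_override()
    print(env["HSA_OVERRIDE_GFX_VERSION"])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching whatever command runs the sanity checks on the gfx906 machine.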

@HereThereBeDragons
Contributor

you need to rework your pr due to this: #2869
you can leave the new_amdgpu_matrix.py out. i will take care of it.

@lucbruni-amd
Contributor Author

lucbruni-amd commented Feb 26, 2026

@HereThereBeDragons reworked this PR in light of #2869 landing.

EDIT: Also reverted my changes to new_amdgpu_family_matrix.py.

Contributor

@HereThereBeDragons HereThereBeDragons left a comment

as this is now a separate gpu: you will also need to adjust all those workflows listed in #2869 and the jax workflow from #3633

Contributor

@HereThereBeDragons HereThereBeDragons left a comment

lgtm
but please run a manual ci run to build gfx900 before merging.

Contributor

@geomin12 geomin12 left a comment

@marbre marbre removed their request for review March 9, 2026 23:31
@lucbruni-amd
Contributor Author

lucbruni-amd commented Mar 11, 2026

Resolved a merge conflict as a result of #3821 landing. Also removed libhipcxx exclusion (see #2946).

@lucbruni-amd lucbruni-amd merged commit ef86d4e into main Mar 11, 2026
14 checks passed
@lucbruni-amd lucbruni-amd deleted the lb/gfx900-support branch March 11, 2026 18:08
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Mar 11, 2026
@oldschoola

The roadmap should also be updated, no?
https://github.com/ROCm/TheRock/blob/main/ROADMAP.md

@lucbruni-amd
Contributor Author

Thanks @oldschoola. Opened a PR for that mentioned above.

chiranjeevipattigidi added a commit that referenced this pull request Mar 17, 2026
## Motivation
gfx900 support was recently introduced into the system via
#3564. However, AOTriton currently
does not support gfx900, leading to PyTorch build failures when this
architecture is detected.
e.g. failure log:
https://github.com/ROCm/TheRock/actions/runs/23181751945/job/67356031342

Previously, PyTorch builds were failing due to missing package
requirements (see: #3988). Those
requirements were later uploaded manually:
https://github.com/ROCm/TheRock/tree/main/build_tools/third_party/s3_management#adding-a-new-package-dependency
To ensure stable builds, this PR adds gfx900 to the
AOTRITON_UNSUPPORTED_ARCHS list within the PyTorch build script,
preventing AOTriton from attempting compilation on unsupported hardware.
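
A rough sketch of the kind of check this implies, under stated assumptions: the list contents beyond gfx900, the `aotriton_enabled` helper, and the "disable if any requested arch is unsupported" rule are all hypothetical and are not taken from the actual PyTorch build script.

```python
# Hypothetical sketch; the real list and logic live in TheRock's PyTorch build script.
# Only gfx900 is known from this commit message; other entries are not shown here.
AOTRITON_UNSUPPORTED_ARCHS = ["gfx900"]


def aotriton_enabled(build_archs: list[str]) -> bool:
    """Return False if any requested arch is on the unsupported list (assumed rule)."""
    return not any(arch in AOTRITON_UNSUPPORTED_ARCHS for arch in build_archs)


if __name__ == "__main__":
    print(aotriton_enabled(["gfx900"]))
    print(aotriton_enabled(["gfx90a"]))
```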

## Test Plan
Linux Rocm Run: https://github.com/ROCm/TheRock/actions/runs/23189631595
pytorch: https://github.com/ROCm/TheRock/actions/runs/23198946201

## Test Result:

gfx900 pytorch builds passed.

## Submission Checklist

- [X] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
lucbruni-amd added a commit that referenced this pull request Apr 13, 2026
## Motivation

Update `ROADMAP.md` to reflect recently added support.

## Technical Details

`gfx103X-all` builds passing for Linux/Windows:
#3763 (PyTorch failing until
ROCm/rocm-libraries#5141 lands)

`gfx900` builds passing: #3564

`gfx90c` builds awaiting ROCm/rocm-libraries#5282 to go green

## Test Plan

`gfx90c` builds to be tested
(#3818)

## Test Result

N/A

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
@i-chaochen

i-chaochen commented May 8, 2026

@lucbruni-amd why do we still need to support gfx900/gfx906?

https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html
AFAIK ROCm/LLVM only supports gfx908 and later, so these builds wouldn't work at all.

@GreenShadows

> @lucbruni-amd why do we still need to support gfx900/gfx906?
>
> https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html AFAIK ROCm/LLVM only supports gfx908 and later, so these builds wouldn't work at all.

The GFX906 works fine. I'm surprised by these anti-consumer and conflicting suggestions.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Feature] gfx900 build support enablement

7 participants