Add test_rocm_wheels.yml workflow for testing ROCm Python packages #3099
Changes from all commits
e2fcfc4
6353ec3
56ff4f1
6553531
22a95ce
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,104 @@ | ||
| name: Test ROCm Wheels | ||
|
|
||
| on: | ||
| workflow_dispatch: | ||
| inputs: | ||
| amdgpu_family: | ||
| description: GPU family to test (e.g., gfx94X-dcgpu, gfx110X-all) | ||
| required: true | ||
| type: string | ||
| default: "gfx94X-dcgpu" | ||
| test_runs_on: | ||
| description: Runner label to use. The selected runner should have a GPU supported by amdgpu_family | ||
| required: true | ||
| type: string | ||
| default: "linux-mi325-1gpu-ossci-rocm-frac" | ||
| package_index_url: | ||
| description: Base Python package index URL (without GPU family subdir) | ||
| required: true | ||
| type: string | ||
| default: "https://rocm.nightlies.amd.com/v2" | ||
| python_version: | ||
| required: true | ||
| type: string | ||
| default: "3.12" | ||
| rocm_version: | ||
| description: ROCm version to pip install (e.g. "7.10.0a20251124") | ||
| required: true | ||
| type: string | ||
|
|
||
| workflow_call: | ||
| inputs: | ||
| amdgpu_family: | ||
| required: true | ||
| type: string | ||
| test_runs_on: | ||
| required: true | ||
| type: string | ||
| package_index_url: | ||
| required: true | ||
| type: string | ||
| python_version: | ||
| required: true | ||
| type: string | ||
| rocm_version: | ||
| required: true | ||
| type: string | ||
| repository: | ||
| description: "Repository to checkout. Otherwise, defaults to `github.repository`." | ||
| type: string | ||
| ref: | ||
| description: "Branch, tag or SHA to checkout. Defaults to the reference or SHA that triggered the workflow." | ||
| type: string | ||
|
|
||
| permissions: | ||
| contents: read | ||
|
|
||
| run-name: Test ROCm Wheels (${{ inputs.amdgpu_family }}, ${{ inputs.rocm_version }}, ${{ inputs.test_runs_on }}) | ||
|
|
||
| jobs: | ||
| test_wheels: | ||
| name: Test ROCm Wheels | ${{ inputs.amdgpu_family }} | ||
| runs-on: ${{ inputs.test_runs_on }} | ||
| container: | ||
| image: ${{ contains(inputs.test_runs_on, 'linux') && 'ghcr.io/rocm/no_rocm_image_ubuntu24_04@sha256:405945a40deaff9db90b9839c0f41d4cba4a383c1a7459b28627047bf6302a26' || null }} | ||
| options: --ipc host | ||
| --group-add video | ||
| --device /dev/kfd | ||
| --device /dev/dri | ||
| --group-add 110 | ||
| --env-file /etc/podinfo/gha-gpu-isolation-settings | ||
| --user 0:0 # Running as root, by recommendation of GitHub: https://docs.github.com/en/actions/reference/workflows-and-actions/dockerfile-support#user | ||
| defaults: | ||
| run: | ||
| shell: bash | ||
| env: | ||
| VENV_DIR: ${{ github.workspace }}/.venv | ||
|
|
||
| steps: | ||
| - name: Checkout | ||
| uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 | ||
| with: | ||
| repository: ${{ inputs.repository || github.repository }} | ||
| ref: ${{ inputs.ref || '' }} | ||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v6.1.0 | ||
| with: | ||
| python-version: ${{ inputs.python_version }} | ||
|
|
||
| - name: Set up virtual environment and install ROCm packages | ||
| run: | | ||
| python build_tools/setup_venv.py ${VENV_DIR} \ | ||
| --packages "rocm[libraries,devel]==${{ inputs.rocm_version }}" \ | ||
| --index-url=${{ inputs.package_index_url }} \ | ||
| --index-subdir=${{ inputs.amdgpu_family }} \ | ||
| --activate-in-future-github-actions-steps | ||
|
Comment on lines +92 to +96
Member: I think this needs more work, as we won't have a full index for artifacts. Fine to address in a follow-up, but we might want to add a TODO here (if it isn't tracked in the issue).
Member (Author): Yep, using this for CI / dev artifacts will need some extra work (changes to |
||
|
|
||
| - name: Show installed packages | ||
| run: | | ||
| pip freeze | ||
|
|
||
| - name: Run rocm-sdk sanity tests | ||
| run: | | ||
| rocm-sdk test | ||
|
Comment on lines +102 to +104
Member (Author): Testing on our self-hosted runners with GPUs now:
Both Linux runs failed with Looks similar to #1877 - I thought we fixed that? Is this a real issue or is the test workflow not configured correctly?
Member (Author): I can reproduce this locally with our test dockerfile. Will follow up with a new issue. |
||
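The install step in the diff passes `--index-url` and `--index-subdir` separately, and the `package_index_url` input is described as the base URL "without GPU family subdir". A minimal sketch of how those two values would combine, assuming (this is not confirmed by the diff, which does not show `build_tools/setup_venv.py`) that the per-family index is simply the base URL followed by the family name:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values mirror the workflow_dispatch defaults and the example version in the diff.
PACKAGE_INDEX_URL="https://rocm.nightlies.amd.com/v2"
AMDGPU_FAMILY="gfx94X-dcgpu"
ROCM_VERSION="7.10.0a20251124"

# Assumption: per-family index = base URL + "/" + family subdir.
# Strip a trailing slash from the base so the join never yields "//".
INDEX_URL="${PACKAGE_INDEX_URL%/}/${AMDGPU_FAMILY}/"
REQUIREMENT="rocm[libraries,devel]==${ROCM_VERSION}"

# Print the roughly equivalent pip command instead of running it,
# so the sketch needs no network access.
echo "pip install --index-url ${INDEX_URL} \"${REQUIREMENT}\""
```

As the review thread notes, CI / dev artifacts will not have a full index laid out this way, so this layout should be read as applying to the nightly index only.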
Not sure we really need the runner name here; if we want it, we could drop the `amdgpu_family`.

Copied from the style used for pytorch wheel tests:
TheRock/.github/workflows/test_pytorch_wheels.yml
Lines 64 to 68 in cf13cfd
I think it's fine to include for workflow_dispatch.
Once integrated, most uses of this workflow will go through workflow_call.
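For reference, a manual `workflow_dispatch` run with the inputs declared in the diff could be started from the GitHub CLI roughly as sketched here. The input values are the defaults from the diff; `rocm_version` has no default, so the example version from its description is used. The workflow file name is taken from the PR title, and the sketch assumes the workflow is already available on the branch being dispatched.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the dispatch command as an array so each input stays one argument.
cmd=(
  gh workflow run test_rocm_wheels.yml
  -f amdgpu_family=gfx94X-dcgpu
  -f test_runs_on=linux-mi325-1gpu-ossci-rocm-frac
  -f package_index_url=https://rocm.nightlies.amd.com/v2
  -f python_version=3.12
  -f rocm_version=7.10.0a20251124
)

# Dry run by default: only print the command. Set DRY_RUN=0 to dispatch
# (requires an authenticated gh CLI inside a checkout of the repository).
if [ "${DRY_RUN:-1}" = "0" ]; then
  "${cmd[@]}"
else
  printf '%s\n' "${cmd[*]}"
fi
```

Callers using `workflow_call` pass the same inputs through `with:` instead, plus the optional `repository` and `ref`.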
Well, with the default for `test_runs_on`, we would get `linux-mi325-1gpu-ossci-rocm-frac`, which already more or less includes the GPU (even though not the gfx architecture). Fine as is for me, though.
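To make the overlap discussed above concrete, here is a small sketch that expands the `run-name` template from the diff by hand, using the `workflow_dispatch` defaults and the example `rocm_version` from the input description. The shell variable names are illustrative only; GitHub Actions performs this substitution itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-ins for ${{ inputs.* }} in the run-name expression.
AMDGPU_FAMILY="gfx94X-dcgpu"
ROCM_VERSION="7.10.0a20251124"
TEST_RUNS_ON="linux-mi325-1gpu-ossci-rocm-frac"

# Same template as the run-name line in the workflow.
RUN_NAME="Test ROCm Wheels (${AMDGPU_FAMILY}, ${ROCM_VERSION}, ${TEST_RUNS_ON})"
echo "${RUN_NAME}"
# prints: Test ROCm Wheels (gfx94X-dcgpu, 7.10.0a20251124, linux-mi325-1gpu-ossci-rocm-frac)
```

The runner label names the GPU model (`mi325`) while `amdgpu_family` names the gfx target, so the two fields overlap without being fully redundant.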