Merged
Changes from 2 commits
17 commits
3d94709
Added gating changes to windows pytorch wheels
aravind-ravi1206 Sep 3, 2025
004de77
Fixed indentation
aravind-ravi1206 Sep 3, 2025
38a4b24
Resolved PR comments and added promotion script to nightly based on p…
aravind-ravi1206 Sep 8, 2025
ed5529e
Resolved PR comments and added promotion script to nightly based on p…
aravind-ravi1206 Sep 8, 2025
0f4ac03
Merge branch 'main' into users/arravikum/gating_windows_torch_builds
araravik-psd Sep 8, 2025
053d66e
Added gating changes to windows pytorch wheels
aravind-ravi1206 Sep 3, 2025
ac31c15
Fixed indentation
aravind-ravi1206 Sep 3, 2025
ade5f05
Resolved PR comments and added promotion script to nightly based on p…
aravind-ravi1206 Sep 8, 2025
1d7d806
Resolved PR comments and added promotion script to nightly based on p…
aravind-ravi1206 Sep 8, 2025
9c581ec
using ^ instead of \ for next line break in aws scp cmd for windows
aravind-ravi1206 Sep 8, 2025
a7acfe0
Merge branch 'users/arravikum/gating_windows_torch_builds' of github.…
aravind-ravi1206 Sep 8, 2025
834eaeb
Removing redundant upload to s3
aravind-ravi1206 Sep 9, 2025
54d0e3b
Added torchvision and torchaudio versions to job output
aravind-ravi1206 Sep 9, 2025
50899cb
Add python to cp_version script to get cp_version
aravind-ravi1206 Sep 9, 2025
a2d44e8
Adding doc string for promote script
aravind-ravi1206 Sep 9, 2025
28386af
Corrected formatting in readme file
aravind-ravi1206 Sep 9, 2025
f87cdb4
Documentation changes for Readme file and cp version script
aravind-ravi1206 Sep 9, 2025
100 changes: 93 additions & 7 deletions .github/workflows/build_windows_pytorch_wheels.yml
@@ -17,10 +17,18 @@ on:
description: S3 subdirectory, not including the GPU-family
required: true
type: string
s3_staging_subdir:
description: S3 staging subdirectory, not including the GPU-family
required: true
type: string
cloudfront_url:
description: CloudFront URL pointing to Python index
required: true
type: string
cloudfront_staging_url:
description: CloudFront base URL pointing to staging Python index
required: true
type: string
rocm_version:
description: ROCm version to pip install
type: string
@@ -47,10 +55,18 @@ on:
description: S3 subdirectory, not including the GPU-family
type: string
default: "v2"
s3_staging_subdir:
description: S3 staging subdirectory, not including the GPU-family
type: string
default: "v2-staging"
cloudfront_url:
description: CloudFront base URL pointing to Python index
type: string
default: "https://d25kgig7rdsyks.cloudfront.net/v2"
cloudfront_staging_url:
description: CloudFront base URL pointing to staging Python index
type: string
default: "https://d25kgig7rdsyks.cloudfront.net/v2-staging"
rocm_version:
description: ROCm version to pip install
type: string
@@ -154,26 +170,24 @@ jobs:
# run: |
# python external-builds/pytorch/sanity_check_wheel.py ${{ env.PACKAGE_DIST_DIR }}

- name: Upload wheels to S3
- name: Upload wheels to S3 staging
if: ${{ github.repository_owner == 'ROCm' }}
# Using 'cmd' here since PACKAGE_DIST_DIR uses \ in paths instead of /
shell: cmd
run: |
aws s3 cp ${{ env.PACKAGE_DIST_DIR }}/ ^
s3://${{ env.S3_BUCKET_PY }}/${{ inputs.s3_subdir }}/${{ inputs.amdgpu_family }}/ ^
aws s3 cp ${{ env.PACKAGE_DIST_DIR }}/ ^
s3://${{ env.S3_BUCKET_PY }}/${{ inputs.s3_staging_subdir }}/${{ inputs.amdgpu_family }}/ ^
--recursive --exclude "*" --include "*.whl"

- name: (Re-)Generate Python package release index
- name: (Re-)Generate Python package release index for staging
if: ${{ github.repository_owner == 'ROCm' }}
run: |
pip install boto3 packaging
python ./build_tools/third_party/s3_management/manage.py ${{ inputs.s3_subdir }}/${{ inputs.amdgpu_family }}
python ./build_tools/third_party/s3_management/manage.py ${{ inputs.s3_staging_subdir }}/${{ inputs.amdgpu_family }}

generate_target_to_run:
name: Generate target_to_run
runs-on: ubuntu-24.04
outputs:
test_runs_on: ${{ steps.configure.outputs.test-runs-on }}
bypass_tests_for_releases: ${{ steps.configure.outputs.bypass_tests_for_releases }}
steps:
- name: Checking out repository
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
@@ -196,3 +210,75 @@ jobs:
cloudfront_url: ${{ inputs.cloudfront_url }}
python_version: ${{ inputs.python_version }}
torch_version: ${{ needs.build_pytorch_wheels.outputs.torch_version }}

upload_pytorch_wheels:
name: Release PyTorch Wheels to S3
needs: [build_pytorch_wheels, generate_target_to_run, test_pytorch_wheels]
if: always()
runs-on: ubuntu-24.04
env:
S3_BUCKET_PY: "therock-${{ inputs.release_type }}-python"
CP_VERSION: "${{ needs.build_pytorch_wheels.outputs.cp_version }}"
TORCH_VERSION: "${{ needs.build_pytorch_wheels.outputs.torch_version }}"
TORCHAUDIO_VERSION: "${{ needs.build_pytorch_wheels.outputs.torchaudio_version }}"
TORCHVISION_VERSION: "${{ needs.build_pytorch_wheels.outputs.torchvision_version }}"
TRITON_VERSION: "${{ needs.build_pytorch_wheels.outputs.triton_version }}"

steps:
- name: Checkout
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

- name: Configure AWS Credentials
if: always()
uses: aws-actions/configure-aws-credentials@7474bc4690e29a8392af63c5b98e7449536d5c3a # v4.3.1
with:
aws-region: us-east-2
role-to-assume: arn:aws:iam::692859939525:role/therock-${{ inputs.release_type }}-releases


- name: Determine upload flag
env:
BUILD_RESULT: ${{ needs.build_pytorch_wheels.result }}
TEST_RESULT: ${{ needs.test_pytorch_wheels.result }}
TEST_RUNS_ON: ${{ needs.generate_target_to_run.outputs.test_runs_on }}
BYPASS_TESTS_FOR_RELEASES: ${{ needs.generate_target_to_run.outputs.bypass_tests_for_releases }}
run: |
# 1) If the build failed → upload=false
if [[ "$BUILD_RESULT" != "success" ]]; then
echo "::warning::Build failed. Skipping upload."
echo "upload=false" >> "$GITHUB_ENV"

# 2) Else if there was a test runner AND tests failed or were skipped → upload=false
elif [[ -n "$TEST_RUNS_ON" && ( "$TEST_RESULT" == "failure" || "$TEST_RESULT" == "skipped" ) ]]; then
echo "::warning::Tests failed or were skipped (runner present). Skipping upload."
echo "upload=false" >> "$GITHUB_ENV"

# 3) Else if BYPASS_TESTS_FOR_RELEASES is not set and there was no test runner → upload=false
elif [[ -z "$BYPASS_TESTS_FOR_RELEASES" && -z "$TEST_RUNS_ON" ]]; then
echo "::warning::No test runner and BYPASS_TESTS_FOR_RELEASES not set. Skipping upload."
echo "upload=false" >> "$GITHUB_ENV"

# 4) Otherwise → upload=true
else
echo "upload=true" >> "$GITHUB_ENV"
fi

- name: Copy PyTorch wheels from staging to release S3
if: ${{ env.upload == 'true' }}
run: |
echo "Copying exact tested wheels to release S3 bucket..."
aws s3 cp \
s3://${S3_BUCKET_PY}/${{ inputs.s3_staging_subdir }}/${{ inputs.amdgpu_family }}/ \
s3://${S3_BUCKET_PY}/${{ inputs.s3_subdir }}/${{ inputs.amdgpu_family }}/ \
--recursive \
--exclude "*" \
--include "torch-${TORCH_VERSION}-${CP_VERSION}-win_amd64.whl" \
--include "torchaudio-${TORCHAUDIO_VERSION}-${CP_VERSION}-win_amd64.whl" \
--include "torchvision-${TORCHVISION_VERSION}-${CP_VERSION}-win_amd64.whl" \
--include "pytorch_triton_rocm-${TRITON_VERSION}-${CP_VERSION}-win_amd64.whl"

- name: (Re-)Generate Python package release index
if: ${{ env.upload == 'true' }}
run: |
pip install boto3 packaging
python ./build_tools/third_party/s3_management/manage.py ${{ inputs.s3_subdir }}/${{ inputs.amdgpu_family }}
29 changes: 29 additions & 0 deletions .github/workflows/release_windows_packages.yml
@@ -14,6 +14,10 @@ on:
description: "Subdirectory to push the Python packages"
type: string
default: "v2"
s3_staging_subdir:
description: "Staging subdirectory to push the packages"
type: string
default: "v2-staging"
# Trigger manually (typically to test the workflow or manually build a release [candidate])
workflow_dispatch:
inputs:
@@ -27,6 +31,10 @@ on:
description: "Subdirectory to push the Python packages"
type: string
default: "v2"
s3_staging_subdir:
description: "Staging subdirectory to push the packages"
type: string
default: "v2-staging"
families:
description: "A comma separated list of AMD GPU families, e.g. `gfx94X,gfx103x`, or empty for the default list"
type: string
@@ -54,6 +62,7 @@ jobs:
release_type: ${{ env.release_type }}
package_targets: ${{ steps.configure.outputs.package_targets }}
cloudfront_url: ${{ steps.release_information.outputs.cloudfront_url }}
cloudfront_staging_url: ${{ steps.release_information.outputs.cloudfront_staging_url }}
steps:
- name: Checkout repository
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
@@ -83,6 +92,7 @@ jobs:
base_version=$(jq -r '.["rocm-version"]' version.json)
echo "version=${base_version}${version_suffix}" >> $GITHUB_OUTPUT
echo "cloudfront_url=${cloudfront_base_url}/${{ env.S3_SUBDIR }}" >> $GITHUB_OUTPUT
echo "cloudfront_staging_url=${cloudfront_base_url}/${{ env.S3_STAGING_SUBDIR }}" >> $GITHUB_OUTPUT

- name: Generating package target matrix
id: configure
@@ -117,6 +127,7 @@ jobs:
S3_BUCKET_TAR: "therock-${{ needs.setup_metadata.outputs.release_type }}-tarball"
S3_BUCKET_PY: "therock-${{ needs.setup_metadata.outputs.release_type }}-python"
S3_SUBDIR: ${{ inputs.s3_subdir || 'v2' }}
S3_STAGING_SUBDIR: ${{ inputs.s3_staging_subdir || 'v2-staging' }}

steps:
- name: "Checking out repository"
@@ -242,6 +253,22 @@ jobs:
aws-region: us-east-2
role-to-assume: arn:aws:iam::692859939525:role/therock-${{ env.RELEASE_TYPE }}-releases

- name: Upload Releases to staging S3
if: ${{ github.repository_owner == 'ROCm' }}
run: |
aws s3 cp ${{ env.OUTPUT_DIR }}/packages/dist/ s3://${{ env.S3_BUCKET_PY }}/${{ env.S3_STAGING_SUBDIR }}/${{ matrix.target_bundle.amdgpu_family }}/ \
--recursive --no-follow-symlinks \
--exclude "*" \
--include "*.whl" \
--include "*.tar.gz"

- name: (Re-)Generate Python package release index for staging
if: ${{ github.repository_owner == 'ROCm' }}
run: |
pip install boto3 packaging
python ./build_tools/third_party/s3_management/manage.py ${{ env.S3_STAGING_SUBDIR }}/${{ matrix.target_bundle.amdgpu_family }}

## TODO: Restrict uploading to the non-staging S3 directory until sanity checks and all validation tests have successfully passed.
- name: Upload Releases to S3
if: ${{ github.repository_owner == 'ROCm' }}
run: |
@@ -271,7 +298,9 @@ jobs:
{ "amdgpu_family": "${{ matrix.target_bundle.amdgpu_family }}",
"release_type": "${{ env.RELEASE_TYPE }}",
"s3_subdir": "${{ env.S3_SUBDIR }}",
"s3_staging_subdir": "${{ env.S3_STAGING_SUBDIR }}",
"cloudfront_url": "${{ needs.setup_metadata.outputs.cloudfront_url }}",
"cloudfront_staging_url": "${{ needs.setup_metadata.outputs.cloudfront_staging_url }}",
"rocm_version": "${{ needs.setup_metadata.outputs.version }}"
}

22 changes: 20 additions & 2 deletions .github/workflows/release_windows_pytorch_wheels.yml
@@ -14,10 +14,18 @@ on:
description: S3 subdirectory, not including the GPU-family
type: string
default: "v2"
s3_staging_subdir:
description: Staging subdirectory to push the wheels for test
type: string
default: "v2-staging"
cloudfront_url:
description: CloudFront URL pointing to Python index
type: string
default: "https://d25kgig7rdsyks.cloudfront.net/v2"
default: "https://rocm.nightlies.amd.com/v2"
cloudfront_staging_url:
description: CloudFront base URL pointing to staging Python index
required: true
type: string
rocm_version:
description: ROCm version to pip install
type: string
@@ -40,10 +48,18 @@ on:
description: S3 subdirectory, not including the GPU-family
type: string
default: "v2"
s3_staging_subdir:
description: "Staging subdirectory to push the wheels for test"
type: string
default: "v2-staging"
cloudfront_url:
description: CloudFront URL pointing to Python index
type: string
default: "https://d25kgig7rdsyks.cloudfront.net/v2"
default: "https://rocm.nightlies.amd.com/v2"
cloudfront_staging_url:
description: CloudFront base URL pointing to staging Python index
type: string
default: "https://rocm.nightlies.amd.com/v2-staging"
rocm_version:
description: ROCm version to pip install
type: string
@@ -66,5 +82,7 @@ jobs:
python_version: ${{ matrix.python_version }}
release_type: ${{ inputs.release_type }}
s3_subdir: ${{ inputs.s3_subdir }}
s3_staging_subdir: ${{ inputs.s3_staging_subdir }}
cloudfront_url: ${{ inputs.cloudfront_url }}
cloudfront_staging_url: ${{ inputs.cloudfront_staging_url }}
rocm_version: ${{ inputs.rocm_version }}
11 changes: 11 additions & 0 deletions external-builds/pytorch/README.md
@@ -176,6 +176,17 @@ mix/match build steps.

## Running/testing PyTorch

## Gating releases with PyTorch tests

When a build passes, we upload the PyTorch, TorchVision, TorchAudio and Triton wheels to the "v2-staging" subdirectory of the S3 bucket:
https://rocm.nightlies.amd.com/<v2-staging>/<gfx110X-dgpu>/

Only when the PyTorch tests pass do we promote the tested wheels to the release subdirectory of the S3 bucket:
https://rocm.nightlies.amd.com/<v2>/<gfx110X-dgpu>/

If no test runner is available, promotion is blocked by default. Set `bypass_tests_for_releases=true` in [amdgpu_family_matrix.py](/build_tools/github_actions/amdgpu_family_matrix.py) only in exceptional cases.
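
The promotion rules can be sketched in Python as follows. This is an illustrative restatement of the "Determine upload flag" step in `build_windows_pytorch_wheels.yml`, not code that exists in the repository: the function name `should_promote` and the runner label in the usage lines are hypothetical, and the arguments stand in for the job results and outputs that the workflow reads.

```python
def should_promote(
    build_result: str,
    test_result: str,
    test_runs_on: str = "",
    bypass_tests_for_releases: str = "",
) -> bool:
    """Return True when staged wheels may be copied to the release subdirectory.

    Hypothetical helper mirroring the workflow's shell conditions. An empty
    string means "not set", matching the `-z` / `-n` tests in the workflow.
    """
    # 1) An unsuccessful build never promotes.
    if build_result != "success":
        return False
    # 2) A test runner exists, but the tests failed or were skipped.
    if test_runs_on and test_result in ("failure", "skipped"):
        return False
    # 3) No test runner and the bypass flag is not set.
    if not bypass_tests_for_releases and not test_runs_on:
        return False
    # 4) Every other combination promotes.
    return True


if __name__ == "__main__":
    # "example-gpu-runner" is a made-up runner label.
    print(should_promote("success", "success", test_runs_on="example-gpu-runner"))
    print(should_promote("success", "skipped"))
    print(should_promote("success", "skipped", bypass_tests_for_releases="true"))
```

Because the check only asks whether `bypass_tests_for_releases` is non-empty, any non-empty value enables the bypass in this sketch, just as it would under the shell `-z` test.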

### Running ROCm and PyTorch sanity checks

The simplest tests for a working PyTorch with ROCm install are: