Cherry-pick hipBLASLt fix for hip::device dep. #1132

Merged: ScottTodd merged 1 commit into main from users/scotttodd/hipblaslt-cherrypick on Jul 26, 2025

@ScottTodd
Member

@ScottTodd ScottTodd commented Jul 26, 2025

Cherry-pick of ROCm/rocm-libraries#855 for testing, with conflicts resolved manually. Patches did not apply cleanly on a roll-up, as ROCm/rocm-libraries#702 is still under review and has conflicts too.

Testing at:

@stellaraccident
Collaborator

If CI says good, lgtm. That cherry-pick has an additional assert that should now fail the CI if this ever regresses.

@ScottTodd
Member Author

Windows PyTorch wheel builds succeeded: https://github.com/ROCm/TheRock/actions/runs/16536876692

Linux got successful builds but failed in workflow scripting code we changed recently. Looks easy enough to fix. cc @marbre @araravik-psd
https://github.com/ROCm/TheRock/actions/runs/16538063884/job/46775446523


2025-07-26T09:06:46.5231311Z Found built wheel: /__w/TheRock/TheRock/external-builds/pytorch/triton/python/dist/pytorch_triton_rocm-3.3.1-cp311-cp311-linux_x86_64.whl
2025-07-26T09:06:46.5232335Z ++ Copy /__w/TheRock/TheRock/external-builds/pytorch/triton/python/dist/pytorch_triton_rocm-3.3.1-cp311-cp311-linux_x86_64.whl -> /home/runner/_work/TheRock/TheRock/output/packages/dist

...

2025-07-26T09:06:46.5249212Z Found built wheel: /__w/TheRock/TheRock/external-builds/pytorch/pytorch/dist/torch-2.7.1+rocm7.0.0.dev0.9b07c81904cf711789f5c8c43919135d018f75a9-cp311-cp311-linux_x86_64.whl
2025-07-26T09:06:46.5250484Z ++ Copy /__w/TheRock/TheRock/external-builds/pytorch/pytorch/dist/torch-2.7.1+rocm7.0.0.dev0.9b07c81904cf711789f5c8c43919135d018f75a9-cp311-cp311-linux_x86_64.whl -> /home/runner/_work/TheRock/TheRock/output/packages/dist

...

2025-07-26T09:06:46.5260558Z Found built wheel: /__w/TheRock/TheRock/external-builds/pytorch/pytorch_audio/dist/torchaudio-2.7.1a0+rocm7.0.0.dev0.9b07c81904cf711789f5c8c43919135d018f75a9-cp311-cp311-linux_x86_64.whl
2025-07-26T09:06:46.5261942Z ++ Copy /__w/TheRock/TheRock/external-builds/pytorch/pytorch_audio/dist/torchaudio-2.7.1a0+rocm7.0.0.dev0.9b07c81904cf711789f5c8c43919135d018f75a9-cp311-cp311-linux_x86_64.whl -> /home/runner/_work/TheRock/TheRock/output/packages/dist

...

2025-07-26T09:07:29.5328198Z Found built wheel: /__w/TheRock/TheRock/external-builds/pytorch/pytorch_vision/dist/torchvision-0.22.1+rocm7.0.0.dev0.9b07c81904cf711789f5c8c43919135d018f75a9-cp311-cp311-linux_x86_64.whl
2025-07-26T09:07:29.5329833Z ++ Copy /__w/TheRock/TheRock/external-builds/pytorch/pytorch_vision/dist/torchvision-0.22.1+rocm7.0.0.dev0.9b07c81904cf711789f5c8c43919135d018f75a9-cp311-cp311-linux_x86_64.whl -> /home/runner/_work/TheRock/TheRock/output/packages/dist
2025-07-26T09:07:29.5621283Z Traceback (most recent call last):
2025-07-26T09:07:29.5622160Z   File "/__w/TheRock/TheRock/./build_tools/github_actions/write_torch_version.py", line 19, in <module>
2025-07-26T09:07:29.5622746Z     main(sys.argv[1:])
2025-07-26T09:07:29.5623244Z   File "/__w/TheRock/TheRock/./build_tools/github_actions/write_torch_version.py", line 14, in main
2025-07-26T09:07:29.5623944Z     version = glob.glob("torch-*.whl", root_dir=package_dist_dir)[0].split("-")[1]
2025-07-26T09:07:29.5624426Z               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
2025-07-26T09:07:29.5624780Z IndexError: list index out of range
2025-07-26T09:07:29.5720032Z ##[error]Process completed with exit code 1.

- name: Build PyTorch Wheels
  id: build-pytorch-wheels
  run: |
    echo "Building PyTorch wheels for ${{ inputs.amdgpu_family }}"
    ./external-builds/pytorch/build_prod_wheels.py \
      build \
      --install-rocm \
      --pip-cache-dir /tmp/pipcache \
      --index-url "${{ inputs.cloudfront_url }}/${{ inputs.amdgpu_family }}/" \
      --clean \
      --output-dir ${{ env.PACKAGE_DIST_DIR }} ${{ env.optional_build_prod_arguments }}
    python ./build_tools/github_actions/write_torch_version.py

def main(argv: list[str]):
    # Get the torch version from the first torch wheel in PACKAGE_DIST_DIR.
    package_dist_dir = os.getenv("PACKAGE_DIST_DIR")
    version = glob.glob("torch-*.whl", root_dir=package_dist_dir)[0].split("-")[1]
    gha_set_output({"torch_version": version})

PACKAGE_DIST_DIR: /home/runner/_work/TheRock/TheRock/output/packages/dist

I also had some ideas for how to continue expanding on that script in #1110 (comment)
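
The `IndexError` in the traceback above comes from indexing `[0]` into an empty glob result. A minimal sketch of a lookup that fails with a more descriptive error is shown here; `find_wheel_version` is a hypothetical helper written for illustration, not the code in `write_torch_version.py` or in any follow-up PR, and it uses `pathlib` rather than `glob.glob(..., root_dir=...)`.

```python
from pathlib import Path


def find_wheel_version(package_dist_dir: str, package: str = "torch") -> str:
    """Return the version field of the first `<package>-*.whl` in a directory.

    Hypothetical helper for illustration only.
    """
    dist_dir = Path(package_dist_dir)
    if not dist_dir.is_dir():
        raise FileNotFoundError(f"Directory does not exist: {dist_dir}")

    # The trailing "-" keeps "torch-*.whl" from matching torchaudio/torchvision.
    wheels = sorted(p.name for p in dist_dir.glob(f"{package}-*.whl"))
    if not wheels:
        # Report what is actually there instead of failing on wheels[0].
        found = sorted(p.name for p in dist_dir.iterdir())
        raise FileNotFoundError(
            f"No '{package}-*.whl' in {dist_dir}; directory contains: {found}"
        )

    # Wheel filenames look like <name>-<version>-<python>-<abi>-<platform>.whl
    return wheels[0].split("-")[1]
```

Called as `find_wheel_version(os.environ["PACKAGE_DIST_DIR"])`, this would still fail in the job above, but the error would name the directory that was searched and list its contents, which is the information needed to spot a path mismatch.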

@ScottTodd ScottTodd marked this pull request as ready for review July 26, 2025 13:47
@ScottTodd ScottTodd merged commit e53f830 into main Jul 26, 2025
51 of 56 checks passed
@ScottTodd ScottTodd deleted the users/scotttodd/hipblaslt-cherrypick branch July 26, 2025 15:44
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Jul 26, 2025
@ScottTodd
Member Author

I'm working on fixing the script. I added more logging in https://github.com/ROCm/TheRock/tree/users/scotttodd/write-torch-versions and triggered a test job at https://github.com/ROCm/TheRock/actions/runs/16542014324/job/46784536594

Logs:

 Found built wheel: /__w/TheRock/TheRock/external-builds/pytorch/pytorch_vision/dist/torchvision-0.22.1+rocm7.0.0.dev0.9b07c81904cf711789f5c8c43919135d018f75a9-cp312-cp312-linux_x86_64.whl
++ Copy /__w/TheRock/TheRock/external-builds/pytorch/pytorch_vision/dist/torchvision-0.22.1+rocm7.0.0.dev0.9b07c81904cf711789f5c8c43919135d018f75a9-cp312-cp312-linux_x86_64.whl -> /home/runner/_work/TheRock/TheRock/output/packages/dist
Traceback (most recent call last):
Looking for wheels in '/__w/TheRock/TheRock/output/packages/dist'
  File "/__w/TheRock/TheRock/./build_tools/github_actions/write_torch_versions.py", line 104, in <module>
    main()
  File "/__w/TheRock/TheRock/./build_tools/github_actions/write_torch_versions.py", line 98, in main
    all_versions = get_all_wheel_versions(package_dist_dir)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/TheRock/TheRock/./build_tools/github_actions/write_torch_versions.py", line 67, in get_all_wheel_versions
Found files in that directory:
Looking for 'torch in '/__w/TheRock/TheRock/output/packages/dist'
  WARNING: Found no 'torch' wheels matching 'torch-*.whl'
Looking for 'torchaudio in '/__w/TheRock/TheRock/output/packages/dist'
  WARNING: Found no 'torchaudio' wheels matching 'torchaudio-*.whl'
Looking for 'torchvision in '/__w/TheRock/TheRock/output/packages/dist'
  WARNING: Found no 'torchvision' wheels matching 'torchvision-*.whl'
Looking for 'pytorch_triton_rocm in '/__w/TheRock/TheRock/output/packages/dist'
  WARNING: Found no 'pytorch_triton_rocm' wheels matching 'pytorch_triton_rocm-*.whl'
    raise FileNotFoundError("Did not find torch wheel")
FileNotFoundError: Did not find torch wheel

actions/checkout#785 has some clues. This looks like a difference between /home/runner/work/<REPO_NAME>/<REPO_NAME> and /__w/<REPO_NAME>/<REPO_NAME> (when using containers?)
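
In the logs above, wheels were copied to a `/home/runner/_work/...` path while the script searched under `/__w/...`. One way to reconcile the two is to rewrite a host-style workspace path into its container-style equivalent before using it. The sketch below assumes both workspace roots are known; `remap_workspace_path` and its parameters are hypothetical names for illustration, and this is not necessarily the approach taken in #1133.

```python
from pathlib import PurePosixPath


def remap_workspace_path(
    path: str, host_workspace: str, container_workspace: str
) -> str:
    """Rewrite a path under the host workspace to the container workspace.

    Hypothetical helper: paths outside host_workspace are returned unchanged.
    """
    p = PurePosixPath(path)
    try:
        relative = p.relative_to(host_workspace)
    except ValueError:
        # Not under the host workspace, so there is nothing to remap.
        return str(p)
    return str(PurePosixPath(container_workspace) / relative)


# Paths taken from the logs in this thread.
print(
    remap_workspace_path(
        "/home/runner/_work/TheRock/TheRock/output/packages/dist",
        host_workspace="/home/runner/_work/TheRock/TheRock",
        container_workspace="/__w/TheRock/TheRock",
    )
)
```

Deriving `PACKAGE_DIST_DIR` from a single workspace variable in the first place would likely avoid the need for any remapping.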

@ScottTodd
Member Author

Script fixes: #1133. Could scope that down a bit to make it easier to land.
