[torch] Adjust env vars used for builds with aotriton enabled. by ScottTodd · Pull Request #1432 · ROCm/TheRock

ScottTodd · 2025-09-09T17:30:01Z

Motivation

Progress on #1040, getting closer to enabling aotriton in PyTorch on Windows.

Technical Details

This will supersede #1409 and is dependent on pytorch/pytorch#162330.

I believe the UTF-8 change helps with logging errors like the one below, which appear when copying files that have Unicode characters in their names:

Message: '%s %s -> %s'
Arguments: ('copying', 'torch\\lib\\aotriton.images\\amd-gfx11xx\\flash\\bwd_kernel_dq\\FONLY__\uff0afp32@16_48_0_T_T_1___gfx11xx.aks2', 'build\\lib.win-amd64-cpython-312\\torch\\lib\\aotriton.images\\amd-gfx11xx\\flash\\bwd_kernel_dq')
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Nod-Shark16\AppData\Local\Programs\Python\Python312\Lib\logging\__init__.py", line 1163, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Nod-Shark16\AppData\Local\Programs\Python\Python312\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\uff0a' in position 73: character maps to <undefined>
Call stack:
  File "D:\b\pytorch_main\setup.py", line 1785, in <module>
    main()
  File "D:\b\pytorch_main\setup.py", line 1766, in main
    setup(
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\__init__.py", line 117, in setup
    return distutils.core.setup(**attrs)
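The failure in the log above can be reproduced in isolation. This is a minimal sketch, not code from this PR: it takes the file name from the log and checks whether it can be encoded with `cp1252` (the codec named in the traceback) and with UTF-8. The helper `can_encode` is a hypothetical name used only for illustration. One general way to get UTF-8 streams on Windows is Python's UTF-8 mode (`PYTHONUTF8=1`); whether this PR sets that exact variable is an assumption, since the diff is not shown here.

```python
# File name fragment copied from the log above; U+FF0A is FULLWIDTH ASTERISK.
KERNEL_NAME = "FONLY__\uff0afp32@16_48_0_T_T_1___gfx11xx.aks2"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # cp1252 has no mapping for U+FF0A, which is what the logging handler hit.
    print("cp1252:", can_encode(KERNEL_NAME, "cp1252"))
    # UTF-8 can represent every Unicode code point, so this succeeds.
    print("utf-8: ", can_encode(KERNEL_NAME, "utf-8"))
```

The same character that breaks the `cp1252` stream encodes cleanly as UTF-8, which is consistent with the error going away once the build's output streams use UTF-8.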

Test Plan

Tested with local builds on Windows with and without --enable-pytorch-flash-attention-windows.

Test Result

Builds succeeded, and ComfyUI generated images on my gfx1100 GPU (aotriton on that GPU needed TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1).
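For anyone trying the same thing, here is a minimal sketch of passing that variable to a child process. Only the variable name comes from this PR; the helper `with_experimental_aotriton` is a hypothetical name, and the sketch assumes the variable has to be set before the Python process that imports torch starts.

```python
import os

# Variable name taken from the test result above.
EXPERIMENTAL_VAR = "TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL"


def with_experimental_aotriton(base_env=None):
    """Return a copy of the environment with the experimental flag set to "1".

    Hypothetical helper: copies `base_env` (or os.environ) so the caller's
    environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[EXPERIMENTAL_VAR] = "1"
    return env
```

A launcher could then call something like `subprocess.run([sys.executable, "main.py"], env=with_experimental_aotriton())`, where `main.py` stands in for whatever entry point is being started.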

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

ScottTodd · 2025-09-09T18:00:38Z

I think this is fine to merge without waiting for the upstream PR.

Nem404 · 2025-09-09T19:36:32Z

> I think this is fine to merge without waiting for the upstream PR.

I hope Jeff merges pytorch/pytorch#162330 before tomorrow's CI run. Can't wait to try AOTriton's perf boost.

ScottTodd · 2025-09-09T19:52:19Z

We'll still need to flip the --enable-pytorch-flash-attention-windows flag here for nightly releases to get aotriton:

TheRock/.github/workflows/build_windows_pytorch_wheels.yml

Lines 150 to 167 in ec9e595

    
      - name: Build PyTorch Wheels
        id: build-pytorch-wheels
        # Using 'cmd' here is load bearing! There are configuration issues when
        # run under 'bash': https://github.com/ROCm/TheRock/issues/827#issuecomment-3025858800
        shell: cmd
        run: |
          echo "Building PyTorch wheels for ${{ inputs.amdgpu_family }}"
          python ./external-builds/pytorch/build_prod_wheels.py ^
            build ^
            --install-rocm ^
            --index-url "${{ inputs.cloudfront_url }}/${{ inputs.amdgpu_family }}/" ^
            --pytorch-dir ${{ env.CHECKOUT_ROOT }}/torch ^
            --pytorch-audio-dir ${{ env.CHECKOUT_ROOT }}/audio ^
            --pytorch-vision-dir ${{ env.CHECKOUT_ROOT }}/vision ^
            --clean ^
            --output-dir ${{ env.PACKAGE_DIST_DIR }} ^
            ${{ env.optional_build_prod_arguments }}
          python ./build_tools/github_actions/write_torch_versions.py --dist-dir ${{ env.PACKAGE_DIST_DIR }}

I don't have any objections to at least trying that. I'll post a PR. Might not get around to testing / reviewing / etc. before tonight's release build, depending on when the pytorch PR is reviewed.
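To make "flipping the flag" concrete, here is a rough sketch of the command line that workflow step assembles, with the flash attention flag appended when requested. The script path and flags are the ones quoted in this thread; the function `build_command` and its parameters are hypothetical, and how the workflow actually fills `optional_build_prod_arguments` is not shown here.

```python
import sys

FLASH_ATTENTION_FLAG = "--enable-pytorch-flash-attention-windows"


def build_command(index_url, checkout_root, output_dir, enable_flash_attention=False):
    """Assemble a build_prod_wheels.py argument list (hypothetical helper).

    Mirrors the arguments in the quoted workflow step; the optional flag
    takes the place of `optional_build_prod_arguments`.
    """
    cmd = [
        sys.executable,
        "./external-builds/pytorch/build_prod_wheels.py",
        "build",
        "--install-rocm",
        "--index-url", index_url,
        "--pytorch-dir", f"{checkout_root}/torch",
        "--pytorch-audio-dir", f"{checkout_root}/audio",
        "--pytorch-vision-dir", f"{checkout_root}/vision",
        "--clean",
        "--output-dir", output_dir,
    ]
    if enable_flash_attention:
        # Opt-in flag tested locally in #1432 (see the Test Plan above).
        cmd.append(FLASH_ATTENTION_FLAG)
    return cmd
```

Building the arguments as a list avoids the `^` line continuations that the `cmd` shell requires and leaves the flag off by default, which matches the behavior before it is flipped for releases.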

Nem404 · 2025-09-09T20:08:41Z

> I don't have any objections to at least trying that. I'll post a PR. Might not get around to testing / reviewing / etc. before tonight's release build, depending on when the pytorch PR is reviewed.

Not a problem if it takes 1 or 2 more days. But up until now, aotriton was just out there, and no one knew when it would be included in the wheels. I can't believe we're this close to closing #1040 as solved :)

…#1437)

## Motivation

Fixes #1040, enabling aotriton for flash attention in pytorch (if it works). This is expected to improve performance in workloads like ComfyUI image generation by upwards of 60% (e.g. 12.6 it/s to 20.0 it/s).

## Technical Details

Follow-up to #1432 and depends on pytorch/pytorch#162330.

Note that support is experimental for some GPUs like gfx1100, so the `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1` environment variable may be needed to try aotriton on those systems.

## Test Plan

Trigger either https://github.com/ROCm/TheRock/actions/workflows/build_windows_pytorch_wheels.yml or https://github.com/ROCm/TheRock/actions/workflows/release_windows_pytorch_wheels.yml across the matrix of GPU families once that PyTorch PR is merged.

We're still going to need automated tests and documentation for this. I'd like numerics tests running somewhere and documentation that shows how to check which pytorch features are enabled in the wheels that a user installs.

## Test Result

Test runs:

* https://github.com/ROCm/TheRock/actions/runs/17660396787 using this branch and `7.0.0rc20250908` for gfx110X-dgpu
* ~~https://github.com/ROCm/TheRock/actions/runs/17660456285 using the branch and `7.0.0rc20250908` for gfx1151~~
* https://github.com/ROCm/TheRock/actions/runs/17662170140 using the branch and `7.0.0rc20250908` for gfx1151
* Tests not running should be fixed with #1469 (may need to retrigger to pick up fixes for flaky checkouts)

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
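As a quick check of the throughput figures quoted in that commit message, the relative gain from 12.6 it/s to 20.0 it/s can be worked out directly. This is only arithmetic on the two numbers given there; the function name `speedup_percent` is a hypothetical helper.

```python
def speedup_percent(before: float, after: float) -> float:
    """Relative throughput gain, in percent, going from `before` to `after`."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Figures from the commit message: 12.6 it/s without aotriton, 20.0 it/s with it.
    print(f"{speedup_percent(12.6, 20.0):.1f}%")
```

For the quoted example this comes to roughly 59%, so "upwards of 60%" is best read as an approximate figure for that particular workload.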

[torch] Adjust env vars used for builds with aotriton enabled.

f687d51

ScottTodd requested a review from jammm September 9, 2025 17:30

github-project-automation Bot added this to TheRock Triage Sep 9, 2025

github-project-automation Bot moved this to TODO in TheRock Triage Sep 9, 2025

jammm approved these changes Sep 9, 2025

ScottTodd merged commit 202084d into ROCm:main Sep 9, 2025
5 checks passed

ScottTodd deleted the torch-windows-aotriton-env-vars branch September 9, 2025 18:00

github-project-automation Bot moved this from TODO to Done in TheRock Triage Sep 9, 2025

ScottTodd mentioned this pull request Sep 9, 2025

[torch] Flip --enable-pytorch-flash-attention-windows for releases. #1437

Merged

1 task

ScottTodd merged 1 commit into ROCm:main from ScottTodd:torch-windows-aotriton-env-vars
