[torch] Adjust env vars used for builds with aotriton enabled. #1432
I think this is fine to merge without waiting for the upstream PR.
I hope Jeff merges pytorch/pytorch#162330 before tomorrow's CI run. Can’t wait to try AOTriton’s perf boost.
We'll still need to flip the setting in TheRock/.github/workflows/build_windows_pytorch_wheels.yml (lines 150 to 167 in ec9e595). I don't have any objections to at least trying that. I'll post a PR. I might not get around to testing or reviewing it before tonight's release build, depending on when the pytorch PR is reviewed.
Not a problem if it takes 1 or 2 more days. But up until now, aotriton was just out there, and no one knew when it would be included in the wheels. I can't believe we're this close to closing #1040 as solved :)
…#1437)

## Motivation

Fixes #1040, enabling aotriton for flash attention in pytorch (if it works). This is expected to improve performance in workloads like ComfyUI image generation by upwards of 60% (e.g. 12.6 it/s to 20.0 it/s).

## Technical Details

Follow-up to #1432 and depends on pytorch/pytorch#162330.

Note that support is experimental for some GPUs like gfx1100, so the `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1` environment variable may be needed to try aotriton on those systems.

## Test Plan

Trigger either https://github.com/ROCm/TheRock/actions/workflows/build_windows_pytorch_wheels.yml or https://github.com/ROCm/TheRock/actions/workflows/release_windows_pytorch_wheels.yml across the matrix of GPU families once that PyTorch PR is merged.

We're still going to need automated tests and documentation for this. I'd like numerics tests running somewhere and documentation that shows how to check which pytorch features are enabled in the wheels that a user installs.

## Test Result

Test runs:

* https://github.com/ROCm/TheRock/actions/runs/17660396787 using this branch and `7.0.0rc20250908` for gfx110X-dgpu
* ~~https://github.com/ROCm/TheRock/actions/runs/17660456285 using the branch and `7.0.0rc20250908` for gfx1151~~
* https://github.com/ROCm/TheRock/actions/runs/17662170140 using the branch and `7.0.0rc20250908` for gfx1151
* Tests not running should be fixed with #1469 (may need to retrigger to pick up fixes for flaky checkouts)

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
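The commit message above asks for a way to check which pytorch features are enabled in an installed wheel. Here is a rough sketch of what such a check could look like. It is not part of TheRock or of either PR. The helper name `report_attention_features` is hypothetical, and setting `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL` before importing torch is an assumption (a conservative choice that should only matter on GPUs with experimental support, such as gfx1100). The flags it reads say whether the flash backend is built and allowed. They do not prove that aotriton kernels actually ran.

```python
import os

# Assumption: set this before importing torch so the variable is visible
# whenever PyTorch reads it. Only relevant for GPUs with experimental support.
os.environ.setdefault("TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL", "1")


def report_attention_features():
    """Return a dict describing flash attention support in the installed torch.

    Hypothetical helper for illustration; returns a minimal report when torch
    is not installed so the script never fails on import.
    """
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None on builds that are not ROCm/HIP builds.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
        # Whether the flash backend is *allowed* for
        # scaled_dot_product_attention (a runtime toggle, not a build check).
        "flash_sdp_enabled": torch.backends.cuda.flash_sdp_enabled(),
    }
    # Newer releases expose a build-time check; guard it in case it is missing.
    built_check = getattr(torch.backends.cuda, "is_flash_attention_available", None)
    report["flash_attention_built"] = built_check() if built_check else None
    return report


if __name__ == "__main__":
    for key, value in report_attention_features().items():
        print(f"{key}: {value}")
```

Running this in the environment where the wheel is installed prints one line per field. A numerics test comparing flash attention output against the math backend would still be needed to confirm correctness.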
Motivation
Progress on #1040, getting closer to enabling aotriton in PyTorch on Windows.
Technical Details
This will supersede #1409 and is dependent on pytorch/pytorch#162330.
The UTF8 change I believe helps with warnings about logs for copying files with unicode characters in their names:
Test Plan
Tested with local builds on Windows with and without `--enable-pytorch-flash-attention-windows`.
Test Result
Builds succeeded, ComfyUI generated images on my gfx1100 GPU (needed `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1` for aotriton on that GPU).
Submission Checklist