Initial Windows ROCm build support for FlashAttention-2 ROCm/aiter Triton backend #2384
Closed
0xDELUXA wants to merge 1 commit into Dao-AILab:main
Conversation
0xDELUXA force-pushed the branch from f7dc8ea to e21c1ae
0xDELUXA force-pushed the branch from e21c1ae to 5ac71b7
Collaborator:
@0xDELUXA Thank you for your PR, happy to help review if needed.
Collaborator:
@0xDELUXA If you are OK with it, I can cherry-pick your commits and create a new PR on aiter and flash-attention that I can work on. Your contribution will be preserved. It will help get things merged ASAP.
Author (Contributor):
Sure, go ahead. That PR, as it is now, enables FA-2 to be built on Windows with aiter Triton.
Author (Contributor):
Closing this as it's superseded by #2385 by @micmelesse.
Motivation
Enable building and running FlashAttention-2 on Windows with AMD GPUs via the ROCm/aiter Triton backend after the migration. Three small issues blocked this entirely: a crash on import when `torch.distributed` attributes are missing, a broken aiter submodule setup step, and a hard dependency on the Linux-only `triton` package.

Note
This PR depends on Windows build support being merged in ROCm/aiter first (or applied locally). See the corresponding PR: ROCm/aiter#2428

Technical Details
- `setup.py` now skips `git submodule update` for `third_party/aiter` if the directory already exists, to avoid overwriting locally cloned versions
- Sets `ENABLE_CK=0` and `PREBUILD_KERNELS=0` when installing aiter on Windows, since Composable Kernel and its pre-built HIP C++ kernels are not available there; the pure-Triton FA path is used instead
- Replaces the `triton==3.5.1` dependency with `triton-windows>=3.2.0` on Windows (`triton-windows` is the community port of Triton for Windows ROCm)
- Adds the `Operating System :: Microsoft :: Windows` classifier
- `flash_attn/utils/distributed.py` now guards the `torch.distributed` backward-compatibility assignments with `hasattr` checks before assigning, preventing an `AttributeError` crash on Windows ROCm builds where `_all_gather_base` / `_reduce_scatter_base` may not exist

Test Plan
Import check: `from flash_attn import flash_attn_func`.

Test Result
Successfully built FlashAttention-2 with aiter on Windows with an AMD GPU (gfx1200), ROCm 7.13.0a20260321, PyTorch 2.12.0a0+rocm7.13.0a20260321, Python 3.12. All tests passed.
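The platform-conditional build settings listed under Technical Details can be sketched roughly as below. This is a hypothetical sketch, not the actual `setup.py` diff: the helper name `maybe_update_aiter_submodule` and the overall structure are invented for illustration, while the environment variables, the dependency pins, and the `third_party/aiter` path come from the PR description.

```python
import os
import platform

IS_WINDOWS = platform.system() == "Windows"

def maybe_update_aiter_submodule(repo_root="."):
    """Return the git command to fetch the aiter submodule, or an empty
    list when third_party/aiter already exists (e.g. a locally patched
    clone) so the checkout is not overwritten."""
    aiter_dir = os.path.join(repo_root, "third_party", "aiter")
    if os.path.isdir(aiter_dir) and os.listdir(aiter_dir):
        return []  # already present: leave the local copy alone
    return ["git", "submodule", "update", "--init", "third_party/aiter"]

# Composable Kernel and its pre-built HIP C++ kernels are Linux-only, so
# the Windows install forces the pure-Triton FA path.
aiter_env = dict(os.environ)
if IS_WINDOWS:
    aiter_env["ENABLE_CK"] = "0"
    aiter_env["PREBUILD_KERNELS"] = "0"

# Linux pins upstream Triton; Windows uses the community triton-windows port.
triton_requirement = "triton-windows>=3.2.0" if IS_WINDOWS else "triton==3.5.1"
```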
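The import-time crash fix in `flash_attn/utils/distributed.py` follows the pattern below. This is a stand-alone sketch that uses a `SimpleNamespace` in place of `torch.distributed` so it runs without PyTorch; the real shim patches `torch.distributed` itself and its exact wording may differ.

```python
from types import SimpleNamespace

# Stand-in for torch.distributed on a Windows ROCm build where the legacy
# private symbols _all_gather_base / _reduce_scatter_base were never defined.
dist = SimpleNamespace()

# Guarded backward-compat shim: only alias the modern names from the legacy
# ones when the legacy attribute actually exists. An unguarded read of
# dist._all_gather_base would raise AttributeError at import time.
if not hasattr(dist, "all_gather_into_tensor") and hasattr(dist, "_all_gather_base"):
    dist.all_gather_into_tensor = dist._all_gather_base
if not hasattr(dist, "reduce_scatter_tensor") and hasattr(dist, "_reduce_scatter_base"):
    dist.reduce_scatter_tensor = dist._reduce_scatter_base

# With neither legacy symbol present, both assignments are skipped and the
# module imports cleanly instead of crashing.
```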