
Initial Windows ROCm build support for FlashAttention-2 ROCm/aiter Triton backend#2384

Closed
0xDELUXA wants to merge 1 commit into Dao-AILab:main from 0xDELUXA:fa2-aiter-triton-win-support

Conversation

@0xDELUXA
Contributor

@0xDELUXA 0xDELUXA commented Mar 23, 2026

Motivation

Enable building and running FlashAttention-2 on Windows with AMD GPUs via the ROCm/aiter Triton backend after the migration. Three small issues blocked this entirely: a crash on import when torch.distributed attributes are missing, a broken aiter submodule setup step, and a hard dependency on the Linux-only triton package.

Note

This PR depends on Windows build support being merged in ROCm/aiter first (or applied locally). See the corresponding PR: ROCm/aiter#2428

Technical Details

setup.py

  • Skip git submodule update for third_party/aiter if the directory already exists, to avoid overwriting locally cloned versions
  • Pass ENABLE_CK=0 and PREBUILD_KERNELS=0 when installing aiter on Windows, since Composable Kernel and its pre-built HIP C++ kernels are not available there; the pure-Triton FA path is used instead
  • Replace the hard triton==3.5.1 dependency with triton-windows>=3.2.0 on Windows (triton-windows is the community port of Triton for Windows ROCm)
  • Add Operating System :: Microsoft :: Windows classifier
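The setup.py changes above amount to platform-conditional dependency and environment selection. The sketch below illustrates that logic; the function names and structure are illustrative assumptions, not the PR's actual code.

```python
# Hypothetical sketch of the platform-conditional logic described above;
# helper names (triton_requirement, aiter_env) are illustrative only.
import os
import sys


def triton_requirement() -> str:
    """Pick the Triton dependency for the current platform."""
    if sys.platform == "win32":
        # triton-windows is the community port of Triton for Windows.
        return "triton-windows>=3.2.0"
    return "triton==3.5.1"


def aiter_env(base_env=None) -> dict:
    """Build the environment for the aiter install step.

    On Windows, Composable Kernel and the pre-built HIP kernels are
    unavailable, so both are disabled and the pure-Triton path is used.
    """
    env = dict(os.environ if base_env is None else base_env)
    if sys.platform == "win32":
        env["ENABLE_CK"] = "0"
        env["PREBUILD_KERNELS"] = "0"
    return env
```

In a real setup.py, `triton_requirement()` would feed `install_requires` and `aiter_env()` would be passed to the subprocess that installs the aiter submodule.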

flash_attn/utils/distributed.py

  • Guard the torch.distributed backward-compatibility assignments with hasattr checks, preventing an AttributeError at import time on Windows ROCm builds where _all_gather_base / _reduce_scatter_base may not exist
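The guard pattern above can be sketched as follows. The real code operates on torch.distributed; a stand-in namespace is used here so the sketch runs without torch installed, and the exact attribute pairs are an assumption based on the names in the PR description.

```python
# Illustrative sketch of the hasattr-guard pattern; attribute names follow
# the PR description, the helper and stand-in namespace are assumptions.
from types import SimpleNamespace


def add_compat_aliases(dist):
    """Alias modern collective names to the legacy underscore-prefixed
    ones only when the legacy source attribute actually exists."""
    if not hasattr(dist, "all_gather_into_tensor") and hasattr(dist, "_all_gather_base"):
        dist.all_gather_into_tensor = dist._all_gather_base
    if not hasattr(dist, "reduce_scatter_tensor") and hasattr(dist, "_reduce_scatter_base"):
        dist.reduce_scatter_tensor = dist._reduce_scatter_base


# On a build missing _all_gather_base / _reduce_scatter_base (as on some
# Windows ROCm builds), the guard skips the alias instead of raising
# AttributeError at import time.
stub = SimpleNamespace()
add_compat_aliases(stub)  # no crash
```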

Test Plan

  • Source build and install (using both PRs for now):

```powershell
git clone -b fa2-aiter-triton-win-support https://github.com/0xDELUXA/flash-attention.git
cd flash-attention\third_party
git clone -b fa2-triton-win-support https://github.com/0xDELUXA/aiter.git
cd ..
$env:ENABLE_CK = "0"
$env:PREBUILD_KERNELS = "0"
$env:FLASH_ATTENTION_TRITON_AMD_ENABLE = "TRUE"
pip install --no-build-isolation -e .
```

  • Run basic tests via `from flash_attn import flash_attn_func`.

Test Result

  • Successfully built FlashAttention-2 with aiter on Windows with an AMD GPU (gfx1200), ROCm 7.13.0a20260321, PyTorch 2.12.0a0+rocm7.13.0a20260321, Python 3.12.

  • All tests passed.

@0xDELUXA 0xDELUXA force-pushed the fa2-aiter-triton-win-support branch from f7dc8ea to e21c1ae on March 23, 2026 13:49
@0xDELUXA 0xDELUXA changed the title Initial FA-2 aiter Triton Windows build support Initial FA-2 ROCm/aiter Triton Windows build support Mar 23, 2026
@0xDELUXA 0xDELUXA changed the title Initial FA-2 ROCm/aiter Triton Windows build support Initial Windows ROCm build support for FlashAttention-2 ROCm/aiter Triton backend Mar 23, 2026
@micmelesse micmelesse mentioned this pull request Mar 23, 2026
@0xDELUXA 0xDELUXA force-pushed the fa2-aiter-triton-win-support branch from e21c1ae to 5ac71b7 on March 23, 2026 15:52
@micmelesse
Collaborator

@0xDELUXA Thank you for your pr, happy to help review if needed.

@0xDELUXA
Contributor Author

0xDELUXA commented Mar 23, 2026

@0xDELUXA Thank you for your pr, happy to help review if needed.

Appreciate it, but my PR over at ROCm/aiter changes a lot of things. Tried to make it as cross-platform as possible, but still, I’m not sure if it can ever be merged. Having Windows support again, after the migration, would be great.

@micmelesse
Collaborator

micmelesse commented Mar 23, 2026

@0xDELUXA If you are ok with it, I can cherry pick your commits and create a new pr on aiter and flash attention that I can work on. Your contribution will be preserved. It will help get things merged ASAP.

@0xDELUXA
Contributor Author

0xDELUXA commented Mar 23, 2026

@0xDELUXA If you are ok with it, I can cherry pick your commits and create a new pr on aiter and flash attention that I can work on. Your contribution will be preserved. It will help get things merged ASAP.

Sure, go ahead. That PR, as it is now, enables FA-2 to be built on Windows with aiter Triton.
After your changes, I can help with local testing on Windows if needed. Thanks!

@0xDELUXA
Contributor Author

0xDELUXA commented Mar 24, 2026

Closing this as it's superseded by #2385 by @micmelesse.

@0xDELUXA 0xDELUXA closed this Mar 24, 2026