
Initial Windows ROCm build support for FlashAttention-2 ROCm/aiter Triton backend#2384

Closed
0xDELUXA wants to merge 1 commit into Dao-AILab:main from 0xDELUXA:fa2-aiter-triton-win-support

Conversation

@0xDELUXA
Contributor

@0xDELUXA 0xDELUXA commented Mar 23, 2026

Motivation

Enable building and running FlashAttention-2 on Windows with AMD GPUs via the ROCm/aiter Triton backend after the migration. Three small issues blocked this entirely: a crash on import when torch.distributed attributes are missing, a broken aiter submodule setup step, and a hard dependency on the Linux-only triton package.

Note

This PR depends on Windows build support being merged in ROCm/aiter first (or applied locally). See the corresponding PR: ROCm/aiter#2428

Technical Details

setup.py

  • Skip git submodule update for third_party/aiter if the directory already exists, to avoid overwriting locally cloned versions
  • Pass ENABLE_CK=0 and PREBUILD_KERNELS=0 when installing aiter on Windows, since Composable Kernel and its pre-built HIP C++ kernels are not available there; the pure-Triton FA path is used instead
  • Replace the hard triton==3.5.1 dependency with triton-windows>=3.2.0 on Windows (triton-windows is the community port of Triton for Windows ROCm)
  • Add Operating System :: Microsoft :: Windows classifier
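The setup.py changes above amount to platform-conditional dependency and environment selection. The sketch below illustrates that logic; the function names and structure are illustrative assumptions, not the PR's actual code.

```python
# Hypothetical sketch of the platform-conditional logic described above;
# helper names (triton_requirement, aiter_env) are illustrative only.
import os
import sys


def triton_requirement() -> str:
    """Pick the Triton dependency for the current platform."""
    if sys.platform == "win32":
        # triton-windows is the community port of Triton for Windows.
        return "triton-windows>=3.2.0"
    return "triton==3.5.1"


def aiter_env(base_env=None) -> dict:
    """Build the environment for the aiter install step.

    On Windows, Composable Kernel and the pre-built HIP kernels are
    unavailable, so both are disabled and the pure-Triton path is used.
    """
    env = dict(os.environ if base_env is None else base_env)
    if sys.platform == "win32":
        env["ENABLE_CK"] = "0"
        env["PREBUILD_KERNELS"] = "0"
    return env
```

In a real setup.py, `triton_requirement()` would feed `install_requires` and `aiter_env()` would be passed to the subprocess that installs the aiter submodule.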

flash_attn/utils/distributed.py

  • Guard the torch.distributed backward-compatibility assignments with hasattr checks, preventing an AttributeError at import time on Windows ROCm builds where _all_gather_base / _reduce_scatter_base may not exist
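The guard pattern above can be sketched as follows. The real code operates on torch.distributed; a stand-in namespace is used here so the sketch runs without torch installed, and the exact attribute pairs are an assumption based on the names in the PR description.

```python
# Illustrative sketch of the hasattr-guard pattern; attribute names follow
# the PR description, the helper and stand-in namespace are assumptions.
from types import SimpleNamespace


def add_compat_aliases(dist):
    """Alias modern collective names to the legacy underscore-prefixed
    ones only when the legacy source attribute actually exists."""
    if not hasattr(dist, "all_gather_into_tensor") and hasattr(dist, "_all_gather_base"):
        dist.all_gather_into_tensor = dist._all_gather_base
    if not hasattr(dist, "reduce_scatter_tensor") and hasattr(dist, "_reduce_scatter_base"):
        dist.reduce_scatter_tensor = dist._reduce_scatter_base


# On a build missing _all_gather_base / _reduce_scatter_base (as on some
# Windows ROCm builds), the guard skips the alias instead of raising
# AttributeError at import time.
stub = SimpleNamespace()
add_compat_aliases(stub)  # no crash
```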

Test Plan

  • Source build and install (using both PRs for now):

```powershell
git clone -b fa2-aiter-triton-win-support https://github.com/0xDELUXA/flash-attention.git
cd flash-attention\third_party
git clone -b fa2-triton-win-support https://github.com/0xDELUXA/aiter.git
cd ..
$env:ENABLE_CK = "0"
$env:PREBUILD_KERNELS = "0"
$env:FLASH_ATTENTION_TRITON_AMD_ENABLE = "TRUE"
pip install --no-build-isolation -e .
```

  • Run basic tests via `from flash_attn import flash_attn_func`.

Test Result

  • Successfully built FlashAttention-2 with aiter on Windows with an AMD GPU (gfx1200), ROCm 7.13.0a20260321, PyTorch 2.12.0a0+rocm7.13.0a20260321, Python 3.12.

  • All tests passed.

@0xDELUXA 0xDELUXA force-pushed the fa2-aiter-triton-win-support branch from f7dc8ea to e21c1ae on March 23, 2026 13:49
@0xDELUXA 0xDELUXA changed the title Initial FA-2 aiter Triton Windows build support Initial FA-2 ROCm/aiter Triton Windows build support Mar 23, 2026
@0xDELUXA 0xDELUXA changed the title Initial FA-2 ROCm/aiter Triton Windows build support Initial Windows ROCm build support for FlashAttention-2 ROCm/aiter Triton backend Mar 23, 2026
@micmelesse micmelesse mentioned this pull request Mar 23, 2026
@0xDELUXA 0xDELUXA force-pushed the fa2-aiter-triton-win-support branch from e21c1ae to 5ac71b7 on March 23, 2026 15:52
@micmelesse
Collaborator

@0xDELUXA Thank you for your pr, happy to help review if needed.

@0xDELUXA
Contributor Author

0xDELUXA commented Mar 23, 2026

@0xDELUXA Thank you for your pr, happy to help review if needed.

Appreciate it, but my PR over at ROCm/aiter changes a lot of things. Tried to make it as cross-platform as possible, but still, I’m not sure if it can ever be merged. Having Windows support again, after the migration, would be great.

@micmelesse
Collaborator

micmelesse commented Mar 23, 2026

@0xDELUXA If you are ok with it, I can cherry pick your commits and create a new pr on aiter and flash attention that I can work on. Your contribution will be preserved. It will help get things merged ASAP.

@0xDELUXA
Contributor Author

0xDELUXA commented Mar 23, 2026

@0xDELUXA If you are ok with it, I can cherry pick your commits and create a new pr on aiter and flash attention that I can work on. Your contribution will be preserved. It will help get things merged ASAP.

Sure, go ahead. That PR, as it is now, enables FA-2 to be built on Windows with aiter Triton.
After your changes, I can help with local testing on Windows if needed. Thanks!

@0xDELUXA
Contributor Author

0xDELUXA commented Mar 24, 2026

Closing this as it's superseded by #2385 by @micmelesse.

@0xDELUXA 0xDELUXA closed this Mar 24, 2026