[ROCm Windows] fix build failed by Apophis3158 · Pull Request #2519 · Dao-AILab/flash-attention

Apophis3158 · 2026-04-29T10:44:17Z

I encountered some issues while trying to build fa2 on ROCm Windows:

CK backend:

In the link phase, the command line exceeds the Windows maximum length of 32,767 characters: pypa/distutils#406

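Fix: write the object files to a response (`.rsp`) file, one per line, and pass that file to the linker instead of listing every object on the command line.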
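The response-file fix above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `spill_objects_to_rsp`, the length estimate, and the quoting of each path are assumptions made for the example.

```python
# Windows limit on the length of a command line, as cited in the comment above.
MAX_CMD_LEN = 32767


def spill_objects_to_rsp(linker, objects, options, rsp_path):
    """Return a link command, moving object paths into a response file if needed.

    Hypothetical helper: the name and signature are illustrative only.
    """
    full_cmd = [linker, *objects, *options]
    # Rough length estimate: arguments joined by single spaces.
    if len(" ".join(full_cmd)) <= MAX_CMD_LEN:
        return full_cmd
    with open(rsp_path, "w", encoding="utf-8") as rsp:
        for obj in objects:
            # One object per line; quoting (an assumption here) protects spaces.
            rsp.write(f'"{obj}"\n')
    # The linker reads the object list from the file named after "@".
    return [linker, f"@{rsp_path}", *options]
```

With 2669 objects, as in the log below, only the `@file` argument and the options remain on the command line, which is how the command can shrink from hundreds of thousands of characters to under a thousand.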

The build uses the `hipcc.exe` in the ROCm SDK and its LLVM `clang.exe`, but no linker is specified, so it defaults to MSVC `link.exe`, which caused some issues (even with the MSVC environment activated).

Fix: switch to LLVM's `lld-link.exe`.
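As a rough sketch of locating the LLVM linker mentioned above: the helper below checks for `lld-link.exe` under a ROCm SDK root. The `lib\llvm\bin` layout is taken from the build log in this comment; the function name `find_lld_link` is hypothetical, and how the result is wired into the build is left out because the thread does not show it.

```python
from pathlib import Path


def find_lld_link(rocm_root):
    """Return the path to lld-link.exe under a ROCm SDK root, or None if absent.

    Hypothetical helper for illustration only.
    """
    # Layout assumed from the build log: <root>\lib\llvm\bin\lld-link.exe
    candidate = Path(rocm_root) / "lib" / "llvm" / "bin" / "lld-link.exe"
    return candidate if candidate.is_file() else None
```

Returning `None` rather than raising lets a caller fall back to whatever linker the toolchain would otherwise pick.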

After these two changes:

  [...]
  DEBUG [2669/2669] H:\ROCm\.venv\Lib\site-packages\_rocm_sdk_devel\bin\hipcc.exe -std=c++20 -Xcompiler -D__HIP_PLATFORM_AMD__=1 -Xcompiler -DUSE_ROCM=1 -Xcompiler -DHIPBLAS_V2 -Xcompiler -fms-runtime-lib=dll -IH:\ROCm\flash-attention\csrc\composable_kernel\include -IH:\ROCm\flash-attention\csrc\composable_kernel\library\include -IH:\ROCm\flash-attention\csrc\composable_kernel\example\ck_tile\01_fmha -IH:\ROCm\.venv\Lib\site-packages\torch\include -IH:\ROCm\.venv\Lib\site-packages\torch\include\torch\csrc\api\include -IH:\ROCm\.venv\Lib\site-packages\torch\include\THH -IH:\ROCm\.venv\Lib\site-packages\_rocm_sdk_devel\include -IH:\ROCm\.venv\include -IC:\Users\Administrator\AppData\Roaming\uv\python\cpython-3.13-windows-x86_64-none\include -IC:\Users\Administrator\AppData\Roaming\uv\python\cpython-3.13-windows-x86_64-none\Include -c H:\ROCm\flash-attention\csrc\flash_attn_ck\mha_varlen_fwd.hip -o H:\ROCm\flash-attention\build\temp.win-amd64-cpython-313\Release\csrc\flash_attn_ck\mha_varlen_fwd.obj -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DHIP_ENABLE_WARP_SYNC_BUILTINS=1 -fms-extensions -Wno-ignored-attributes -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fms-runtime-lib=dll --offload-arch=gfx1201 -O3 -std=c++20 -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-shift-count-overflow -Wno-deprecated-declarations -Wno-pass-failed -Wno-inconsistent-dllimport -Wno-cuda-compat -fbracket-depth=1024 -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_HCC__=1 -D_CRT_SECURE_NO_WARNINGS -DCK_TILE_FLOAT_TO_BFLOAT16_DEFAULT=3 -mllvm --lsr-drop-solution=1 -fno-offload-uniform-block -mllvm -enable-post-misched=0 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -DHIPIFY_V2 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -fno-gpu-rdc
  DEBUG Link RSP: build\lib.win-amd64-cpython-313\flash_attn_2_cuda.cp313-win_amd64.pyd.rsp (2669 objects, cmd reduced from 592201 to 929 chars)
  DEBUG H:\ROCm\.venv\Lib\site-packages\_rocm_sdk_devel\lib\llvm\bin\lld-link.exe @build\lib.win-amd64-cpython-313\flash_attn_2_cuda.cp313-win_amd64.pyd.rsp /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:H:\ROCm\.venv\Lib\site-packages\torch\lib /LIBPATH:H:\ROCm\.venv\Lib\site-packages\_rocm_sdk_devel\lib /LIBPATH:H:\ROCm\.venv\Lib\site-packages\_rocm_sdk_devel\hip\lib /LIBPATH:H:\ROCm\.venv\libs /LIBPATH:C:\Users\Administrator\AppData\Roaming\uv\python\cpython-3.13-windows-x86_64-none\libs /LIBPATH:C:\Users\Administrator\AppData\Roaming\uv\python\cpython-3.13-windows-x86_64-none /LIBPATH:H:\ROCm\.venv\PCbuild\amd64 c10.lib torch.lib torch_cpu.lib torch_python.lib amdhip64.lib c10_hip.lib torch_hip.lib /OUT:build\lib.win-amd64-cpython-313\flash_attn_2_cuda.cp313-win_amd64.pyd /IMPLIB:H:\ROCm\flash-attention\build\temp.win-amd64-cpython-313\Release\build\flash_attn_2_cuda.cp313-win_amd64.lib
  DEBUG installing to build\bdist.win-amd64\wheel
  DEBUG running install
  [...]

Triton backend:

The default requirement is `triton==3.5.1`, but it is not provided by aiter yet on Windows.

Fix: use triton-windows instead.

Apophis3158 · 2026-04-29T11:06:52Z

After getting rid of MSVC `link.exe`, building only requires a few commands:

.\.venv\Scripts\activate.ps1

$ROCM_ROOT = rocm-sdk path --root
$env:ROCM_HOME = $ROCM_ROOT
$env:PATH = "$ROCM_ROOT\lib\llvm\bin;$ROCM_ROOT\bin;$env:PATH"

$env:DISTUTILS_USE_SDK = "1"

uv pip install --no-build-isolation -v .
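For scripted builds, the same environment setup could be expressed in Python along these lines. This is only a sketch mirroring the PowerShell commands above; the function name `rocm_build_env` is hypothetical, and the ROCm root still has to come from `rocm-sdk path --root` or similar.

```python
import os


def rocm_build_env(rocm_root, base_env=None):
    """Return a copy of the environment prepared for the build steps above.

    Hypothetical helper: it mirrors the PowerShell variables, nothing more.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROCM_HOME"] = rocm_root
    # Put the SDK's LLVM tools and binaries ahead of everything else on PATH.
    llvm_bin = os.path.join(rocm_root, "lib", "llvm", "bin")
    rocm_bin = os.path.join(rocm_root, "bin")
    env["PATH"] = os.pathsep.join([llvm_bin, rocm_bin, env.get("PATH", "")])
    env["DISTUTILS_USE_SDK"] = "1"
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when invoking the install command.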

And I don't know whether it's a defect of the ROCm SDK that the link step uses MSVC's linker rather than LLVM's.

jammm · 2026-04-29T16:23:45Z

Can you try building again? #2517 was merged, which should fix your issue.

jammm · 2026-04-29T16:25:26Z

> And I don't know whether it's a defect of the ROCm SDK that the link step uses MSVC's linker rather than LLVM's.

It's more to do with setuptools, which is why pypa/distutils#406 is trying to fix it. Perhaps the lack of `link.exe` in your PATH led setuptools to find an alternative linker. But it still needs MSVC headers to do builds with, so I don't feel it's really a long-term fix.

crashingalexsan · 2026-04-30T19:33:29Z

Can confirm building works for me and CK backend works (tested on GFX1201)

jammm · 2026-04-30T19:50:21Z

This PR shouldn’t be needed anymore because #2517 fixes the same issue that this PR is addressing.


crashingalexsan · 2026-04-30T20:16:29Z

Yeah... sorry, I tested main with the #2517 patch. We should be able to close this PR.

astrelsky · 2026-05-02T14:49:28Z

> > And I don't know whether it's a defect of the ROCm SDK that the link step uses MSVC's linker rather than LLVM's.
>
> It's more to do with setuptools, which is why pypa/distutils#406 is trying to fix it. Perhaps the lack of `link.exe` in your PATH led setuptools to find an alternative linker. But it still needs MSVC headers to do builds with, so I don't feel it's really a long-term fix.

To be honest, it shouldn't fix anything at all, because the problem is a Windows limitation. I suspect the use of lld or some other linker caused something different to happen elsewhere that worked around the problem.

Apophis3158 · 2026-05-06T09:00:05Z

I'm quite sure AMD's intention was to build CK with LLVM, but since MSVC's linker works fine, it doesn't matter. The linking issue I encountered may also be related to uv, but rather than struggling with a linker, it is better to see how to make ccache/sccache run.

With `$env:PYTORCH_NVCC = "sccache.exe $ROCM_ROOT\bin\hipcc.exe"` set, I got this:

  clang: error: cannot find libdevice for sm_52; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice
  clang: error: cannot find CUDA installation; provide its path via '--cuda-path', or pass '-nocudainc' to build without CUDA includes

Apophis3158 · 2026-05-06T09:01:36Z

@jammm This PR now only contains the fix for the Triton backend issue. Can you take a look?

jammm · 2026-05-06T09:13:58Z

lgtm!

jammm · 2026-05-06T09:14:15Z

@tridao can we merge this too, please? This is for Windows support with triton-windows.

tridao · 2026-05-06T12:05:08Z

Is there a version of triton-windows we should pin?

Apophis3158 · 2026-05-06T12:53:00Z

Not needed for now; aiter uses the latest version to run its smoke test.

jammm · 2026-05-06T13:06:27Z

I would pin it to >=3.6.0. I'm not sure if 3.5.x has proper AMD support.

tridao · 2026-05-06T13:57:58Z

Let's pin it to >=3.6.0, then we'll merge.

Apophis3158 · 2026-05-06T18:03:38Z

Pinned triton-windows>=3.6.0
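A platform-dependent requirement like the one agreed on here could look roughly like this. It is a sketch under assumptions: the function name `triton_requirement` is hypothetical, and the actual change in this PR may express the pin differently (for example, with an environment marker in the requirements list).

```python
import sys


def triton_requirement(platform=None):
    """Pick the Triton requirement string for the given platform.

    Hypothetical helper; the version pins are the ones quoted in this thread.
    """
    platform = sys.platform if platform is None else platform
    if platform == "win32":
        # Windows: the triton-windows package, pinned as discussed above.
        return "triton-windows>=3.6.0"
    # Elsewhere: the default requirement mentioned in the PR description.
    return "triton==3.5.1"
```

The same choice can be written declaratively with a PEP 508 marker, e.g. `triton-windows>=3.6.0; sys_platform == "win32"`.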

Apophis3158 force-pushed the main branch from a841bf4 to 7697a3e on April 29, 2026

[ROCm Windows] fix triton requirement

7d90885

Apophis3158 force-pushed the main branch from 7697a3e to 7d90885 on May 6, 2026

tridao approved these changes May 6, 2026

pin triton-windows>=3.6.0

17cb4b5

tridao merged commit 495ef79 into Dao-AILab:main May 6, 2026
