Conversation
|
I will take this out of draft when it is ready to merge. |
Force-pushed 123f341 to 44f9b01
|
@astrelsky @0xDELUXA Can you test this PR? I added a CI smoke test for Windows and it is passing. See https://github.com/ROCm/aiter/actions/runs/23464138688/job/68272544832 |
PS C:\> git clone https://github.com/Dao-AILab/flash-attention
Cloning into 'flash-attention'...
remote: Enumerating objects: 13886, done.
remote: Counting objects: 100% (209/209), done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 13886 (delta 175), reused 144 (delta 144), pack-reused 13677 (from 3)
Receiving objects: 100% (13886/13886), 19.72 MiB | 28.52 MiB/s, done.
Resolving deltas: 100% (10678/10678), done.
Updating files: 100% (1016/1016), done.
PS C:\> cd flash-attention
PS C:\flash-attention> git fetch origin pull/2385/head:pr-2385
remote: Enumerating objects: 12, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 12 (delta 5), reused 10 (delta 5), pack-reused 1 (from 1)
Unpacking objects: 100% (12/12), 10.05 KiB | 250.00 KiB/s, done.
From https://github.com/Dao-AILab/flash-attention
* [new ref] refs/pull/2385/head -> pr-2385
PS C:\flash-attention> git checkout pr-2385
Switched to branch 'pr-2385'
PS C:\flash-attention> C:\ComfyUI\venv\Scripts\Activate.ps1
(venv) PS C:\flash-attention> $env:FLASH_ATTENTION_TRITON_AMD_ENABLE = "TRUE"
(venv) PS C:\flash-attention> pip install --no-build-isolation -e .
Obtaining file:///C:/flash-attention
Checking if build backend supports build_editable ... done
Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: einops in C:\ComfyUI\venv\Lib\site-packages (from flash_attn==2.8.4) (0.8.2)
Requirement already satisfied: triton-windows>=3.2.0 in C:\ComfyUI\venv\Lib\site-packages (from flash_attn==2.8.4) (3.6.0+gitae9d5a54.post27)
Building wheels for collected packages: flash_attn
Building editable for flash_attn (pyproject.toml) ... done
Created wheel for flash_attn: filename=flash_attn-2.8.4-0.editable-py3-none-any.whl size=12336 sha256=c49f1426ca6dcc43d57d28f77669d96e518a66800fd1212ba01229f0339a7542
Stored in directory: C:\Users\deluxa\AppData\Local\Temp\pip-ephem-wheel-cache-55s_n3mf\wheels\11\60\82\d3b022b8cb27485d3b1fe0f35654cc0629565dbe36cf5323b0
Successfully built flash_attn
Installing collected packages: flash_attn
Attempting uninstall: flash_attn
Found existing installation: flash_attn 2.8.4
Uninstalling flash_attn-2.8.4:
Successfully uninstalled flash_attn-2.8.4
Successfully installed flash_attn-2.8.4
(venv) PS C:\flash-attention> python -c "from flash_attn.flash_attn_interface import flash_attn_func"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\flash-attention\flash_attn\__init__.py", line 8, in <module>
from flash_attn.flash_attn_interface import (
File "C:\flash-attention\flash_attn\flash_attn_interface.py", line 21, in <module>
from aiter.ops.triton._triton_kernels.flash_attn_triton_amd import flash_attn_2 as flash_attn_gpu
File "C:\flash-attention\third_party\aiter\aiter\__init__.py", line 59, in <module>
from .jit import core as core # noqa: E402
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\flash-attention\third_party\aiter\aiter\jit\core.py", line 23, in <module>
from chip_info import get_gfx, get_gfx_list # noqa: E402
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\flash-attention\third_party\aiter\aiter\jit/utils\chip_info.py", line 8, in <module>
from cpp_extension import executable_path
File "C:\flash-attention\third_party\aiter\aiter\jit/utils\cpp_extension.py", line 175, in <module>
HIP_HOME = _join_rocm_home("hip") if ROCM_HOME else None
^^^^^^^^^^^^^^^^^^^^^^
File "C:\flash-attention\third_party\aiter\aiter\jit/utils\cpp_extension.py", line 134, in _join_rocm_home
raise OSError(
OSError: Building PyTorch extensions using ROCm and Windows is not supported.
I think the CI only passes because it runs on a clean Windows runner with no ROCm installation, so:
(venv) PS C:\flash-attention> $env:ROCM_HOME = "$VENV_PATH\Lib\site-packages\_rocm_sdk_devel"
(venv) PS C:\flash-attention> python -c "from flash_attn.flash_attn_interface import flash_attn_func"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\flash-attention\flash_attn\__init__.py", line 8, in <module>
from flash_attn.flash_attn_interface import (
File "C:\flash-attention\flash_attn\flash_attn_interface.py", line 21, in <module>
from aiter.ops.triton._triton_kernels.flash_attn_triton_amd import flash_attn_2 as flash_attn_gpu
File "C:\flash-attention\third_party\aiter\aiter\__init__.py", line 59, in <module>
from .jit import core as core # noqa: E402
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\flash-attention\third_party\aiter\aiter\jit\core.py", line 23, in <module>
from chip_info import get_gfx, get_gfx_list # noqa: E402
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\flash-attention\third_party\aiter\aiter\jit/utils\chip_info.py", line 8, in <module>
from cpp_extension import executable_path
File "C:\flash-attention\third_party\aiter\aiter\jit/utils\cpp_extension.py", line 175, in <module>
HIP_HOME = _join_rocm_home("hip") if ROCM_HOME else None
^^^^^^^^^^^^^^^^^^^^^^
File "C:\flash-attention\third_party\aiter\aiter\jit/utils\cpp_extension.py", line 134, in _join_rocm_home
raise OSError(
OSError: Building PyTorch extensions using ROCm and Windows is not supported.
I'm pretty sure we need to bring in a few more changes from my initial PR. |
|
I will check this out on an internal node and get back to you. |
Force-pushed 0099fbc to 77e478e
|
@astrelsky @0xDELUXA Can you try the new commit of this PR? I think the issue is fixed. I tried it on an internal node and the build issues are resolved. |
|
@astrelsky I removed the IS_WINDOWS error in cpp_extension.py. This is the command that I used to get the message above. You might have to update it. |
Ok, that looks about equivalent to what I did last night except the last part. I'll give it another shot this morning; I should have a moment between an appointment and heading to work. Hopefully @0xDELUXA can try as well. |
|
I gave it a shot before hopping in the shower this morning. Here are the exact steps I took for full reproducibility, and the results. Then, as one more final test, I ran pytest. Output: |
PS C:\Users\deluxa> C:\Comfyui\venv\Scripts\Activate.ps1
(venv) PS C:\Users\deluxa> $env:WORK = "C:\t"
(venv) PS C:\Users\deluxa> git clone --depth 1 -b micmelesse/windows-rocm-support https://github.com/ROCm/aiter.git "$env:WORK\aiter"
(venv) PS C:\Users\deluxa> pip install -e "$env:WORK\aiter" --no-build-isolation
(venv) PS C:\Users\deluxa> git clone --depth 1 -b micmelesse/windows-rocm-support https://github.com/ROCm/flash-attention.git "$env:WORK\fa"
(venv) PS C:\Users\deluxa> robocopy "$env:WORK\aiter" "$env:WORK\fa\third_party\aiter" /E /XD .git 3rdparty /NFL /NDL /NJH /NJS
(venv) PS C:\Users\deluxa> Remove-Item -Recurse -Force "$env:WORK\fa\.git"
(venv) PS C:\Users\deluxa> $env:FLASH_ATTENTION_TRITON_AMD_ENABLE = "TRUE"
(venv) PS C:\Users\deluxa> pip install --no-build-isolation --no-deps -e "$env:WORK\fa"
(venv) PS C:\Users\deluxa> python -c "from flash_attn.flash_attn_interface import flash_attn_func; print('OK:', flash_attn_func)"
'cat' is not recognized as an internal or external command,
operable program or batch file.
[aiter] import [module_aiter_enum] under C:\ComfyUI\venv\Lib\site-packages\aiter\jit\module_aiter_enum.pyd
[aiter] ROCm/HIP JIT runtime not available: Get GPU arch from rocminfo failed Could not find rocminfo in PATH or ROCM_HOME(C:\ComfyUI\venv). CK and HIP ops are disabled. Triton ops remain available.
OK: <function flash_attn_func at 0x0000019045CC4040>
Yes, it works. Verified by running smoke tests successfully.
In my opinion, we could improve the error message. Currently, this message suggests that the absence of
Also, this one:
The output from my initial PR is cleaner (though it contains way more changes).
Yes, this is purely cosmetic, but I thought I'd mention it. What do you think, @micmelesse? Alternatively, it may be better to minimize the diff and keep these warnings, but then they will appear on Windows every time flash attention is called. |
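The `rocminfo` warning above comes from GPU-arch detection failing and the library degrading to Triton-only mode. A sketch of that detect-and-degrade pattern, with hypothetical function names rather than aiter's real API:

```python
# Illustrative detect-and-degrade sketch; names and messages are
# placeholders, not aiter's actual chip_info implementation.
import shutil
import subprocess
import warnings

def detect_gfx_arch():
    """Return a gfx architecture string, or None if rocminfo is unavailable."""
    rocminfo = shutil.which("rocminfo")
    if rocminfo is None:
        warnings.warn(
            "rocminfo not found in PATH; CK and HIP ops are disabled. "
            "Triton ops remain available."
        )
        return None
    try:
        out = subprocess.run(
            [rocminfo], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None  # rocminfo present but unusable: degrade, don't crash
    for token in out.split():
        if token.startswith("gfx"):
            return token
    return None
```

Returning `None` instead of raising is what allows the import to succeed on machines without a ROCm installation, at the cost of the warning text discussed above.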
|
Bumped the aiter submodule. On Windows, aiter now skips CK/HIP imports entirely and shows the message below.
Assuming you have torch, triton-windows, and setuptools already installed: |
(venv) PS C:\Users\deluxa> python -c "from flash_attn.flash_attn_interface import flash_attn_func; print('OK:', flash_attn_func)"
[aiter] Windows: CK and HIP ops are not available. Triton ops only.
OK: <function flash_attn_func at 0x000001A1CCC3D760>
Thanks, this looks great! |
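A sketch of how such a one-time platform gate could look (illustrative only, not the actual aiter code):

```python
# Hypothetical one-shot platform gate: print a single informational line on
# Windows and expose only Triton backends. Not the real aiter implementation.
import functools
import sys

@functools.lru_cache(maxsize=None)
def _warn_windows_once():
    # lru_cache on a no-argument function means the body runs exactly once,
    # so the message does not repeat on every flash-attention call.
    print("[aiter] Windows: CK and HIP ops are not available. Triton ops only.")

def available_backends():
    if sys.platform == "win32":
        _warn_windows_once()
        return ("triton",)
    return ("triton", "ck", "hip")
```

Caching the warning addresses the cosmetic concern raised earlier: the notice appears once per process rather than on every call.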
|
As a side note, for anyone who wants to experiment, |
|
So I removed the exception handling so it would log the error, and it looks like
I understand that the intent here is to get use of triton working again, so it doesn't need to be fixed here, but it does need to be fixed in
Tests from running |
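Logging the error rather than swallowing it, as described above, could look roughly like this; the helper name and module names are placeholders, not code from this PR:

```python
# Sketch of surfacing an optional-import failure instead of silently
# swallowing it. Placeholder names; not code from this PR.
import importlib
import logging

logger = logging.getLogger("aiter")

def try_import(module_name):
    """Attempt an optional import; log the full traceback on failure."""
    try:
        return importlib.import_module(module_name)
    except Exception:
        # logging.exception / Logger.exception records the traceback, so the
        # root cause (e.g. a missing DLL on Windows) is visible rather than
        # hidden behind a bare `except: pass`.
        logger.exception("optional backend %s failed to import", module_name)
        return None
```

A bare `except Exception: pass` hides exactly the kind of platform-specific failure being debugged in this thread; logging keeps the graceful fallback while leaving evidence behind.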
|
@micmelesse Seems to be working OK, all tests passed. |
|
Looks good, I think it could be marked as ready for review. |
|
I ran some benchmarks by checking out #2217 locally, building it, and comparing it against aiter FA (on Windows). Here are the results:
Key Observations
|
Thanks very much for the comparison! Could you please share your benchmark scripts? Thanks! |
Of course! Here they are: |
I will do further research based on it and will share updates once I find anything helpful! |
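The benchmark scripts themselves are not reproduced in this thread, but a generic timing harness of the kind such comparisons typically use might look like this (a sketch under assumptions, not the scripts shared above):

```python
# Generic median-of-N wall-clock timing harness; illustrative only,
# not the benchmark scripts referenced in this conversation.
import statistics
import time

def bench(fn, *args, warmup=3, iters=10):
    """Return the median wall-clock time of fn(*args) in milliseconds."""
    for _ in range(warmup):
        fn(*args)  # warm up caches / JIT compilation before measuring
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)
```

For GPU kernels one would additionally bracket the call with `torch.cuda.synchronize()` (or CUDA events), since kernel launches are asynchronous and wall-clock timing alone would under-measure them.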
@tridao Ready for review. Just a submodule bump that fixes the Windows build issues (ROCm/aiter#2433). Also added a CI test to catch similar build issues going forward. |
|
I would like to point out that Windows users must use the
To build and install, use:
These errors also continue to persist:
It would be great to include this diff here as well. Referencing ROCm#172. |
Force-pushed 5be2555 to 50e2ab6
|
@0xDELUXA Added the |
Thanks! I was quite determined to address this distributed support issue on Windows ROCm:
Yeah, in the meantime we can install |
According to the conversation, they're going to leave it as |
|
I prefer to keep this PR minimal and just fix the Windows issues. A conditional |
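For reference, a platform-conditional Triton dependency can be expressed either in code or with PEP 508 environment markers; the version pin below is illustrative, not this PR's actual packaging metadata:

```python
# Sketch of a platform-conditional Triton requirement. The pin is an
# assumption for illustration, not taken from this PR's setup.py.
import platform

def triton_requirement():
    """Pick the Triton distribution appropriate for the current platform."""
    if platform.system() == "Windows":
        return "triton-windows>=3.2.0"
    return "triton"

# The equivalent PEP 508 environment markers, usable directly in
# install_requires or pyproject.toml without any Python logic:
REQUIRES = [
    'triton; platform_system != "Windows"',
    'triton-windows>=3.2.0; platform_system == "Windows"',
]
```

The marker form has the advantage that resolvers evaluate it at install time, so no conditional code needs to run during the build at all.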
|
@tridao Would love to see this merged when you get a chance. Without this PR's changes, Windows ROCm users cannot build or use Flash Attention at all. Thanks! |
|
@micmelesse LGTM, let's merge when it's ready. |
|
Thanks @tridao. I don't have merge permissions on this repo. Could you merge it or grant me access? It would be great for our work at AMD. |
|
@micmelesse i've just added you, you should be able to merge now |
|
@tridao Thank you |







This PR fixes issues experienced by users of the Triton backend of Flash Attention on Windows. See #2383. This PR depends on ROCm/aiter#2433, which has to be merged first. I will update here when that happens.
This PR is a continuation of #2384 by @0xDELUXA. It cherry-picks the author's commits with their permission. I take responsibility for the remaining work.
I have added CI tests in aiter so we will detect issues like this going forward.