
Enabling Blackwell support #1254

Merged

danthe3rd merged 1 commit into facebookresearch:main from loscrossos:blackwell_support
Jun 23, 2025

Conversation

@loscrossos
Contributor

Nvidia Blackwell cards have been out for a while, but xformers does not formally support them. This PR adds support by checking for CUDA 12.8 and enabling compute capability 12.0 (sm_120). Even though higher capabilities are listed in https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list, using 12.0 is a conservative approach that covers all 50-series cards. This PR definitely solves #1251 and possibly #1228.
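The gating this PR describes can be sketched roughly as follows. This is an illustrative Python sketch, not the actual xformers setup.py code; the function name and baseline arch list are assumptions.

```python
# Hypothetical sketch of the arch gating described above: only when the
# CUDA toolkit is new enough to know sm_120 do we ask nvcc to build for it.

def cuda_archs(cuda_version: tuple[int, int]) -> list[str]:
    """Return compute capabilities to build for, given a CUDA toolkit version."""
    archs = ["8.0", "8.6", "8.9", "9.0"]  # illustrative pre-Blackwell baseline
    if cuda_version >= (12, 8):
        # CUDA 12.8 is the first toolkit that supports sm_120 (consumer
        # Blackwell, i.e. the RTX 50 series). 12.0 is the conservative
        # choice: higher capabilities exist, but 12.0 covers all 50-series.
        archs.append("12.0")
    return archs

print(cuda_archs((12, 6)))  # no Blackwell arch
print(cuda_archs((12, 8)))  # includes "12.0"
```

Older toolkits must not see the flag at all, since nvcc rejects arch values it does not know about.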

Signed-off-by: LosCrossos <165311345+loscrossos@users.noreply.github.com>
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 18, 2025
@ir-personal

Any chance this could be prioritised? I'm having the same issue this PR solves and would appreciate the fix being merged.

@loscrossos
Contributor Author

I have a compiled library for Linux / Python 3.12 if you want to try it out:

https://github.com/loscrossos/xformers/releases

@ir-personal

Thank you so much. I actually compiled it too this morning (it crossed my mind right after posting a comment here).
I tried:
TORCH_CUDA_ARCH_LIST="12.0" (RTX 5090)
pip install ninja &&
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers --no-build-isolation

Now fine-tuning Stable Diffusion SDXL (DreamBooth + LoRA). To be honest, it's not running much faster than yesterday without xformers, so I am wondering if there is still a gap somewhere :D

@loscrossos
Contributor Author

Are you on Windows or Linux?

For the project I tried, I also did not notice much improvement on Windows. On Linux there was a performance increase. Maybe there are more things to change down the line. Nevertheless, this is a first step, and it enables some projects to run at all.

@ir-personal

It's Win 11, but I am running WSL2 Ubuntu; the training is also running inside a container.

@Jeffkang-94

Jeffkang-94 commented May 27, 2025

Hi, I have a quick question.

It looks like this PR addresses some potential incompatibility on the Blackwell GPU arch. Did you find any severe issues beyond the RTX 5090?

Our repo depends on this repository, and I'm planning to use B200 machines. I'm wondering if any issues will come up, especially when using memory_efficient_attention.
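For context on the B200 question: NVIDIA's GPU feature list (linked in the PR description) assigns different compute capabilities to datacenter and consumer Blackwell, which matters for which arch flags cover which card. A small illustrative mapping (check the linked docs for the authoritative values):

```python
# Compute capabilities per NVIDIA's nvcc GPU feature list (illustrative
# subset; the names and values here are for orientation only).
COMPUTE_CAPABILITY = {
    "A100": "8.0",      # Ampere datacenter
    "RTX 4090": "8.9",  # Ada
    "H100": "9.0",      # Hopper
    "B200": "10.0",     # Blackwell datacenter (sm_100)
    "RTX 5090": "12.0", # Blackwell consumer (sm_120, what this PR enables)
}

# A build that targets only 12.0 would not produce kernels for a B200,
# since the B200 reports capability 10.0, not 12.0.
print(COMPUTE_CAPABILITY["B200"])
```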

@loscrossos
Contributor Author

The issues are linked in the first comment; Blackwell does not run at all for those.

@None9527

i have a compiled library for linux python3.12 if you want to try out:

https://github.com/loscrossos/xformers/releases

Thanks for your whl. Now I'm no longer getting the error about the missing sm_120, but I'm being told that this version of xformers requires torch 2.7. However, I'm currently using the nightly 2.8.0.dev version. Should I ignore this warning or downgrade torch to 2.7.0?

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.7.0+cu128 with CUDA 1209 (you have 2.8.0.dev20250609+cu128)
Python 3.12.3 (you have 3.12.11)
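The warning above reflects that xformers' binary extensions are ABI-tied to the torch they were built against, so the (major, minor) torch versions must match even when the CUDA suffix does. A rough, illustrative check (not xformers code; the version strings are taken from the warning above):

```python
import re

def torch_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string like '2.8.0.dev20250609+cu128'."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unparseable version: {version}")
    return int(m.group(1)), int(m.group(2))

built_for = "2.7.0+cu128"
installed = "2.8.0.dev20250609+cu128"

# Differing (major, minor) means the prebuilt C++/CUDA ops will not load;
# patch-level and local-suffix differences are generally tolerated.
compatible = torch_minor(built_for) == torch_minor(installed)
print(compatible)  # False: downgrade torch or rebuild xformers
```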

@loscrossos
Contributor Author

You have CUDA 12.8 and I compiled with CUDA 12.9; that is not a problem.

The PyTorch version is: you will have to downgrade to 2.7.0 to use my libraries.
I might post a compile guide over the weekend, and then you can keep using PyTorch 2.8 if you want :)

@None9527


OK, got it. I'm really looking forward to your guide.

@None9527


Interesting. I'm trying out the DeepCompressor project. Your whl file works with torch 2.8, but it keeps giving me warnings. With torch 2.7, however, it simply won't run at all: it says it's missing GLIBCXX_3.4.31, but my libstdc++.so.6 is actually correct.

@loscrossos
Contributor Author

loscrossos commented Jun 10, 2025

You can test for it with: strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
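To interpret the output of that command: the library is new enough only if its highest GLIBCXX_x.y.z tag is at least the one the extension asks for (GLIBCXX_3.4.31 in the error above). A small illustrative parser, assuming the tags have already been collected into a list:

```python
import re

def has_glibcxx(tags: list[str], required: str) -> bool:
    """True if the newest GLIBCXX_x.y.z tag covers the required one."""
    def key(tag: str) -> tuple[int, ...]:
        # "GLIBCXX_3.4.30" -> (3, 4, 30), so tuples compare numerically
        return tuple(int(p) for p in tag.split("_", 1)[1].split("."))
    # Ignore non-version tags like GLIBCXX_DEBUG_MESSAGE_LENGTH
    versions = [key(t) for t in tags if re.fullmatch(r"GLIBCXX_[0-9.]+", t)]
    return bool(versions) and max(versions) >= key(required)

sample = ["GLIBCXX_3.4", "GLIBCXX_3.4.29", "GLIBCXX_3.4.30"]
print(has_glibcxx(sample, "GLIBCXX_3.4.31"))  # False: this libstdc++ is too old
```

A common cause of the "my system library is correct but it still fails" situation is that the process loads a different, older libstdc++.so.6 (for example one bundled inside a conda environment) rather than the system one, so it is worth checking which copy is actually resolved at runtime.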

@None9527

What do you get when you run: strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX

When I run DeepCompressor, I don't get anything about CUDA.

@VictorJouault

+1. Awaiting this.

@danthe3rd
Contributor

Merging this PR, although please be aware that we no longer build Flash-Attention 2 on Windows; we use Flash-Attention 3 now.

@danthe3rd danthe3rd merged commit d9b3b6e into facebookresearch:main Jun 23, 2025
1 check passed
@Panchovix

Panchovix commented Jun 30, 2025

@danthe3rd sorry to hijack here, but is there a way to use FA2 again on Windows? I have a multi-GPU setup with Ampere, Ada and Blackwell 2.0 (8.9, 8.9 and 12.0), but out of the box it seems it doesn't build FA2 as it did before. When trying to force it, I get other issues on torch nightly.

With export XFORMERS_DISABLE_FLASH_ATTN=0

I get

[1/73] cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\src -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\cutlass\include -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\Lib\site-packages\torch\include -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\Lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\include" -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python312\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python312\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.26100.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\cppwinrt" /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -c H:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\flash_api.cpp /FoH:\f\xformers_cu128_nightly_py312_29-06-25\build\temp.win-amd64-cpython-312\Release\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\flash_api.obj -O3 -std=c++17 -DPy_LIMITED_API=0x03090000 /MP /Zc:lambda /Zc:preprocessor /Zc:__cplusplus -DFLASHATTENTION_DISABLE_ALIBI -DFLASHATTENTION_DISABLE_SOFTCAP -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C_flashattention /std:c++17
FAILED: H:/f/xformers_cu128_nightly_py312_29-06-25/build/temp.win-amd64-cpython-312/Release/f/xformers_cu128_nightly_py312_29-06-25/third_party/flash-attention/csrc/flash_attn/flash_api.obj
cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\src -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\cutlass\include -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\Lib\site-packages\torch\include -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\Lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\include" -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python312\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python312\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.26100.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\cppwinrt" /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -c H:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\flash_api.cpp /FoH:\f\xformers_cu128_nightly_py312_29-06-25\build\temp.win-amd64-cpython-312\Release\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\flash_api.obj -O3 -std=c++17 -DPy_LIMITED_API=0x03090000 /MP /Zc:lambda /Zc:preprocessor /Zc:__cplusplus -DFLASHATTENTION_DISABLE_ALIBI -DFLASHATTENTION_DISABLE_SOFTCAP -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C_flashattention /std:c++17
cl : Command line warning D9002 : ignoring unknown option '-O3'
cl : Command line warning D9002 : ignoring unknown option '-std=c++17'
H:\f\xformers_cu128_nightly_py312_29-06-25\venv\Lib\site-packages\torch\include\pybind11\buffer_info.h(107): error C2061: syntax error: identifier 'Py_buffer'

I have attached the file with the error
errorcompile_xformers_torch29.txt

