
Enabling Blackwell support #1254

Merged

danthe3rd merged 1 commit into facebookresearch:main from loscrossos:blackwell_support
Jun 23, 2025

Conversation

@loscrossos
Contributor

Nvidia Blackwell cards have been out for a while, but xformers does not formally support them. This PR adds support by checking for CUDA 12.8 and enabling compute capability 12.0 (sm_120). Even though higher capabilities are listed in https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list, using 12.0 is a conservative approach that covers all 50-series cards. This PR definitely solves #1251 and possibly #1228.
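The gating this PR describes can be sketched roughly as follows. This is an illustrative Python sketch, not the actual xformers setup.py code; the function name and baseline arch list are assumptions.

```python
# Hypothetical sketch of the arch gating described above: only when the
# CUDA toolkit is new enough to know sm_120 do we ask nvcc to build for it.

def cuda_archs(cuda_version: tuple[int, int]) -> list[str]:
    """Return compute capabilities to build for, given a CUDA toolkit version."""
    archs = ["8.0", "8.6", "8.9", "9.0"]  # illustrative pre-Blackwell baseline
    if cuda_version >= (12, 8):
        # CUDA 12.8 is the first toolkit that supports sm_120 (consumer
        # Blackwell, i.e. the RTX 50 series). 12.0 is the conservative
        # choice: higher capabilities exist, but 12.0 covers all 50-series.
        archs.append("12.0")
    return archs

print(cuda_archs((12, 6)))  # no Blackwell arch
print(cuda_archs((12, 8)))  # includes "12.0"
```

Older toolkits must not see the flag at all, since nvcc rejects arch values it does not know about.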

Signed-off-by: LosCrossos <165311345+loscrossos@users.noreply.github.com>
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 18, 2025
@ir-personal

Any chance this could be prioritised? I'm having the same issue this PR solves and would appreciate the fix being merged.

@loscrossos
Contributor Author

I have a compiled library for Linux / Python 3.12 if you want to try it out:

https://github.com/loscrossos/xformers/releases

@ir-personal

Thank you so much. I actually compiled it too this morning (it crossed my mind right after posting a comment here).
I tried:
TORCH_CUDA_ARCH_LIST="12.0" (RTX 5090)
pip install ninja &&
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers --no-build-isolation

Now fine-tuning Stable Diffusion SDXL (DreamBooth + LoRA). To be honest, it's not running much faster than yesterday without xformers, so I am wondering if there is still a gap somewhere :D

@loscrossos
Contributor Author

Are you on Windows or Linux?

For the project I tried, I also did not notice much improvement on Windows. On Linux there was a performance increase. Maybe there are more things to change down the line. Nevertheless, this is a first step, and it enables some projects to run at all.

@ir-personal

It's Win 11, but I am running WSL2 Ubuntu; the training is also running inside a container.

@Jeffkang-94

Jeffkang-94 commented May 27, 2025

Hi, I have a quick question.

It looks like this PR addresses some potential incompatibility on the Blackwell GPU arch. Did you find any severe issues beyond the RTX 5090?

Our repo depends on this repository, and I'm planning to use B200 machines. I'm wondering if any issues will come up, especially when using memory_efficient_attention.
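For context on the B200 question: NVIDIA's GPU feature list (linked in the PR description) assigns different compute capabilities to datacenter and consumer Blackwell, which matters for which arch flags cover which card. A small illustrative mapping (check the linked docs for the authoritative values):

```python
# Compute capabilities per NVIDIA's nvcc GPU feature list (illustrative
# subset; the names and values here are for orientation only).
COMPUTE_CAPABILITY = {
    "A100": "8.0",      # Ampere datacenter
    "RTX 4090": "8.9",  # Ada
    "H100": "9.0",      # Hopper
    "B200": "10.0",     # Blackwell datacenter (sm_100)
    "RTX 5090": "12.0", # Blackwell consumer (sm_120, what this PR enables)
}

# A build that targets only 12.0 would not produce kernels for a B200,
# since the B200 reports capability 10.0, not 12.0.
print(COMPUTE_CAPABILITY["B200"])
```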

@loscrossos
Contributor Author

The issues are linked in the first comment; Blackwell does not run at all for those.

@None9527

i have a compiled library for linux python3.12 if you want to try out:

https://github.com/loscrossos/xformers/releases

Thanks for your whl. Now I'm no longer getting the error about the missing sm_120, but I'm being told that this version of xformers requires torch 2.7. However, I'm currently using the nightly 2.8.0.dev version. Should I ignore this warning or downgrade torch to 2.7.0?

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.7.0+cu128 with CUDA 1209 (you have 2.8.0.dev20250609+cu128)
Python 3.12.3 (you have 3.12.11)
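The warning above reflects that xformers' binary extensions are ABI-tied to the torch they were built against, so the (major, minor) torch versions must match even when the CUDA suffix does. A rough, illustrative check (not xformers code; the version strings are taken from the warning above):

```python
import re

def torch_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string like '2.8.0.dev20250609+cu128'."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unparseable version: {version}")
    return int(m.group(1)), int(m.group(2))

built_for = "2.7.0+cu128"
installed = "2.8.0.dev20250609+cu128"

# Differing (major, minor) means the prebuilt C++/CUDA ops will not load;
# patch-level and local-suffix differences are generally tolerated.
compatible = torch_minor(built_for) == torch_minor(installed)
print(compatible)  # False: downgrade torch or rebuild xformers
```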

@loscrossos
Contributor Author

You have CUDA 12.8 and I compiled with CUDA 12.9; that is not a problem.

The PyTorch version is: you will have to downgrade to 2.7.0 to use my libraries.
I might post a compile guide over the weekend, and then you can keep using PyTorch 2.8 if you want :)

@None9527


OK, got it. I'm really looking forward to your guide.

@None9527


Interesting. I'm trying out the DeepCompressor project. Your whl file works with torch 2.8, but it keeps giving me warnings. With torch 2.7, however, it simply won't run at all: it says it's missing GLIBCXX_3.4.31, but my libstdc++.so.6 is actually correct.

@loscrossos
Contributor Author

loscrossos commented Jun 10, 2025

You can test for it with: strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
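To interpret the output of that command: the library is new enough only if its highest GLIBCXX_x.y.z tag is at least the one the extension asks for (GLIBCXX_3.4.31 in the error above). A small illustrative parser, assuming the tags have already been collected into a list:

```python
import re

def has_glibcxx(tags: list[str], required: str) -> bool:
    """True if the newest GLIBCXX_x.y.z tag covers the required one."""
    def key(tag: str) -> tuple[int, ...]:
        # "GLIBCXX_3.4.30" -> (3, 4, 30), so tuples compare numerically
        return tuple(int(p) for p in tag.split("_", 1)[1].split("."))
    # Ignore non-version tags like GLIBCXX_DEBUG_MESSAGE_LENGTH
    versions = [key(t) for t in tags if re.fullmatch(r"GLIBCXX_[0-9.]+", t)]
    return bool(versions) and max(versions) >= key(required)

sample = ["GLIBCXX_3.4", "GLIBCXX_3.4.29", "GLIBCXX_3.4.30"]
print(has_glibcxx(sample, "GLIBCXX_3.4.31"))  # False: this libstdc++ is too old
```

A common cause of the "my system library is correct but it still fails" situation is that the process loads a different, older libstdc++.so.6 (for example one bundled inside a conda environment) rather than the system one, so it is worth checking which copy is actually resolved at runtime.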

@None9527

What do you get when you run: strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX

When I run DeepCompressor, I don't get anything about CUDA.

@VictorJouault

+1. Awaiting this.

@danthe3rd
Contributor

Merging this PR, although please be aware that we no longer build Flash-Attention 2 on Windows; we use Flash-Attention 3 now.

@danthe3rd danthe3rd merged commit d9b3b6e into facebookresearch:main Jun 23, 2025
1 check passed
@Panchovix

Panchovix commented Jun 30, 2025

@danthe3rd sorry to hijack here, but is there a way to use FA2 again on Windows? I have a multi-GPU setup with Ampere, Ada and Blackwell 2.0 (8.9, 8.9 and 12.0), but out of the box it seems it doesn't build FA2 as it did before. When trying to force it, I get other issues on torch nightly.

With export XFORMERS_DISABLE_FLASH_ATTN=0

I get

[1/73] cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\src -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\cutlass\include -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\Lib\site-packages\torch\include -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\Lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\include" -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python312\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python312\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.26100.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\cppwinrt" /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -c H:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\flash_api.cpp /FoH:\f\xformers_cu128_nightly_py312_29-06-25\build\temp.win-amd64-cpython-312\Release\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\flash_api.obj -O3 -std=c++17 -DPy_LIMITED_API=0x03090000 /MP /Zc:lambda /Zc:preprocessor /Zc:__cplusplus -DFLASHATTENTION_DISABLE_ALIBI -DFLASHATTENTION_DISABLE_SOFTCAP -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C_flashattention /std:c++17
FAILED: H:/f/xformers_cu128_nightly_py312_29-06-25/build/temp.win-amd64-cpython-312/Release/f/xformers_cu128_nightly_py312_29-06-25/third_party/flash-attention/csrc/flash_attn/flash_api.obj
cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\src -IH:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\cutlass\include -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\Lib\site-packages\torch\include -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\Lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\include" -IH:\f\xformers_cu128_nightly_py312_29-06-25\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python312\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python312\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.38.33130\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.26100.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.26100.0\\cppwinrt" /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -c H:\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\flash_api.cpp /FoH:\f\xformers_cu128_nightly_py312_29-06-25\build\temp.win-amd64-cpython-312\Release\f\xformers_cu128_nightly_py312_29-06-25\third_party\flash-attention\csrc\flash_attn\flash_api.obj -O3 -std=c++17 -DPy_LIMITED_API=0x03090000 /MP /Zc:lambda /Zc:preprocessor /Zc:__cplusplus -DFLASHATTENTION_DISABLE_ALIBI -DFLASHATTENTION_DISABLE_SOFTCAP -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C_flashattention /std:c++17
cl : Command line warning D9002 : ignoring unknown option '-O3'
cl : Command line warning D9002 : ignoring unknown option '-std=c++17'
H:\f\xformers_cu128_nightly_py312_29-06-25\venv\Lib\site-packages\torch\include\pybind11\buffer_info.h(107): error C2061: syntax error: identifier 'Py_buffer'

I have attached the file with the error
errorcompile_xformers_torch29.txt

