SDPA backend priority #9299
Conversation
ComfyUI does not start anymore; this is what I get now on startup (pytorch version: 2.5.1+cu124).
@jurgenprins this API was introduced in PyTorch 2.6; can you please try upgrading Torch?

I noted it was due to Torch 2.5.1, thanks! I am happy for now to accept that it's wrapped in a try/catch with a 'cannot set' message instead of crashing at startup. I am not sure exactly what the benefit of upgrading immediately to make this work would be; perhaps something for the release notes. In time I will try the upgrade, thank you!
This won't force-replace SageAttention 2++/3, right?

It shouldn't - if you encounter any issues, please let me know and I'll look into it. Feel free to tag me in a related issue.
Thanks for doing this. Sounds like it replaces the default attention calculations for CUDA users. But if people already use SageAttention, there's no speed change, right? If so, it's a great improvement for stock usage but not for people who already use a faster attention method. From what I can see:
Have any other CUDA-specific changes been merged recently (after Comfy 3.50), by the way?
If people are using a third-party Flash Attention implementation, there shouldn't be large improvements. There is another pending PR for convolution performance (SDXL/SDx.x) that is CUDA-specific: #9301. There'll be more to come in the future, but it'll take some time.
@contentis Thank you so much for the great explanation, I really appreciate it. I agree that SageAttention has some painful parts that can still be optimized, and I've been curious about SA3's speedups via their new FP4 algorithm. But hearing that SDPA is not that far behind SageAttention (v2/v3) in speed is interesting. I haven't seen people compare those before, but now I found an old discussion from October 2024 where the person said that SageAttention v1 (the only one out at that time; before the public release) was 14% faster than SDPA. Newer SageAttention versions should be even faster than SDPA. I'll try doing some comparisons on Linux when my 5090 arrives. If SDPA offers better quality at a small slowdown, it could be worth it! Thanks a lot for the SDXL/SD1.5 pull request too. That model is still pretty good, so the speedups you've gained there are fantastic. <3
Here is an example using the FLUX SDPA config: as you can see, for the kernel itself we see a speed-up of ~592.513/361.617 = 1.63x. But given that the IO needs to be quantized/dequantized (Q/DQ), there are 5 additional kernels running. They are very fast but add up, making the end-to-end time for SA ~457 µs, so the speed-up comes down to ~592.513/457 = 1.3x.
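(As a rough illustration of how such per-kernel numbers can be collected, here is a sketch using torch.profiler; the FLUX-like tensor shapes are assumptions, and this is not the exact benchmark behind the figures above.)

```python
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel
from torch.profiler import ProfilerActivity, profile

# FLUX-like shapes (illustrative assumption): 24 heads, head_dim 128,
# a few thousand image+text tokens.
q = torch.randn(1, 24, 4608, 128, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

for backend in (SDPBackend.CUDNN_ATTENTION, SDPBackend.FLASH_ATTENTION):
    with sdpa_kernel(backend):
        for _ in range(5):  # warm-up so one-time setup is excluded
            torch.nn.functional.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
        with profile(activities=[ProfilerActivity.CUDA]) as prof:
            torch.nn.functional.scaled_dot_product_attention(q, k, v)
            torch.cuda.synchronize()
        print(backend)
        print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=5))
```

Profiling a third-party kernel such as SageAttention the same way would also surface the extra quantize/dequantize launches described above, which is where the end-to-end gap narrows.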
Ahh yes, I see: the time of all the extra kernels adds up and it's not saving that much time overall. That's also a real testament to how good SDPA is already, compared to FlashAttention. For the small time difference, I would happily switch from SA to cuDNN SDPA if it's better quality. But that seems like a hard bar to beat, since it's constantly claimed that SA2 has practically no quality loss compared to FlashAttention. So the way I see it, SDPA is an amazing out-of-the-box choice that really helps people, and they won't have to spend time compiling SageAttention 2/3 (Triton and C compiler). But if someone wants a little more speed (useful for video models), going through the extra work of installing SageAttention 2++ or 3 is worth it. It just won't be as much of a boost anymore, since cuDNN SDPA now exists in Comfy by default. 👍 ❤️ :)


Enable cuDNN attention and set it as the highest-priority backend. The cuDNN SDPA backend performs on par with, and sometimes faster than, the flash-attention backend. More importantly, the flash-attention backend is disabled on Windows, which currently falls back to the much slower mem-efficient backend.
On Windows I've seen a ~2x speed-up for the SDPA kernel on multiple models (FLUX, SDXL, SD3.5 Medium & Large, Qwen).
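A minimal sketch of what this amounts to, assuming the `set_priority` flag that `torch.nn.attention.sdpa_kernel` gained in PyTorch 2.6 (mentioned earlier in this thread); the priority list, function name, and fallback warning below are illustrative rather than the PR's literal diff:

```python
import logging
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

# Consider cuDNN attention first, then flash, then the slower fallbacks.
# On Windows, where the flash backend is unavailable, cuDNN then wins
# instead of the much slower mem-efficient backend.
SDPA_BACKEND_PRIORITY = [
    SDPBackend.CUDNN_ATTENTION,
    SDPBackend.FLASH_ATTENTION,
    SDPBackend.EFFICIENT_ATTENTION,
    SDPBackend.MATH,
]

torch.backends.cuda.enable_cudnn_sdp(True)  # make sure the cuDNN backend is enabled


def attention(q, k, v):
    try:
        # set_priority=True (PyTorch 2.6+) treats the list order as a priority
        # order rather than a plain allow-list.
        with sdpa_kernel(SDPA_BACKEND_PRIORITY, set_priority=True):
            return torch.nn.functional.scaled_dot_product_attention(q, k, v)
    except TypeError:
        # Older PyTorch (e.g. 2.5.1) does not accept set_priority: warn with a
        # "cannot set" style message and fall back to the default selection.
        logging.warning("Cannot set SDPA backend priority, using defaults.")
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)
```

Listing the mem-efficient and math backends last keeps them available as fallbacks for hardware or shapes the faster kernels don't support.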