
Conversation

@guilhermeleobas
Contributor

@guilhermeleobas commented Jul 25, 2025

I'm not 100% sure whether the torch.compile tests are representative of typical SageAttention usage. I wrote them based on the benchmark files.

Also, is there any lint rule that I should apply to the files?

Edit: I just saw there's a beta version of SageAttention 3 on HuggingFace. Is the code available? If so, I can also work on adding torch.compile support for it.

I'm not sure whether the tests I added are representative of the common usage of SageAttention.
@guilhermeleobas
Contributor Author

cc @StrongerXi

@StrongerXi

@jt-zhang @jason-huang03 would it be possible to merge this patch? It's pretty harmless: most of the code adds fake impls for the SageAttention ops (you can think of them as registering shape functions for torch.compile).
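(For readers unfamiliar with fake impls, here is a minimal sketch of the mechanism, using a hypothetical op name and signature rather than the actual SageAttention code. The fake impl only describes output metadata, which is all torch.compile needs in order to trace through the op.)

```python
import torch

# Hypothetical custom op; the real SageAttention ops and signatures differ.
@torch.library.custom_op("mylib::sage_attn_fwd", mutates_args=())
def sage_attn_fwd(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # The real op would launch the CUDA kernel here.
    raise NotImplementedError

@sage_attn_fwd.register_fake
def _(q, k, v):
    # Only shapes/dtypes/devices are produced; no kernel runs during tracing.
    return torch.empty_like(q)
```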

It actually fixes an accuracy issue that showed up with sage + compile: comfyanonymous/ComfyUI#8689 (comment).

Also, diffusers is adding SageAttention as a backend: huggingface/diffusers#12439, so merging this patch would benefit the many users who run sage + compile in diffusers and ComfyUI.

@jt-zhang
Member

jt-zhang commented Oct 6, 2025

Thank you for your PR. @whx1003, please help check and merge it.

@sayakpaul

Hey folks!

I am one of the maintainers of diffusers and can vouch for the merit of this PR, as it benefits torch.compile users quite a bit.

This would be great to have merged. Also cc: @MekkCyber

@whx1003
Collaborator

whx1003 commented Oct 7, 2025

@guilhermeleobas Thanks for the PR! The code looks good to me; I ran tests on an RTX 5090 and everything worked as expected.

Could you please keep only the changes under the sageattention/ directory in this PR? The other files seem unrelated.

@sayakpaul

@whx1003 tests/test_torch_compile.py could be beneficial for compilation tests. WDYT?

@whx1003
Collaborator

whx1003 commented Oct 7, 2025

@sayakpaul I agree that the tests could be useful, but we don’t maintain test files in this repo at the moment.

I’d prefer to leave them out of this PR and maybe revisit adding them later.

@sayakpaul

No worries! @guilhermeleobas, maybe we can keep the test file as a gist under your profile and link to it.

@guilhermeleobas
Contributor Author

Thanks for the feedback folks. @whx1003 I've removed the test file and unrelated changes.

@whx1003 merged commit 15c0e22 into thu-ml:main on Oct 8, 2025
@whx1003
Collaborator

whx1003 commented Oct 8, 2025

Thanks!

@woct0rdho

nitpick: If I understand correctly, in the fake impl:

```python
lse = torch.empty((batch_size, num_qo_heads, qo_len), dtype=torch.float32, device="cuda")
```

it's better to set lse's device to query.device rather than "cuda" (the 0-th device), so it is consistent with the C++ code:

```cpp
lse = torch::empty({batch_size, num_qo_heads, qo_len}, query.options().dtype(torch::kFloat32));
```
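In the Python fake impl, the suggested change would look roughly like this (a sketch reusing the variable names from the snippet above):

```python
# Allocate lse on the same device as the query tensor instead of the
# default "cuda" device, matching the C++ allocation above.
lse = torch.empty(
    (batch_size, num_qo_heads, qo_len),
    dtype=torch.float32,
    device=query.device,
)
```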

The same applies to the sm89 and sm90 fake impls. Maybe this also helps solve the issue behind the set_device workaround:
```python
# FIXME(DefTruth): make sage attention work compatible with distributed
# env, for example, xDiT which launch by torchrun. Without this workaround,
# sage attention will run into illegal memory access error after first
# inference step in distributed env for multi gpus inference. This small
# workaround also make sage attention work compatible with torch.compile
# through non-fullgraph compile mode.
torch.cuda.set_device(v.device)
```

but I haven't tested it.

@woct0rdho

Another noteworthy thing, although I haven't tested it, is that we may need to set mutates_args={"output"} when registering the ops.

If I understand correctly, this should definitely be needed, but I haven't yet found an example that gives correct output with it and wrong output without it.

In #74 (comment), he succeeded by specifying q, k, v as mutable parameters, but I don't think that makes sense...
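For reference, a hedged sketch of what declaring the mutated output could look like when registering a custom op (hypothetical op name and signature, not the actual SageAttention registration):

```python
import torch

# The op writes its result into a pre-allocated "output" tensor, so that
# argument is declared as mutated; q, k, v remain read-only inputs.
@torch.library.custom_op("mylib::sage_attn_fwd_out", mutates_args={"output"})
def sage_attn_fwd_out(
    q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, output: torch.Tensor
) -> None:
    # The real op would launch the CUDA kernel that fills `output`.
    raise NotImplementedError
```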
