Add torch.compile support to SageAttention
#218
Conversation
I'm not so sure whether the tests I added are representative of the common usage of SageAttention.
|
cc @StrongerXi |
|
@jt-zhang @jason-huang03 would it be possible to merge this patch? It's pretty harmless: most of the code adds fake impls for the sage attention ops (you can think of them as registering shape functions for these ops). It also fixes an accuracy issue which showed up in sage + compile: comfyanonymous/ComfyUI#8689 (comment). Also, diffusers is adding Sage Attention as a backend (huggingface/diffusers#12439), so merging this patch would benefit the many users who run sage + compile in diffusers and ComfyUI. |
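For readers unfamiliar with fake impls, here is a minimal, self-contained sketch of the pattern, assuming PyTorch 2.4+. The op name `mylib::attn_fwd` and its signature are hypothetical, not the actual ops registered in this PR:

```python
import torch

# Hypothetical illustration of the fake-impl pattern: define a custom op,
# then register a "fake" implementation so torch.compile can infer output
# shapes/dtypes/devices without launching the real CUDA kernel.
@torch.library.custom_op("mylib::attn_fwd", mutates_args=())
def attn_fwd(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Stand-in eager kernel; SageAttention's real ops dispatch to CUDA code.
    return torch.nn.functional.scaled_dot_product_attention(q, k, v)

@attn_fwd.register_fake
def _(q, k, v):
    # The fake impl only describes output metadata; no computation happens.
    return torch.empty_like(q)
```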
|
Thank you for your PR. @whx1003, please help check and merge this PR. |
|
Hey folks! I am one of the maintainers of diffusers and can vouch for the merit of this PR, as it benefits users running sage + compile through diffusers. This would be great to have merged. Also cc: @MekkCyber |
|
@guilhermeleobas Thanks for the PR! The code looks good to me — I ran tests on an RTX 5090 and everything worked as expected. Could you please keep only the changes under the `sageattention` directory? |
|
@whx1003 |
|
@sayakpaul I agree that the tests could be useful, but we don’t maintain test files in this repo at the moment. I’d prefer to leave them out of this PR and maybe revisit adding them later. |
|
No worries! @guilhermeleobas maybe we can host the test file as a gist under your profile and link to it. |
|
Thanks for the feedback folks. @whx1003 I've removed the test file and unrelated changes. |
|
Thanks! |
|
nitpick: If I understand correctly, in the fake impl (`sageattention/sm80_compile.py`, line 121 at 15c0e22), it's better to set `lse`'s device to `query.device` rather than `"cuda"` (the 0-th device), so it's consistent with the C code. This is also true for sm89 and sm90. Maybe this helps solve the `set_device` issue (`sageattention/core.py`, lines 250 to 256 at 15c0e22), but I haven't tested. |
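A minimal sketch of the suggested fix, with illustrative shapes and names; the point is only the `device=` argument:

```python
import torch

# Illustrative fake impl showing the suggested device fix. Shapes and
# names are placeholders; allocating lse on the input's device instead
# of the hard-coded "cuda" (device 0) matches the CUDA kernel's behavior.
def _fake_impl(query, key, value):
    out = torch.empty_like(query)
    b, h, s, _ = query.shape
    lse = torch.empty((b, h, s), dtype=torch.float32, device=query.device)
    return out, lse
```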
|
Another noteworthy thing, although I haven't tested it, is that we may need to set […]. If I understand correctly, this should be strictly necessary, but I couldn't yet find an example that gives correct output with this and wrong output without it. In #74 (comment), the commenter succeeded by specifying […]. |
I'm not 100% sure whether the torch.compile tests are representative of common SageAttention usage. I wrote them based on the benchmark files.
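For reference, a sketch of the kind of eager-vs-compiled check those tests perform, assuming the public `sageattn` entry point; the shapes, layout, and tolerances below are illustrative, not taken from the PR:

```python
import torch
from sageattention import sageattn  # public API; see the repo README

# Sketch of an eager-vs-compiled consistency check. Shapes, layout,
# and tolerances are illustrative placeholders.
q, k, v = (torch.randn(2, 8, 1024, 64, dtype=torch.float16, device="cuda")
           for _ in range(3))

compiled = torch.compile(sageattn, fullgraph=True)

out_eager = sageattn(q, k, v, tensor_layout="HND", is_causal=True)
out_compiled = compiled(q, k, v, tensor_layout="HND", is_causal=True)

# SageAttention is quantized, so exact equality isn't expected; the loose
# tolerances here are placeholders, not values from the PR's tests.
torch.testing.assert_close(out_eager, out_compiled, rtol=2e-2, atol=2e-2)
```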
Also, is there any lint rule that I should apply to the files?
Edit: I just saw there's a beta version of SageAttention 3 on Hugging Face. Is the code available? If so, I can also work on adding torch.compile support for it.