Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why Torchchat uses MATH as SDPA backend? #1452

Open
yanbing-j opened this issue Jan 8, 2025 · 7 comments
Open

Why Torchchat uses MATH as SDPA backend? #1452

yanbing-j opened this issue Jan 8, 2025 · 7 comments
Labels
enhancement New feature or request triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@yanbing-j
Copy link
Contributor

🐛 Describe the bug

Hi maintainers,

I find that, Torchchat uses MATH as SDPA backend in https://github.com/pytorch/torchchat/blob/main/torchchat/generate.py#L542. However, for other libs like vllm, they all accept flash attention as default backend.

So why Torchchat uses MATH as a default backend? Is this required for accuracy? If not, I can help to add an argument to let user set the backend. Thanks!

Versions

@lucylq
Copy link
Contributor

lucylq commented Jan 10, 2025

I can help to add an argument to let user set the backend.

This seems like a good idea! cc @Jack-Khuu if there's any history behind using MATH as default backend?

@lucylq
Copy link
Contributor

lucylq commented Jan 11, 2025

Hmn, actually, it seems like there's an issue exporting when the default backend is not MATH.

See issue: pytorch/pytorch#129418

It seems like there's a requirement that decompositions during export must not introduce any mutation ops. SDPBackend.MATH is known to work well with export; with other backends, we may run into issues. cc @angelayi

@lucylq lucylq added triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module enhancement New feature or request labels Jan 11, 2025
@yanbing-j
Copy link
Contributor Author

cc @mingfeima

@mingfeima
Copy link

Hmn, actually, it seems like there's an issue exporting when the default backend is not MATH.

See issue: pytorch/pytorch#129418

It seems like there's a requirement that decompositions during export must not introduce any mutation ops. SDPBackend.MATH is known to work well with export; with other backends, we may run into issues. cc @angelayi

I think we may add an argument to let user to decide which backend to use, say "--attention-backend"; so when users are not depending on export can have better performance with flash attention.

@yanbing-j
Copy link
Contributor Author

I draft a PR #1456 to add an argument attention_backend. The default value of this argument is math. Please take a look, thanks!

@mikekgfb
Copy link
Contributor

Replacing MATH with FLASH_ATTENTION in https://github.com/pytorch/torchchat/blob/main/torchchat/export.py#L70 worked fine for me. (On a Mac, AOTI/DSO exporting to CPU.)

I'm actually not sure that the backend specification in https://github.com/pytorch/torchchat/blob/main/torchchat/export.py#L317 matters for ET export, because ET replaces all attention modules at https://github.com/pytorch/torchchat/blob/main/torchchat/export.py#L314 with the ET optimized CustomSDPAAttention (although I don't know if that in turn might call F.SDPA somewhere deep in the definition such that MATH vs FLASH_ATTENTION would matter?)

@mingfeima
Copy link

Oh I see. ET will replace it with CustomSDPAAttention, which goes into sdpa_with_kv_cache

from executorch.extension.llm.custom_ops import sdpa_with_kv_cache # no-qa

In that case, I think ET will overwrite the algo you defined from torchchat, as you can see algo is not passed down here
https://github.com/pytorch/torchchat/blob/main/torchchat/export.py#L213-L221

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests

4 participants