
Conversation

@guangy10
Collaborator

@guangy10 guangy10 commented Apr 8, 2025

  • Register a custom SDPA attention to HF Transformers
  • Use the custom SDPA for decoder-only text models (custom SDPA is optimized for ExecuTorch)
  • Requires bumping the transformers version to the latest release (4.51.0) in order to use the AttentionInterface. This has been addressed in Bump Transformers version #47
  • optimum-cli export executorch supports custom SDPA

3x speedup using custom SDPA for HF smollm2 (XNNPACK fp32):
(benchmark screenshot, 2025-04-08)

Generally applicable to all causal LMs. For encoder-decoder models it may apply to the self-attention layers in the decoder; we can experiment with that in a follow-up PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@guangy10
Collaborator Author

guangy10 commented Apr 8, 2025

Will gate the access to custom_sdpa in 0.4.0 release.

@guangy10
Collaborator Author

guangy10 commented Apr 8, 2025

@larryliu0820 FYI, I'm running into the same issue when using the custom SDPA with eager mode. The exported model is fine. https://github.com/huggingface/optimum-executorch/actions/runs/14324546808/job/40147624383?pr=46#step:5:137. Checked the shape of the attn output, and they are the same. Will dig further tomorrow by inspecting whether the output tensors are close.

Updated: The issue was that eager mode used DynamicCache by default, while our custom SDPA can ONLY work with StaticCache. After setting cache_implementation correctly, both the eager and the ExecuTorch model work as expected.
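A minimal sketch of that fix, for anyone hitting the same mismatch: force the static KV cache when running the eager-mode parity check (the kwarg follows transformers' generate API; the model call is indicated in a comment since it needs real weights):

```python
# Hedged sketch: eager generation must use the static KV cache to match the
# exported model, because the custom SDPA only supports StaticCache.
generate_kwargs = {
    "max_new_tokens": 32,
    "cache_implementation": "static",  # default would be DynamicCache
}
# With a real model and tokenized inputs:
# output_ids = model.generate(**inputs, **generate_kwargs)
```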

@guangy10
Collaborator Author

guangy10 commented Apr 8, 2025

Transformers version bump has been merged in #47

@guangy10 guangy10 marked this pull request as ready for review April 8, 2025 20:42
@guangy10
Collaborator Author

guangy10 commented Apr 8, 2025

cc: @larryliu0820 @kimishpatel for review

@larryliu0820
Collaborator

@guangy10
Collaborator Author

guangy10 commented Apr 9, 2025

@guangy10 guangy10 force-pushed the custom_attn_impl branch 4 times, most recently from 754dd57 to e8f5263 Compare April 10, 2025 18:14
@guangy10
Collaborator Author

Rebased and resolved conflicts

@guangy10 guangy10 changed the title Use custom sdpa for ExecuTorch Use custom SDPA for decoder-only HF Transformers Apr 10, 2025
@guangy10
Collaborator Author

@larryliu0820 @kimishpatel good to merge?

@guangy10
Collaborator Author

Support export with custom_sdpa using optimum-cli export executorch
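For context, the CLI export path looks roughly like the following (the model id and output directory are placeholders, and exact flags may differ across optimum-executorch versions):

```shell
optimum-cli export executorch \
  --model HuggingFaceTB/SmolLM2-135M \
  --task text-generation \
  --recipe xnnpack \
  --output_dir smollm2_executorch
```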

@guangy10 guangy10 merged commit 2901511 into huggingface:main Apr 11, 2025
218 checks passed
@guangy10 guangy10 deleted the custom_attn_impl branch April 11, 2025 20:58