Use custom SDPA for decoder-only HF Transformers #46
Conversation
Will gate access to custom_sdpa in the 0.4.0 release.
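Presumably the gate is a simple version check at the point where custom_sdpa is requested; a minimal sketch, assuming a hypothetical helper (the name and error message are illustrative, not the PR's actual code):

```python
# Hypothetical version gate for custom_sdpa; the helper name and error
# message are illustrative, not the PR's actual code.
from importlib.metadata import version

from packaging.version import parse


def check_custom_sdpa_supported() -> None:
    installed = parse(version("executorch"))
    if installed < parse("0.4.0"):
        raise RuntimeError(
            f"custom_sdpa requires executorch >= 0.4.0, found {installed}"
        )
```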
@larryliu0820 Updated: The issue is because the eager was using …
Force-pushed from 8e9e3c2 to 1b2cb42.
Transformers version bump has been merged in #47.
Force-pushed from 1b2cb42 to cadd829.
cc: @larryliu0820 @kimishpatel for review
Can you change this: https://github.com/huggingface/optimum-executorch/blob/main/optimum/executorch/modeling.py#L181 to use the new Python API: https://pytorch.org/executorch/stable/index.html
@larryliu0820 Yeah, I'm going to do it in a separate PR.
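For reference, a minimal sketch of what the newer ExecuTorch runtime Python API looks like; the .pte path and the dummy input are placeholders, not the repo's actual loading code:

```python
# Minimal sketch of the ExecuTorch Python runtime API; the .pte path and
# dummy input below are placeholders, not optimum-executorch's real code.
import torch
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("model.pte")
method = program.load_method("forward")

# One forward pass; a real decoder takes input_ids, cache positions, etc.
outputs = method.execute([torch.randint(0, 100, (1, 8), dtype=torch.long)])
```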
Force-pushed from 754dd57 to e8f5263.
Force-pushed from e8f5263 to eb2c840.
Rebased and fixed conflicts.
@larryliu0820 @kimishpatel good to merge?
Support export with custom_sdpa using …
Force-pushed from 14a6bbd to aab448f.
- Requires a recent Transformers version (4.51.0) in order to use the AttentionInterface. This has been addressed in Bump Transformers version #47.
- optimum-cli export executorch supports custom SDPA (see the sketch below).
- 3x speedup using custom SDPA for HF smollm2 (XNNPACK fp32).
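A rough sketch of the corresponding Python-side flow; the ExecuTorchModelForCausalLM entry point, the xnnpack recipe name, and the text_generation call are assumptions based on the project's README, and the checkpoint and prompt are placeholders:

```python
# Sketch of exporting and running a decoder-only model via optimum-executorch.
# The recipe name, text_generation signature, and checkpoint are assumptions.
from transformers import AutoTokenizer

from optimum.executorch import ExecuTorchModelForCausalLM

model_id = "HuggingFaceTB/SmolLM2-135M"
# Exports the HF checkpoint to a .pte program with the XNNPACK backend.
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")

tokenizer = AutoTokenizer.from_pretrained(model_id)
print(model.text_generation(tokenizer=tokenizer, prompt="Hello,", max_seq_len=32))
```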

Generally applicable to all causal LMs. For encoder-decoder models, it may apply to the self-attention layers in the decoder; we can experiment with that in a follow-up PR.
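For context, this is roughly how a custom SDPA plugs into decoder-only models through the Transformers AttentionInterface. The sketch below wraps stock torch.nn.functional.scaled_dot_product_attention rather than ExecuTorch's custom op, and the registered name and checkpoint are illustrative:

```python
# Illustrative AttentionInterface registration; the real PR swaps in
# ExecuTorch's custom SDPA op, not the stock torch SDPA used here.
import torch
from transformers import AttentionInterface, AutoModelForCausalLM


def custom_sdpa(module, query, key, value, attention_mask,
                scaling=None, dropout=0.0, **kwargs):
    # query/key/value arrive as (batch, heads, seq, head_dim); decoder-only
    # GQA models pass fewer KV heads, handled by enable_gqa (PyTorch >= 2.5).
    out = torch.nn.functional.scaled_dot_product_attention(
        query, key, value,
        attn_mask=attention_mask,
        dropout_p=dropout,
        scale=scaling,
        enable_gqa=query.shape[1] != key.shape[1],
    )
    # Transformers expects (batch, seq, heads, head_dim) plus attn weights.
    return out.transpose(1, 2).contiguous(), None


AttentionInterface.register("custom_sdpa", custom_sdpa)

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-135M", attn_implementation="custom_sdpa"
)
```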