Use scaled_dot_product_attention in WavLM attention #3252
Conversation
Here are the benchmark results with the new changes. The benchmark script can be found at https://gist.github.com/nateanl/97b2f9adb39c05a4e854fbd924de01f6.
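The gist itself is linked above; as a rough, hypothetical re-creation of that kind of measurement (the `WAVLM_BASE` bundle choice, input shapes, and iteration count are assumptions, not taken from the gist), one might time masked WavLM inference under `torch.no_grad()` like this:

```python
import torch
import torchaudio
from torch.utils import benchmark

# Hypothetical benchmark sketch: time WavLM inference on random audio
# with per-sample lengths (which produce a padding mask) under no_grad.
bundle = torchaudio.pipelines.WAVLM_BASE
model = bundle.get_model().eval()

waveform = torch.randn(2, 16000 * 5)            # batch of 2, 5 s at 16 kHz
lengths = torch.tensor([16000 * 5, 16000 * 3])  # second sample is padded

@torch.no_grad()
def run():
    return model(waveform, lengths)

timer = benchmark.Timer(stmt="run()", globals={"run": run})
print(timer.timeit(20))
```

Running the same script before and after the change is what makes the comparison meaningful, since the fix swaps the attention implementation without altering the model interface.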
@nateanl has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Here is the script which shows the new
Hey @nateanl.
Summary: Fix pytorch#3219. `torch.nn.MultiheadAttention` will throw an error if `torch.no_grad()` and mask are both given. The pull request fixes it by replacing the forward method with `torch.nn.functional.scaled_dot_product_attention`.

Pull Request resolved: pytorch#3252
Reviewed By: mthrok
Differential Revision: D44798634
Pulled By: nateanl
fbshipit-source-id: abfa7fb84b7bd71848a92ab26da5a5f0f095c665
Use scaled_dot_product_attention in WavLM attention (#3252, #3265) (#3264)

* Use scaled_dot_product_attention in WavLM attention (#3252)

Summary: Fix #3219. `torch.nn.MultiheadAttention` will throw an error if `torch.no_grad()` and mask are both given. The pull request fixes it by replacing the forward method with `torch.nn.functional.scaled_dot_product_attention`.

Pull Request resolved: #3252
Reviewed By: mthrok
Differential Revision: D44798634
Pulled By: nateanl
fbshipit-source-id: abfa7fb84b7bd71848a92ab26da5a5f0f095c665

* Merge key_padding_mask into attn_mask_rel_pos in WavLM (#3265)

Summary: When `key_padding_mask` is not `None`, it needs to be combined with `attn_mask_rel_pos` as one mask for the `scaled_dot_product_attention` function.

Pull Request resolved: #3265
Reviewed By: hwangjeff
Differential Revision: D44901093
Pulled By: nateanl
fbshipit-source-id: 73ca7af48faf7f4eb36b35b603187a11e5582c70
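A minimal sketch of the mask-merging step described in #3265, assuming the names are only illustrative (the real torchaudio internals differ): `scaled_dot_product_attention` accepts a single `attn_mask`, so a boolean `key_padding_mask` has to be folded into the additive relative-position mask before the call.

```python
import torch
import torch.nn.functional as F

B, H, L, E = 2, 4, 10, 16            # batch, heads, sequence, head dim
q = k = v = torch.randn(B, H, L, E)

attn_mask_rel_pos = torch.randn(B, H, L, L)        # additive float mask
key_padding_mask = torch.zeros(B, L, dtype=torch.bool)
key_padding_mask[1, 7:] = True                     # pad last 3 keys of sample 1

# Broadcast the padding mask to (B, 1, 1, L) and fold it into the
# additive mask as -inf entries, so padded keys receive zero attention.
merged = attn_mask_rel_pos.masked_fill(
    key_padding_mask.view(B, 1, 1, L), float("-inf")
)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=merged)
print(out.shape)  # torch.Size([2, 4, 10, 16])
```

Using `-inf` in an additive mask is equivalent to a boolean "do not attend" entry: after the softmax, those key positions contribute nothing.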
Fix #3219. `torch.nn.MultiheadAttention` will throw an error if `torch.no_grad()` and mask are both given. The pull request fixes it by replacing the forward method with `torch.nn.functional.scaled_dot_product_attention`.
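To make "replacing the forward method" concrete, here is a minimal sketch of the pattern, assuming made-up names and shapes (this is not the actual torchaudio WavLM attention code): project q/k/v, call `torch.nn.functional.scaled_dot_product_attention`, then apply the output projection. Unlike the failing `nn.MultiheadAttention` path, this runs under `torch.no_grad()` with a mask supplied.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDPAttention(nn.Module):
    """Illustrative multi-head attention built directly on SDPA."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x, attn_mask=None):
        B, L, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each to (B, num_heads, L, head_dim), as SDPA expects.
        q, k, v = (
            t.view(B, L, self.num_heads, self.head_dim).transpose(1, 2)
            for t in (q, k, v)
        )
        y = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
        return self.out(y.transpose(1, 2).reshape(B, L, D))

attn = SDPAttention(embed_dim=64, num_heads=8).eval()
x = torch.randn(2, 10, 64)
mask = torch.zeros(2, 8, 10, 10)   # additive mask; zeros = attend everywhere
with torch.no_grad():              # no error, unlike the MHA fast path in #3219
    print(attn(x, mask).shape)     # torch.Size([2, 10, 64])
```

Calling the functional kernel directly sidesteps `nn.MultiheadAttention`'s inference fast path, which is where the `no_grad` plus mask combination was failing, while keeping the same attention math.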