[Cherry-pick] Use scaled_dot_product_attention in WavLM attention (#3252, #3265) #3264

nateanl · 2023-04-11T01:44:06Z

Summary:
Fix #3219.

`torch.nn.MultiheadAttention` raises an error when it is called under `torch.no_grad()` with a mask. This pull request fixes that by computing attention in the forward method with `torch.nn.functional.scaled_dot_product_attention` instead.

Pull Request resolved: #3252

Reviewed By: mthrok

Differential Revision: D44798634

Pulled By: nateanl

fbshipit-source-id: abfa7fb84b7bd71848a92ab26da5a5f0f095c665
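
For illustration, here is a minimal sketch of calling `torch.nn.functional.scaled_dot_product_attention` with a mask under `torch.no_grad()`. It is not the code from this PR: the `attend` helper, the tensor shapes, and the use of an additive float mask are assumptions chosen for the example.

```python
import torch
import torch.nn.functional as F


def attend(query, key, value, attn_mask=None):
    """Hypothetical helper, not part of torchaudio.

    query/key/value: (batch, heads, seq, head_dim)
    attn_mask: optional float mask, broadcastable to (batch, heads, seq, seq),
               added to the attention scores before the softmax.
    """
    return F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask)


torch.manual_seed(0)
batch, heads, seq, head_dim = 2, 4, 5, 8  # illustrative sizes
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)
bias = torch.randn(batch, heads, seq, seq)  # stands in for a relative position bias

# Mask plus no_grad, the combination described in the summary above.
with torch.no_grad():
    out = attend(q, k, v, attn_mask=bias)

# Reference computation: softmax(q k^T / sqrt(head_dim) + bias) v
scores = q @ k.transpose(-2, -1) / head_dim**0.5 + bias
expected = torch.softmax(scores, dim=-1) @ v
```

A float mask passed as `attn_mask` is added to the scaled scores, so `out` should agree with `expected` up to floating-point tolerance.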


Summary:
When `key_padding_mask` is not `None`, it needs to be combined with `attn_mask_rel_pos` into a single mask for the `scaled_dot_product_attention` function.

Pull Request resolved: pytorch#3265

Reviewed By: hwangjeff

Differential Revision: D44901093

Pulled By: nateanl

fbshipit-source-id: 73ca7af48faf7f4eb36b35b603187a11e5582c70
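
One way the mask combination described above could look is sketched below. This is an assumption-laden illustration, not the code merged in #3265: the `combine_masks` helper is hypothetical, `key_padding_mask` is assumed to be a boolean tensor of shape `(batch, seq)` with `True` marking padded positions, and `attn_mask_rel_pos` is assumed to be a float tensor of shape `(batch, heads, seq, seq)`.

```python
import torch
import torch.nn.functional as F


def combine_masks(attn_mask_rel_pos, key_padding_mask):
    """Hypothetical helper: fold a padding mask into the relative position bias.

    attn_mask_rel_pos: float tensor, (batch, heads, seq, seq)      (assumed shape)
    key_padding_mask:  bool tensor, (batch, seq), True = padded    (assumed convention)
    """
    if key_padding_mask is None:
        return attn_mask_rel_pos
    # Broadcast over heads and query positions: (batch, 1, 1, seq)
    padding = key_padding_mask[:, None, None, :]
    # -inf on padded keys gives them zero weight after the softmax.
    return attn_mask_rel_pos.masked_fill(padding, float("-inf"))


torch.manual_seed(0)
batch, heads, seq, head_dim = 2, 4, 5, 8  # illustrative sizes
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)
rel_pos = torch.randn(batch, heads, seq, seq)

pad = torch.zeros(batch, seq, dtype=torch.bool)
pad[0, -1] = True  # last frame of the first sequence is padding

mask = combine_masks(rel_pos, pad)
with torch.no_grad():
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Because the padded key receives zero attention weight, changing the value vector at that position should leave the output unchanged. Note that a row whose keys are all padded would be entirely `-inf` and produce NaNs, so this sketch assumes every sequence has at least one unpadded frame.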

facebook-github-bot added the CLA Signed label Apr 11, 2023

nateanl requested a review from a team April 11, 2023 01:44

nateanl mentioned this pull request Apr 11, 2023

[v2.0.1] Release Tracker #3237

Closed

nateanl force-pushed the cherrypick-wav2vec2 branch from ce8cef6 to 614cdfe Compare April 12, 2023 12:18

nateanl added 2 commits April 12, 2023 08:23

nateanl force-pushed the cherrypick-wav2vec2 branch from 46b7195 to 15c0b62 Compare April 12, 2023 12:23

nateanl changed the title ~~[Cherry-pick] Use scaled_dot_product_attention in WavLM attention (#3252)~~ [Cherry-pick] Use scaled_dot_product_attention in WavLM attention (#3252, #3265) Apr 12, 2023

nateanl merged commit 54f6c1f into pytorch:release/2.0 Apr 12, 2023
