[ASR] Conformer global tokens in local attention #6253

Merged: 20 commits into NVIDIA:main on Apr 6, 2023

Conversation

@sam1373 (Contributor) commented Mar 19, 2023

What does this PR do ?

Adds Longformer-style attention for the Conformer encoder (limited-context attention plus a few tokens with full-context attention). Unlike plain limited-context attention, this approach requires the model to be fine-tuned to get good results. After fine-tuning, results are improved, particularly on long audio (even when it is much longer than the training data). When using the default Fast Conformer configs with this attention, inference on audio longer than 1 hour is supported.
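
Illustrative only (not code from this PR): a minimal sketch of the attention pattern described above, where each position attends within a sliding window and a few global positions attend to, and are attended by, every position. The function name and the choice of which positions are global are assumptions for illustration.

    import torch

    def longformer_style_mask(seq_len: int, context_size: int, num_global: int) -> torch.Tensor:
        """Boolean (seq_len, seq_len) mask; True where attention is allowed."""
        idx = torch.arange(seq_len)
        # Sliding-window band: position i may attend to j when |i - j| <= context_size.
        local = (idx[None, :] - idx[:, None]).abs() <= context_size
        # Treat the first `num_global` positions as global tokens (illustrative choice).
        is_global = torch.zeros(seq_len, dtype=torch.bool)
        is_global[:num_global] = True
        # Global tokens attend everywhere (rows) and are attended to by everyone (columns).
        return local | is_global[:, None] | is_global[None, :]

    mask = longformer_style_mask(seq_len=16, context_size=2, num_global=1)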

Collection: ASR

Changelog

  • Adds an option to use global tokens in local attention
  • Adds a corresponding test
  • Adds configs for training Conformer with this attention and mentions it in the docs

Usage

Add the following parameters to the Conformer encoder section of the config:

    self_attention_model: rel_pos_local_attn # longformer-style attention (sliding window + global tokens)
    global_tokens: 1 # number of tokens that attend and are attended to by all tokens
    global_attn_separate: false # whether global tokens should use separate q,k,v layers
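
For example, with one of the NeMo training scripts these keys can be set as Hydra overrides on the command line. A rough sketch (the script path, config placeholders, and the att_context_size value are illustrative, not prescribed by this PR):

    python examples/asr/asr_ctc/speech_to_text_ctc_bpe.py \
        --config-path=<config dir> --config-name=<config name> \
        model.encoder.self_attention_model=rel_pos_local_attn \
        model.encoder.att_context_size='[128,128]' \
        model.encoder.global_tokens=1 \
        model.encoder.global_attn_separate=false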

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: sam1373 <[email protected]>
@github-actions github-actions bot added the ASR label Mar 19, 2023
@sam1373 sam1373 marked this pull request as ready for review March 20, 2023 18:31
@titu1994 (Collaborator) left a comment:

Overall it looks good, some minor changes commented. @VahidooX for final review


# This version uses Longformer-style attention in order to handle longer audio

name: "FastConformer-CTC-BPE"
Collaborator:
Add long-context


# Feed forward module's params
ff_expansion_factor: 4

Collaborator:
Add a Note regarding the sections that are the primary difference compared to original model config


# This version uses Longformer-style attention in order to handle longer audio

name: "FastConformer-Transducer-BPE"
Collaborator:
Name update

# Feed forward module's params
ff_expansion_factor: 4

# Multi-headed Attention Module's params
Collaborator:
Same as above

if self.global_tokens > 0:
# compute sum of global and local attn
x = self._compute_attn_output_with_global_indices(
value_vectors=v.transpose(1, 2),
Collaborator:
Add shape info as comment here

max_num_global_attn_indices=max_num_global_attn_indices,
is_index_global_attn_nonzero=is_index_global_attn_nonzero,
is_local_index_global_attn_nonzero=is_local_index_global_attn_nonzero,
w=w,
Collaborator:
Use better keyword variable name

return self.linear_out(x.reshape(n_batch, -1, self.h * self.d_k)[:, :T])

@staticmethod
def _get_global_attn_indices(is_index_global_attn: torch.Tensor) -> Tuple:
Collaborator:
How about making this protected instead of private? Put _ at the end of the method name.

is_local_index_no_global_attn_nonzero,
)

def _compute_attn_probs_global_key(
Collaborator:
Same as above


return attn_output_only_global + attn_output_without_global

def _compute_global_attn_output_from_hidden(
Collaborator:
Same as above

@VahidooX (Collaborator) left a comment:

LGTM! Left some minor comments.

docs/source/asr/models.rst (outdated review comment, resolved)
docs/source/asr/models.rst (review comment, resolved)
@@ -0,0 +1,203 @@
# It contains the default values for training a Fast Conformer-CTC ASR model, large size (~120M) with CTC loss and sub-word encoding.
Collaborator:
Is it nice to have "-" and "_" in the name at the same time?

nemo/collections/asr/modules/conformer_encoder.py (outdated review comment, resolved)
@sam1373 sam1373 requested a review from VahidooX April 5, 2023 20:19
@VahidooX previously approved these changes Apr 5, 2023
@VahidooX (Collaborator) left a comment:

LGTM! Just left some minor comments. Let's wait for the second round of review from @titu1994.

@VahidooX (Collaborator) left a comment:

LGTM!

@sam1373 sam1373 requested a review from titu1994 April 6, 2023 01:57
@sam1373 sam1373 merged commit 65ae83d into NVIDIA:main Apr 6, 2023
hsiehjackson pushed a commit to hsiehjackson/NeMo that referenced this pull request Jun 2, 2023
* global tokens
* test, configs, docs
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* update
* style
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* names, comments
* move comment
* import
* docs
* docs
* disable note

Signed-off-by: sam1373 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: hsiehjackson <[email protected]>