InterCTC loss and stochastic depth implementation #6013
Conversation
The idea is good, but the code is becoming complicated.
We have two options:
- Keep it inside ctc_models.py, but add a mixin class that deals with the InterCTC parts of the code, including any and all functions needed by it. The ctc_models.py class then simply calls these functions.
- Write an entirely separate class for ctc_models.py, subclass it, and override the parts we need, which are primarily forward and parts of training_step and validation_step.
A bad option is to merge it as is right now: that would significantly complicate the ctc_models.py training step and validation step, and cause issues with long-audio inference, where we don't want InterCTC tensors to waste memory.
@VahidooX What is your preference?
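The mixin option above can be pictured roughly as follows. This is a minimal sketch in plain Python: `InterCTCMixinSketch`, `TinyCTCModel`, and all method names are hypothetical illustrations, not the actual NeMo API, and the loss weighting shown is one common convention from the InterCTC paper.

```python
# Sketch of the mixin option: all InterCTC-specific logic lives in a mixin,
# and the CTC model class only calls into it. Names are illustrative.

class InterCTCMixinSketch:
    """Holds the InterCTC-specific pieces of the training logic."""

    def setup_interctc(self, loss_weights):
        self._interctc_loss_weights = list(loss_weights)

    def is_interctc_enabled(self):
        return len(getattr(self, "_interctc_loss_weights", [])) > 0

    def add_interctc_losses(self, main_loss, intermediate_losses):
        # One common weighting: total = (1 - sum(w)) * main + sum(w_i * inter_i)
        weights = self._interctc_loss_weights
        total = (1 - sum(weights)) * main_loss
        for w, inter in zip(weights, intermediate_losses):
            total += w * inter
        return total


class TinyCTCModel(InterCTCMixinSketch):
    """The model class stays simple and just delegates to the mixin."""

    def training_step_loss(self, main_loss, intermediate_losses):
        if self.is_interctc_enabled():
            return self.add_interctc_losses(main_loss, intermediate_losses)
        return main_loss
```

With a single intermediate weight of 0.3, a main loss of 1.0 and an intermediate loss of 2.0 combine to 0.7 * 1.0 + 0.3 * 2.0 = 1.3.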
@@ -524,6 +536,23 @@ def forward(
    encoded_len = encoder_output[1]
    log_probs = self.decoder(encoder_output=encoded)
    greedy_predictions = log_probs.argmax(dim=-1, keepdim=False)
    # generating decoding results for intermediate layers if necessary
Add a new line after greedy preds, plus a docstring explaining which paper this implements and what it does.
@@ -524,6 +536,23 @@ def forward(
    encoded_len = encoder_output[1]
    log_probs = self.decoder(encoder_output=encoded)
    greedy_predictions = log_probs.argmax(dim=-1, keepdim=False)
    # generating decoding results for intermediate layers if necessary
    if self.intermediate_loss_weights:
Be explicit and check len(self.intermediate_loss_weights) > 0.
if self.intermediate_loss_weights:
    # we assume that encoder has to have property called "captured_layer_outputs"
    # which is a list with the same length as loss weights
    if len(self.encoder.captured_layer_outputs) != len(self.intermediate_loss_weights):
Don't assume. First and foremost, you should not be attaching tensors with grads to a module - ever. You can instead use the tensor registry framework in NeMo; most models already support it because other parts need it. So do the following: in the training_step, enable the tensor registry, let modules register their forward activations (or a subset of them), and then access the registry here. Empty the registry at the end of the training step to prevent a memory leak.
So do a hasattr check, then raise a proper error (ValueError/RuntimeError) which details that
- the provided InterCTC intermediate loss weights list is not empty,
- but the model does not add anything to the registry.
Also, this will waste memory during inference if the user does not explicitly request InterCTC outputs, so put a self.training check above this part.
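The flow described above can be sketched generically with a plain dict-backed registry. This is only an illustration of the pattern (enable, register during forward, consume, reset); NeMo's actual AccessMixin API differs in its details, and every name here is hypothetical.

```python
# Generic sketch of the suggested registry flow: enable the registry for the
# training step, let the encoder register intermediate outputs during its
# forward pass, read them when computing losses, and clear everything at the
# end so no tensors are kept alive across steps. Names are illustrative.

class TensorRegistrySketch:
    def __init__(self):
        self._enabled = False
        self._store = {}

    def set_enabled(self, enabled):
        self._enabled = enabled

    def register(self, name, value):
        # Modules call this from forward; it is a no-op unless enabled,
        # so inference does not pay any memory cost.
        if self._enabled:
            self._store.setdefault(name, []).append(value)

    def get(self, name):
        return self._store.get(name, [])

    def reset(self):
        self._store.clear()


def training_step_sketch(registry, run_forward, compute_loss):
    registry.set_enabled(True)      # enable before the forward pass
    run_forward(registry)           # modules register their activations
    loss = compute_loss(registry)   # consume the registered tensors here
    registry.reset()                # empty the registry: no memory leak
    registry.set_enabled(False)
    return loss
```

Because registration is gated on the enabled flag, leaving the registry disabled outside training gives the self.training behavior the comment asks for.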
)
self.intermediate_decoding_results = [None] * len(self.encoder.captured_layer_outputs)
for idx, captured_output in enumerate(self.encoder.captured_layer_outputs):
    self.intermediate_decoding_results[idx] = [
Do not set values - especially CUDA tensors - as attributes on a module during training. They will not be garbage collected properly.
    target_lengths=transcript_len,
    input_lengths=intermediate_result[1],
)
tensorboard_logs[f"inter_ctc_loss{idx}"] = loss_value.detach()
This should be inter_loss_value
raise ValueError('stochastic_depth_mode has to be one of ["linear", "uniform"].')
self.layer_drop_probs = layer_drop_probs
self.capture_output_at_layers = capture_output_at_layers
if self.capture_output_at_layers is None:
Don't cache tensors inside a module during the forward pass. Use the tensor registry.
++
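For reference, the two stochastic depth modes validated above can be computed as follows. This is a sketch of one common convention (later layers dropped more often in "linear" mode); the exact schedule in the PR's code may differ, and the function name is hypothetical.

```python
def compute_layer_drop_probs(num_layers, drop_prob, mode):
    """Per-layer drop probabilities for stochastic depth (sketch).

    "linear" scales the probability with depth, so deeper layers are
    dropped more often; "uniform" uses the same probability everywhere.
    One common linear convention is p_l = (l / L) * p_final.
    """
    if mode == "uniform":
        return [drop_prob] * num_layers
    if mode == "linear":
        # p_l = (l / L) * p_final for layer l in 1..L
        return [l / num_layers * drop_prob for l in range(1, num_layers + 1)]
    raise ValueError('stochastic_depth_mode has to be one of ["linear", "uniform"].')
```

For example, with 4 layers and a final drop probability of 0.2, "linear" mode ramps from 0.05 at the first layer up to 0.2 at the last.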
@@ -478,6 +523,17 @@ def forward(self, audio_signal, length, cache_last_channel=None, cache_last_time
    cache_last_channel_next=cache_last_channel_next,
    cache_last_time_next=cache_last_time_next,
)
if self.training:
Add a docstring here for stochastic depth explaining what's being done.
Also add the condition stochastic_depth_drop_prob > 0.0 right here, after self.training.
if should_drop:
    # that's not efficient, but it's hard to implement distributed
    # version of dropping layers without deadlock or random seed meddling
    # so multiplying the signal by 0 to ensure all weights get gradients
This is fine.
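The multiply-by-zero trick discussed above can be illustrated with plain numbers. This is a sketch only (the real code operates on CUDA tensors inside a Conformer residual block, and the function name is hypothetical), but it shows the key property: the layer is always computed, so distributed training never sees unused parameters on some ranks, yet a dropped layer contributes nothing to the output.

```python
def residual_with_stochastic_depth(x, layer_output, should_drop, training):
    """Sketch of dropping a residual branch without skipping computation.

    Skipping the layer entirely on a random subset of steps can deadlock
    distributed data parallel training (some ranks would have parameters
    with no gradients). Instead, the layer output is always computed and
    multiplied by 0 when dropped, so every weight still receives a (zero)
    gradient and the residual path passes the input through unchanged.
    """
    if training and should_drop:
        layer_output = layer_output * 0.0  # zero out instead of skipping
    return x + layer_output
```

In eval mode the drop decision is ignored, matching the self.training gate requested in the review.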
@@ -487,6 +543,9 @@ def forward(self, audio_signal, length, cache_last_channel=None, cache_last_time
    _, pos_emb = self.pos_enc(x=audio_signal, cache_len=cache_len)
    pad_mask, att_mask = self._create_masks(max_audio_length, length, audio_signal.device)

    if lth in self.capture_output_at_layers:
Use tensor registry here.
@@ -496,6 +555,9 @@ def forward(self, audio_signal, length, cache_last_channel=None, cache_last_time

    audio_signal = torch.transpose(audio_signal, 1, 2)

    for captured_output in self.captured_layer_outputs:
Do the above when you use the tensor registry.
Looks a lot better, minor comments and docstring additions to InterCTCMixin class, then can be merged.
@@ -536,6 +540,9 @@ def training_step(self, batch, batch_nb):
    if AccessMixin.is_access_enabled():
        AccessMixin.reset_registry(self)

    if self.interctc_enabled:
Should be a function in InterCTCMixin, not a variable assigned to self.
if len(self.encoder.capture_output_at_layers) != len(self.intermediate_loss_weights):
    raise ValueError('Length of encoder.capture_output_at_layers has to match intermediate_loss_weights')

def finalize_interctc_metrics(self, metrics, outputs, prefix):
Methods need docstrings
    [x[f"{prefix}final_ctc_loss"] for x in outputs]
).mean()

def get_captured_tensors(self):
Let's have the keyword interctc somewhere in the name of all methods to avoid name collisions. Or make them private if they won't be used outside of this class.
# if intermediate_loss_weights was set, the encoder has to register
# layer_output_X and layer_length_X tensors. We need to apply decoder
# to each of them and compute CTC loss.
module_registry = AccessMixin.get_module_registry(self.encoder)['']  # key for encoder
[''] ?
That's the key that AccessMixin has when assigning tensors to the current module. I don't have control over it :)
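The shape being discussed can be pictured with a plain dict. This is only an illustrative sketch of the nesting (tensors registered by a module itself landing under the empty-string key, i.e. the module's own relative name), not the real AccessMixin data structure.

```python
# Illustrative shape of a per-module registry: entries registered by the
# encoder module itself appear under the empty-string key, while a submodule
# would register under its dotted relative name. Keys and values are made up.
module_registry = {
    "": {  # tensors registered by the encoder module itself
        "layer_output_3": ["<tensor>"],
        "layer_length_3": ["<tensor>"],
    },
    # "layers.3": {...}  # a submodule would appear under its own name
}

# hence the [''] lookup to get the encoder's own registered tensors
encoder_tensors = module_registry[""]
```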
docs/source/asr/configs.rst (outdated)
@@ -546,6 +547,38 @@ The encoder section includes the details about the RNN-based encoder architectur
config files and also :ref:`nemo.collections.asr.modules.RNNEncoder <rnn-encoder-api>`.

CTC Configurations
@titu1994 not sure if that's the best place to put these docs
It is for the configuration of ASR models; doesn't look that bad to me.
Same, looks fine
LGTM, just minor comments!
docs/source/asr/configs.rst (outdated)
@@ -546,6 +547,38 @@ The encoder section includes the details about the RNN-based encoder architectur
config files and also :ref:`nemo.collections.asr.modules.RNNEncoder <rnn-encoder-api>`.

CTC Configurations
It is for the configuration of ASR models; doesn't look that bad to me.
CTC Configurations
------------------

All CTC-based models also support `InterCTC loss <https://arxiv.org/abs/2102.03216>`_. To use it, you need to specify
How about the stochastic depth docs?
What's the best place to put them? They are currently documented in the code for ConformerEncoder and only supported there (but for any model that uses it).
The same place looks good to me. You may mention in the description that it is only supported for Conformer-based models.
But this one is called "CTC Configurations", while stochastic depth is for both CTC and transducer (although only for conformer-based versions now)
I meant creating a new section in the same file titled "Stochastic Depth" or something like this.
You may also rename this section to "InterCTC Loss" instead of "CTC Configurations"?
Agreed. Rename this section to InterCTC Config and add a new Stochastic Depth config section. Note that only Conformer is supported for now; we can add more models in the future.
Minor comments. The PR is ready to merge after that
docs/source/asr/configs.rst (outdated)
@@ -546,6 +547,38 @@ The encoder section includes the details about the RNN-based encoder architectur
config files and also :ref:`nemo.collections.asr.modules.RNNEncoder <rnn-encoder-api>`.

CTC Configurations
Same, looks fine
CTC Configurations
------------------

All CTC-based models also support `InterCTC loss <https://arxiv.org/abs/2102.03216>`_. To use it, you need to specify
Agreed. Rename this section to InterCTC Config and add a new Stochastic Depth config section. Note that only Conformer is supported for now; we can add more models in the future.
After review, seems I was mistaken. All cases are covered correctly.
LGTM, ready to merge
Signed-off-by: Igor Gitman <[email protected]>
* Some simplifications
* Add tests for stochastic depth
* Fix tests for stochastic depth
* Add interctc loss and logs
* Fix a few issues
* Add interctc loss tests
* Add docs
* Add training_step test for interctc
* Refactoring with AccessMixin WIP
* Separate interctc logic into a mixin
* Fix tests
* Fix some lint errors
* Small refactoring
* Add more docs, fix PR comments
* Add other encoder support + more refactoring
* Add more config examples
* Move stochastic depth setup to utils
* Add interctc_enabled setter + more docs
* Fix a few doc strings for better web display
* Update CTC flow diagram

Signed-off-by: Igor Gitman <[email protected]>
What does this PR do?
Adds intermediate CTC loss and stochastic depth as described in https://arxiv.org/abs/2102.03216.
The current implementation is only for the Conformer encoder, but I'm not really sure how to write generic code for this case. Please let me know if you have some ideas here.
Collection: ASR
Changelog
Usage
To use, specify parameters in the config. E.g., for stochastic depth:
For intermediate CTC loss:
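The original config snippets are not shown above, but the two feature configurations can be sketched roughly as follows. This is a hypothetical illustration using parameter names that appear in this PR's code (stochastic_depth_drop_prob, stochastic_depth_mode, intermediate_loss_weights, capture_output_at_layers); the actual YAML layout and defaults are in the PR's docs.

```python
# Hypothetical sketch (plain dicts, not actual YAML) of the two features'
# parameters, using names seen in this PR's code. Values are made up.
stochastic_depth_cfg = {
    "stochastic_depth_drop_prob": 0.3,  # maximum per-layer drop probability
    "stochastic_depth_mode": "linear",  # "linear" or "uniform"
}

interctc_cfg = {
    "intermediate_loss_weights": [0.3],  # one weight per captured layer
    "capture_output_at_layers": [8],     # encoder layers whose output is decoded
}

# the two lists must have matching lengths, as enforced in the PR's code
assert len(interctc_cfg["intermediate_loss_weights"]) == len(
    interctc_cfg["capture_output_at_layers"]
)
```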
I've added the docs to ConformerEncoder, but I'm not sure how to add docs for the new intermediate_loss_weights parameter of the EncDecCTCModel. Please let me know what's the right place to put those docs in.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines contain specific people who can review PRs to various areas.
Additional Information