Adding interCTC loss to hybrid models #6215
Conversation
LGTM! Some minor comments.
import torch
from omegaconf import DictConfig, ListConfig

from nemo.collections.asr.metrics.wer import CTCDecoding, CTCDecodingConfig
Unused imports here.
@@ -104,7 +104,7 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None):
        self.setup_optimization_flags()

        # setting up interCTC loss (from InterCTCMixin)
-       self.setup_interctc()
+       self.setup_interctc(self._wer, self.encoder, self.decoder, self.loss)
The order of inputs should be PyTorch modules first, then loss, then metrics. This is the convention in RNNT.
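For instance, the call above would then look something like this (just illustrating the suggested ordering, not the final signature):

self.setup_interctc(self.encoder, self.decoder, self.loss, self._wer)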
@@ -88,6 +88,9 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None):
        # setting the RNNT decoder as the default one
        self.use_rnnt_decoder = True

+       # setting up interCTC loss (from InterCTCMixin)
+       self.setup_interctc(self.ctc_wer, self.encoder, self.ctc_decoder, self.ctc_loss)
Follow the above convention.
@@ -535,7 +538,6 @@ def validation_step(self, batch, batch_idx, dataloader_idx=0):
            tensorboard_logs['val_ctc_loss'] = ctc_loss
            tensorboard_logs['val_rnnt_loss'] = loss_value
            loss_value = (1 - self.ctc_loss_weight) * loss_value + self.ctc_loss_weight * ctc_loss
-           tensorboard_logs['val_loss'] = loss_value
Revert the above removals - the RNNT loss is calculated only optionally in inference mode; it is not set at all if the flag is False (which is the default). Below, you can check the flag and then add to the value if interCTC is enabled, or skip it if the RNNT loss is not supposed to be logged.
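A rough sketch of that check inside validation_step (this assumes the model exposes self.compute_eval_loss the way RNNT models do; it is an illustration, not the final code):

# inside validation_step, once the RNNT/CTC eval losses have (optionally) been computed
if self.compute_eval_loss:
    tensorboard_logs['val_rnnt_loss'] = loss_value
    tensorboard_logs['val_ctc_loss'] = ctc_loss
    loss_value = (1 - self.ctc_loss_weight) * loss_value + self.ctc_loss_weight * ctc_loss
    tensorboard_logs['val_loss'] = loss_value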
@@ -582,7 +579,9 @@ def multi_validation_epoch_end(self, outputs, dataloader_idx: int = 0):
            ctc_wer_num = torch.stack([x['val_wer_num_ctc'] for x in outputs]).sum()
            ctc_wer_denom = torch.stack([x['val_wer_denom_ctc'] for x in outputs]).sum()
            tensorboard_logs['val_wer_ctc'] = ctc_wer_num.float() / ctc_wer_denom
-       return {**val_loss_log, 'log': tensorboard_logs}
+       metrics = {**val_loss_log, 'log': tensorboard_logs}
+       self.finalize_interctc_metrics(metrics, outputs, prefix="val_")
Losses for RNNT are optional in val/test mode, so check first and then add.
        interctc_config = self.cfg.get("interctc")
        if interctc_config is not None:
            # if interctc is in the config, we want to check that it indeed defines
            # the required keys and nothing else - that's automatically done by
            # matching with keyword arguments in self._process_config_values
            self._process_config_values(**interctc_config)
+       self._interctc_params['wer'] = wer
Do you need to keep a reference to these objects? It can leak memory. Use a weak reference instead. Note that other objects like Decoding do keep references to these objects, but those are changeable and visible directly as part of the model code, whereas this mixin is not.
It is easy to forget that this mixin holds references to modules, and those references can be missed during a change of model vocabulary (which updates decoding + metric).
I would prefer to simply not register modules like this inside the mixin.
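A minimal sketch of the weak-reference idea (the class and helper names here are illustrative, not the PR's code):

import weakref

class InterCTCMixinSketch:
    def setup_interctc(self, encoder, decoder, loss, wer):
        # store weak references so the mixin neither keeps these objects alive
        # nor silently goes stale when the model swaps its decoding/metric objects
        self._interctc_params = {
            'encoder': weakref.ref(encoder),
            'decoder': weakref.ref(decoder),
            'loss': weakref.ref(loss),
            'wer': weakref.ref(wer),
        }

    def _interctc_get(self, name):
        obj = self._interctc_params[name]()  # dereference; None if the target was collected
        if obj is None:
            raise RuntimeError(f"interCTC reference '{name}' is no longer alive")
        return obj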
""" | ||
if not self.is_interctc_enabled(): | ||
return [] | ||
|
||
# note that we have a loop here, because tensors can be defined from | ||
# submodules of encoder (e.g., that's the case in Jasper) | ||
total_registry = {} | ||
for module_registry in AccessMixin.get_module_registry(self.encoder).values(): | ||
for key, value in module_registry.items(): | ||
for module_registry in AccessMixin.get_module_registry(self._interctc_params['encoder']).values(): |
You can just use self.encoder here - no need to keep the encoder in a dict and hold a reference to it.
@@ -154,7 +169,9 @@ def get_captured_interctc_tensors(self) -> List[Tuple[torch.Tensor, torch.Tensor
                raise RuntimeError(
                    "Make sure encoder.forward is called exactly one time before interCTC loss is computed."
                )
-           captured_tensors.append((self.decoder(encoder_output=layer_outputs[0]), layer_lengths[0]))
+           captured_tensors.append(
+               (self._interctc_params['decoder'](encoder_output=layer_outputs[0]), layer_lengths[0])
You can just use self.decoder here
        ):
-           inter_loss_value = self.loss(
+           inter_loss_value = self._interctc_params['loss'](
Why not use self.loss?
        loss_value += inter_loss_value * loss_weight
        if compute_wer:
-           self._wer.update(
+           self._interctc_params['wer'].update(
Why not use self._wer? You could check and set self._wer or self.wer (or, better yet, add an alias self.wer = self._wer as a property for CTC models).
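A hedged sketch of the suggested alias on the CTC model class (so the mixin can always go through self.wer regardless of how the model stores the metric):

@property
def wer(self):
    # alias so mixins and external code don't have to know about the private attribute
    return self._wer

@wer.setter
def wer(self, wer):
    self._wer = wer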
Signed-off-by: Igor Gitman <[email protected]>
LGTM!
Just one unused import of CTCDecoding in test_asr_interctc_models.py.
Looks much better, thanks! One minor change, then let's merge.
@@ -104,7 +104,7 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None):
        self.setup_optimization_flags()

        # setting up interCTC loss (from InterCTCMixin)
-       self.setup_interctc()
+       self.setup_interctc('decoder', 'loss', '_wer')
Can you use keyword args here? It's hard to tell what the inputs to this function are.
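For example, something like this (the parameter names are a guess at the mixin's signature, shown only to make the suggestion concrete):

self.setup_interctc(decoder_name='decoder', loss_name='loss', wer_name='_wer')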
@@ -88,6 +88,9 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None):
        # setting the RNNT decoder as the default one
        self.use_rnnt_decoder = True

+       # setting up interCTC loss (from InterCTCMixin)
+       self.setup_interctc('ctc_decoder', 'ctc_loss', 'ctc_wer')
Same as above
Signed-off-by: Igor Gitman <[email protected]>
Thanks!
* Add interctc functionality to hybrid models
* Fix bugs with interctc loss
* Update configs
* Minor cleanup + use attribute names instead of objects in setup
* Correctly handle compute_eval_loss=False
* Add compute_eval_loss=False test cases
* Remove unused import, add keyword args

Signed-off-by: Igor Gitman <[email protected]>
What does this PR do?
Adding interCTC loss to hybrid models. To solve the issue that CTC and hybrid models use different names for the things we need (e.g., wer vs ctc_wer), I'm now directly passing all necessary "main" class variables to the interctc_setup method (except for self.cfg). I'm not sure there is much sense in still keeping interCTC as a mixin, given that it no longer accesses anything from the main class, but I'm not making any changes to that for now.
Additionally, there are 2 slight bugs in the original code. First, inter_ctc_loss was incorrectly calculated in the logs (the main loss was used there, not inter_ctc). Note that this is purely a logging issue and does not affect training. Second, the main loss was not multiplied by the correct coefficient, which does affect training to some extent. So the calculation should have been something like
loss = final_loss * 0.7 + inter_loss * 0.3
while it was just
loss = final_loss + inter_loss * 0.3
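Put as a small sketch with illustrative variable names, the fix amounts to scaling the main loss by the remaining weight:

inter_loss_weight = 0.3
# the main loss has to be scaled down by (1 - weight), not added at full strength
loss = (1 - inter_loss_weight) * final_loss + inter_loss_weight * inter_loss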
Collection: ASR
Changelog
Usage
# Add a code snippet demonstrating how to use this
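A hedged illustration of enabling interCTC on a hybrid model via the config (the interctc keys follow the existing interCTC support for CTC models; the exact layout is an assumption, not copied from this PR):

from omegaconf import OmegaConf

# hypothetical minimal override showing only the relevant keys:
# apply an auxiliary CTC loss with weight 0.3 to the output of encoder layer 8
interctc_override = OmegaConf.create(
    {"model": {"interctc": {"loss_weights": [0.3], "apply_at_layers": [8]}}}
)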
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines contain specific people who can review PRs to various areas.
Additional Information