Hybrid Autoregressive Transducer (HAT) #6260

andrusenkoau · 2023-03-20T14:05:25Z

What does this PR do ?

Adds the HAT model as a new joint network type (HATJoint) for the RNNT model. The only difference is at decoding time: HAT.joint.joint returns two outputs, hat_logprobs and internal_lm_logprobs (the latter is used for internal LM subtraction during shallow fusion with an external n-gram LM).

Collection: [ASR]

Usage

To train a HAT model, replace _target_: nemo.collections.asr.modules.RNNTJoint with _target_: nemo.collections.asr.modules.HATJoint in the joint section of a standard transducer config.
For shallow fusion with an external n-gram LM, use the RNNT maes decoding algorithm, which works with a HATJoint model.
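
As a rough illustration of the config change described above, the sketch below swaps the joint target in a plain dictionary. The two _target_ strings come from this PR; the helper name use_hat_joint and the trimmed-down config contents are hypothetical and only stand in for a real transducer config.

```python
import copy

RNNT_JOINT = "nemo.collections.asr.modules.RNNTJoint"
HAT_JOINT = "nemo.collections.asr.modules.HATJoint"


def use_hat_joint(model_cfg: dict) -> dict:
    """Return a copy of the config whose joint section targets HATJoint.

    Hypothetical helper: it only rewrites the `_target_` key and leaves
    every other joint setting untouched.
    """
    cfg = copy.deepcopy(model_cfg)
    joint = cfg["joint"]
    if joint.get("_target_") != RNNT_JOINT:
        raise ValueError(f"unexpected joint target: {joint.get('_target_')!r}")
    joint["_target_"] = HAT_JOINT
    return cfg


# Illustrative, heavily trimmed stand-in for the `joint` part of a transducer config.
rnnt_cfg = {
    "joint": {
        "_target_": RNNT_JOINT,
        "jointnet": {"joint_hidden": 640, "activation": "relu", "dropout": 0.2},
    }
}

hat_cfg = use_hat_joint(rnnt_cfg)
print(hat_cfg["joint"]["_target_"])
```

The original dictionary is left unmodified, so the same base config can still be used for a standard RNNT run.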




titu1994

Needs a bit of refactoring

titu1994 · 2023-03-20T16:41:10Z

nemo/collections/asr/modules/hat.py

The filename should be spelled out in full: "hybrid_autoregressive_transducer.py"

titu1994 · 2023-03-20T16:45:32Z

nemo/collections/asr/modules/hat.py

+from nemo.utils import logging
+
+
+class HATJoint(rnnt_abstract.AbstractRNNTJoint, Exportable, AdapterModuleMixin):


This class duplicates a lot of code from RNNTJoint. Would it make sense to subclass it?

Great comment. I made RNNTJoint the parent class and kept only the few modifications needed for the new HATJoint class. Please take a look.

titu1994 · 2023-03-20T16:49:29Z

nemo/collections/asr/parts/submodules/rnnt_beam_decoding.py

@@ -460,7 +466,12 @@ def greedy_search(

            # TODO: Figure out how to remove this hard coding afterwords
            while not_blank and (symbols_added < 5):
-                ytu = torch.log_softmax(self.joint.joint(hi, y) / self.softmax_temperature, dim=-1)  # [1, 1, 1, V + 1]
+                if isinstance(self.joint, HATJoint):
+                    ytu, _ = self.joint.joint(hi, y)


This kind of logic is problematic in the long run. Why not add a bool to the HAT module that determines what self.joint returns? By default it is set and both items are returned; otherwise the output has the same form as RNNT, so this code doesn't need to change.

I thought the JIT compiler does not like a variable number of outputs. I have now made the default mode return only logprobs (like the standard RNNT joint); both logprobs and internal_lm_logprobs are returned only when a special boolean flag is set. This leaves more of the RNNT decoding code unchanged.

titu1994 · 2023-03-20T16:53:39Z

nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py

@@ -34,6 +34,7 @@
 from omegaconf import DictConfig

 from nemo.collections.asr.modules import rnnt_abstract
+from nemo.collections.asr.modules.hat import HATJoint


No modules should be imported inside the Greedy or Beam decoding libraries, because that will eventually cause a circular dependency.

This line is no longer needed, since hat.joint.joint now behaves the same as the RNNT joint by default.


github-advanced-security

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

titu1994

Currently the HAT import makes the code too circular. Another issue is that it requires too many modifications to an already very complicated function (mAES).

The first issue we can make more generic with dataclass and property tricks. Those changes are relatively simple but require some refactoring.

The second one I don't know how to make more generic. Perhaps an abstract method inside AbstractRNNTJoint that describes how to do a special forward of the joint? That's a heavy refactor, so ignore it for now.

titu1994 · 2023-03-22T05:49:56Z

nemo/collections/asr/parts/submodules/rnnt_beam_decoding.py

@@ -34,6 +34,7 @@
 import torch
 from tqdm import tqdm

+from nemo.collections.asr.modules import hybrid_autoregressive_transducer as hat


Hmm, so this import doesn't actually fix the circular import. Think of it like this:

RNNTModel needs EncDecJoint, Loss, Decoding, Metric
Decoding depends on Decoder + Joint
Metric depends on Decoding.
Joint depends on loss and metric.

But now decoding itself imports the joint module. That's fine for now, but it can become more circular and crash in the future. I'll discuss an alternative below.

titu1994 · 2023-03-22T05:55:55Z

nemo/collections/asr/modules/hybrid_autoregressive_transducer.py

+
+        res = torch.cat((label_logprob_scaled, blank_logprob), dim=-1).contiguous()  # [B, T, U, V+1]
+
+        if return_ilm:


In this case, it seems incorrect to return a tuple here. Let's do this instead:
In rnnt_utils.py, create a dataclass called HATJointOutput. It has just two values: a tensor for logprobs and a tensor for ilm. Both are None by default.

If the return_ilm property of this class is set, build an object of this dataclass, fill in the two values and return that.

More details below.
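
A minimal sketch of the dataclass-plus-property idea from this comment follows. It is not the code merged in this PR: plain Python lists stand in for tensors, the field names hat_logprobs and ilm_logprobs are assumed, ToyHATJoint is a hypothetical stand-in for HATJoint, and the numbers it returns are placeholders. The property name return_hat_ilm is taken from the commit titles in this PR.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class HATJointOutput:
    """Two optional values, both None by default (lists stand in for tensors)."""

    hat_logprobs: Optional[List[float]] = None
    ilm_logprobs: Optional[List[float]] = None


class ToyHATJoint:
    """Hypothetical stand-in for HATJoint showing only the return-type switch."""

    def __init__(self) -> None:
        # Default: behave like a standard RNNT joint and return plain log-probs.
        self._return_hat_ilm = False

    @property
    def return_hat_ilm(self) -> bool:
        return self._return_hat_ilm

    @return_hat_ilm.setter
    def return_hat_ilm(self, value: bool) -> None:
        self._return_hat_ilm = bool(value)

    def joint(self, f: List[float], g: List[float]) -> Union[List[float], HATJointOutput]:
        # Placeholder arithmetic: a real joint would run its networks on f and g.
        hat_logprobs = [a + b for a, b in zip(f, g)]
        if self.return_hat_ilm:
            ilm_logprobs = list(g)
            return HATJointOutput(hat_logprobs=hat_logprobs, ilm_logprobs=ilm_logprobs)
        return hat_logprobs


toy = ToyHATJoint()
plain = toy.joint([0.5, 0.25], [1.0, 2.0])    # list, same shape of result as RNNT
toy.return_hat_ilm = True
both = toy.joint([0.5, 0.25], [1.0, 2.0])     # HATJointOutput with both values
```

Keeping the signature of joint identical to the RNNT one means callers that never set the property do not need to change.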


titu1994 · 2023-03-22T06:06:05Z

nemo/collections/asr/modules/hybrid_autoregressive_transducer.py

+
+    def joint(
+        self, f: torch.Tensor, g: torch.Tensor, return_ilm: bool = False
+    ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:


Remove return_ilm from here and use the property instead.


titu1994 · 2023-03-22T06:14:38Z

nemo/collections/asr/parts/submodules/rnnt_beam_decoding.py

-                beam_logp, beam_idx = torch.log_softmax(
-                    self.joint.joint(beam_enc_out, beam_dec_out) / self.softmax_temperature, dim=-1,
-                ).topk(self.max_candidates, dim=-1)
+                if isinstance(self.joint, hat.HATJoint) and self.hat_subtract_ilm:


Here, and everywhere else, simply call self.joint.joint with the ordinary RNNT arguments. The output can now be either a torch.Tensor (RNNT joint, or HAT without ILM subtraction) or a HATOutput dataclass.

Import the RNNT utils and then check torch.is_tensor(output) here; this is the original RNNT path. Elif self.hat_subtract_ilm and isinstance(output, HATOutput):

Then take the required code path. In the else branch, raise an error saying the output could not be resolved.
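
The three-way dispatch described in this comment can be sketched as below. This is an illustration under assumptions, not the merged implementation: a Python list stands in for a tensor (so isinstance(output, list) replaces torch.is_tensor(output)), the dataclass field names are assumed, and the signature of resolve_joint_output is hypothetical even though a function of that name is mentioned later in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class HATJointOutput:
    hat_logprobs: Optional[List[float]] = None
    ilm_logprobs: Optional[List[float]] = None


def resolve_joint_output(
    output: Union[List[float], HATJointOutput], hat_subtract_ilm: bool
) -> Tuple[List[float], Optional[List[float]]]:
    """Return (logprobs, ilm_logprobs_or_None) for either kind of joint output."""
    if isinstance(output, list):
        # Original RNNT path (also HAT without ILM subtraction).
        return output, None
    elif hat_subtract_ilm and isinstance(output, HATJointOutput):
        # HAT path: both the transducer log-probs and the internal LM log-probs.
        return output.hat_logprobs, output.ilm_logprobs
    else:
        raise TypeError(f"could not resolve joint output of type {type(output).__name__}")


rnnt_like = resolve_joint_output([-0.1, -2.3], hat_subtract_ilm=False)
hat_like = resolve_joint_output(HATJointOutput([-0.1, -2.3], [-0.7, -0.7]), hat_subtract_ilm=True)
```

With a helper of this shape, the decoding code checks the type of what it received rather than the class of the joint, so it does not need to import the HAT module.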

titu1994 · 2023-03-22T06:14:50Z

nemo/collections/asr/parts/submodules/rnnt_beam_decoding.py

@@ -1196,7 +1206,12 @@ def modified_adaptive_expansion_search(
                                    lm_score, new_hyp.ngram_lm_state = self.compute_ngram_score(
                                        hyp.ngram_lm_state, int(k)
                                    )
-                                    new_hyp.score += self.ngram_lm_alpha * lm_score
+                                    if isinstance(self.joint, hat.HATJoint) and self.hat_subtract_ilm:


Same for everywhere else below.

Hi @titu1994! Thank you for the detailed review. I tried to modify the HAT-related code according to your suggestions. For convenience I also added a resolve_joint_output function. Please take a look.


nemo/collections/asr/modules/hybrid_autoregressive_transducer.py

+from nemo.collections.asr.modules import rnnt
+from nemo.collections.asr.parts.utils.rnnt_utils import HATJointOutput
+
+from nemo.utils import logging


nemo/collections/asr/modules/hybrid_autoregressive_transducer.py

+        self.pred, self.enc, self.joint_net, self.blank_pred = self._joint_hat_net_modules(
+            num_classes=self._vocab_size,  # non blank symbol
+            pred_n_hidden=self.pred_hidden,
+            enc_n_hidden=self.encoder_hidden,
+            joint_n_hidden=self.joint_hidden,
+            activation=self.activation,
+            dropout=jointnet.get('dropout', 0.0),
+        )


nemo/collections/asr/modules/hybrid_autoregressive_transducer.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Any, Dict, List, Optional, Tuple, Union


titu1994

It looks very good now. Could you add some tests asserting that a normal forward returns a tensor, and that the HAT forward returns either a tensor or a HATJointOutput depending on whether the flag is set?

Another thing: we support only mAES and normal beam. Can you look into how complex it would be to support HAT in the other beam algos? If it's difficult, we can leave it to another PR in the future.

titu1994 · 2023-03-23T18:18:32Z

nemo/collections/asr/parts/submodules/rnnt_beam_decoding.py

I see we support only basic beam and maes. Can you look into supporting HAT with other algos ? If it's simple, it can be done in this pr, if not in another pr.

Do you mean n-gram LM fusion (with RNNT and HAT) for the other decoding algorithms? Right now only the maes algorithm supports LM fusion. I did not add it to the default beam search because that algorithm is too slow; I do not think anyone would want to use it for that reason.

BTW, all the decoding algorithms can now work with a HAT model without LM fusion, because HATJoint has the same default output type as RNNTJoint.

Oh ok sounds good then
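
For context on the LM fusion discussed in this thread, the score update with internal LM subtraction can be sketched as follows. The names ngram_lm_alpha and hat_subtract_ilm appear in the diff hunks above; hat_ilm_weight and the function fused_score are hypothetical names introduced here, and the exact formula used in the merged code is not shown in this conversation.

```python
def fused_score(
    hyp_score: float,
    lm_score: float,
    ilm_score: float,
    ngram_lm_alpha: float,
    hat_ilm_weight: float = 0.0,
    hat_subtract_ilm: bool = False,
) -> float:
    """Add the weighted n-gram LM score; optionally subtract the weighted ILM score.

    All scores are log-probabilities. `hat_ilm_weight` is an assumed parameter name.
    """
    score = hyp_score + ngram_lm_alpha * lm_score
    if hat_subtract_ilm:
        score -= hat_ilm_weight * ilm_score
    return score


# Plain shallow fusion versus fusion with internal LM subtraction.
plain_fusion = fused_score(-1.0, -2.0, -3.0, ngram_lm_alpha=0.5)
hat_fusion = fused_score(-1.0, -2.0, -3.0, ngram_lm_alpha=0.5, hat_ilm_weight=0.5, hat_subtract_ilm=True)
print(plain_fusion, hat_fusion)
```

Because the internal LM score is a log-probability (non-positive), subtracting it raises the hypothesis score, which compensates for the language prior the transducer has already applied.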

kobenaxie · 2023-03-24T06:31:24Z

Hi @andrusenkoau , the HATJoint returns log softmaxed log_prob, but the rnnt_loss in torchaudio or rnnt_pytorch receives logits without logsoftmax, should this be unified ?

titu1994 · 2023-03-24T07:05:35Z

@kobenaxie that's a template implementation of the loss in pure PyTorch. It is not used during actual training since it is super slow. Instead we use a Numba-based, CUDA-compiled loss.

Also, HAT does not return the dataclass during training (the loss would not accept it anyway), so it is fine.

titu1994 · 2023-03-24T07:55:46Z

Looks great !

titu1994 · 2023-03-24T07:57:08Z

The final thing to do is to add HAT-based Conformer configs to a conf dir: conf/hat_transducer/conformer/conformer_hat_bpe.yaml / char.yaml

titu1994 · 2023-03-24T07:57:23Z

That can be done when the release branch is cut.

andrusenkoau · 2023-03-24T09:52:49Z

Hi @andrusenkoau , the HATJoint returns log softmaxed log_prob, but the rnnt_loss in torchaudio or rnnt_pytorch receives logits without logsoftmax, should this be unified ?

Hi @kobenaxie, the HAT logic has to work in the probability domain in order to compute the blank probability and then scale the label probabilities. For simplicity of implementation we can rely on the identity logsoftmax(logsoftmax(x)) = logsoftmax(x), so the HAT log_probs output can be used with any RNNT loss function that applies logsoftmax internally to get the final model loss.
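
The identity used in this reply can be checked numerically with a small stdlib-only sketch. It assumes the usual HAT formulation in which the blank probability comes from a sigmoid over a separate blank logit and the label distribution is a softmax scaled by (1 - P(blank)); that formulation and the function names below are assumptions, not code from this PR. The ordering (labels first, blank last) follows the [B, T, U, V+1] concatenation shown in the diff above.

```python
import math
from typing import List


def log_softmax(xs: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def hat_log_probs(label_logits: List[float], blank_logit: float) -> List[float]:
    """Assumed HAT output: scaled label log-probs followed by the blank log-prob."""
    blank_logprob = log_sigmoid(blank_logit)        # log P(blank)
    not_blank_logprob = log_sigmoid(-blank_logit)   # log(1 - P(blank))
    scaled_labels = [lp + not_blank_logprob for lp in log_softmax(label_logits)]
    return scaled_labels + [blank_logprob]


hat_lp = hat_log_probs([2.0, -1.0, 0.5], blank_logit=0.3)
total_prob = sum(math.exp(v) for v in hat_lp)   # already a normalized distribution
again = log_softmax(hat_lp)                     # what a loss with internal log-softmax would see
max_diff = max(abs(a - b) for a, b in zip(hat_lp, again))
print(total_prob, max_diff)
```

Since the HAT output already sums to one in the probability domain, its log-sum-exp is zero, and a second log-softmax leaves it unchanged up to floating-point error.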

andrusenkoau · 2023-03-24T09:55:17Z

Looks great !

@titu1994 thank you so much for great review and help with code modification!


andrusenkoau and others added 15 commits March 13, 2023 22:20

add hat joint network

7e0134f

Signed-off-by: andrusenkoau <[email protected]>

add HATJoint module

d0660d1

Signed-off-by: andrusenkoau <[email protected]>

add hat script

7445ae9

Signed-off-by: andrusenkoau <[email protected]>

add hat decoding option

7bb0c5e

Signed-off-by: andrusenkoau <[email protected]>

Merge branch 'main' of https://github.com/andrusenkoau/NeMo into hat_rc

486d079

add hat related parameters to maes decoding

8e3cf17

Signed-off-by: andrusenkoau <[email protected]>

minor fixes

bb93e4a

Signed-off-by: andrusenkoau <[email protected]>

add hat decoding option

2f30f2b

Signed-off-by: andrusenkoau <[email protected]>

minor fixes

c4f6bc6

Signed-off-by: andrusenkoau <[email protected]>

add hat related parameters

0dfc992

Signed-off-by: andrusenkoau <[email protected]>

minor fixes

c6e03df

Signed-off-by: andrusenkoau <[email protected]>

add hat to all rnnt decoding types

f77a68a

Signed-off-by: andrusenkoau <[email protected]>

add test for hatjoint

6c3053f

Signed-off-by: andrusenkoau <[email protected]>

combine hatjoint with all rnntjoint tests

a429f1f

Signed-off-by: andrusenkoau <[email protected]>

Merge branch 'NVIDIA:main' into hat_rc

9627083

github-actions bot added the ASR label Mar 20, 2023

[pre-commit.ci] auto fixes from pre-commit.com hooks

f60fab5

for more information, see https://pre-commit.ci

titu1994 requested changes Mar 20, 2023


github-advanced-security bot found potential problems Mar 20, 2023


andrusenkoau and others added 8 commits March 21, 2023 10:35

minor fixes

e5abab1

Signed-off-by: andrusenkoau <[email protected]>

rename hat file

b6bd00c

Signed-off-by: andrusenkoau <[email protected]>

fix hat double output

1c89e38

Signed-off-by: andrusenkoau <[email protected]>

fix hat double output

a52b643

Signed-off-by: andrusenkoau <[email protected]>

fix hat double output

c528c89

Signed-off-by: andrusenkoau <[email protected]>

Merge branch 'NVIDIA:main' into hat_rc

1e1dd84

[pre-commit.ci] auto fixes from pre-commit.com hooks

e670974

for more information, see https://pre-commit.ci

nemo/collections/asr/parts/submodules/rnnt_greedy_decoding.py

7bef640

Signed-off-by: andrusenkoau <[email protected]>

github-advanced-security bot found potential problems Mar 21, 2023


Merge branch 'main' into hat_rc

3843c79

andrusenkoau marked this pull request as ready for review March 22, 2023 05:22

titu1994 requested changes Mar 22, 2023


andrusenkoau and others added 8 commits March 23, 2023 05:53

minor fixes

290a7e0

Signed-off-by: andrusenkoau <[email protected]>

add return_hat_ilm property

9996afa

Signed-off-by: andrusenkoau <[email protected]>

add HATJointOutput dataclass

8673e5b

Signed-off-by: andrusenkoau <[email protected]>

add resolve_joint_output function

f049e82

Signed-off-by: andrusenkoau <[email protected]>

Merge branch 'main' into hat_rc

3d92ca7

[pre-commit.ci] auto fixes from pre-commit.com hooks

706d2a6

for more information, see https://pre-commit.ci

add local return_hat_ilm_default variable

9cf3648

Signed-off-by: andrusenkoau <[email protected]>

minor fixes

bbcb4e2

Signed-off-by: andrusenkoau <[email protected]>

github-advanced-security bot found potential problems Mar 23, 2023


titu1994 reviewed Mar 23, 2023


titu1994 approved these changes Mar 24, 2023


titu1994 merged commit 2e36872 into NVIDIA:main Mar 24, 2023
