Speedup RNN-T greedy decoding #7926

Merged: 47 commits merged into main on Jan 16, 2024
Conversation

@artbataev (Collaborator) commented Nov 21, 2023

What does this PR do?

This PR adds a new algorithm for batched greedy decoding of RNN-Transducer models.
With large batch sizes (e.g., 128), the expected speedup for a large Fast Conformer-Transducer (full evaluation time, including the encoder) is 1.7x-1.9x when using speech_to_text_eval.py. For small batch sizes, e.g., 16, the observed speedup is ~1.3x.
The original algorithm is preserved and can be enabled with loop_labels=False.

E.g., on my local machine, with bf16, bs=128, Fast Conformer-Transducer Large, full test-other decoding:

Algorithm     | Greedy | Greedy + Alignments
Current NeMo  | 45 sec | 1 min 38 sec
Proposed      | 24 sec | 30 sec

Collection: [ASR]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

# default - new decoding algorithm
python examples/asr/speech_to_text_eval.py \
   model_path=<nemo_model.nemo> \
   dataset_manifest=<manifest> \
   batch_size=128 \
   output_filename=<output_manifest_path>

# previous algorithm is preserved and can be used with `loop_labels=false`
python examples/asr/speech_to_text_eval.py \
   model_path=<nemo_model.nemo> \
   dataset_manifest=<manifest> \
   batch_size=128 \
   output_filename=<output_manifest_path> \
   rnnt_decoding.greedy.loop_labels=false
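
The same switch can also be made from Python. Below is a minimal sketch, assuming a NeMo RNN-T BPE model and the change_decoding_strategy API; the pretrained model name and the exact config fields here are illustrative, mirroring the CLI overrides above:

import copy

import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained("stt_en_fastconformer_transducer_large")

# Copy the current decoding config and toggle the greedy algorithm
decoding_cfg = copy.deepcopy(model.cfg.decoding)
decoding_cfg.strategy = "greedy_batch"
decoding_cfg.greedy.loop_labels = False  # True (the default) selects the new algorithm
model.change_decoding_strategy(decoding_cfg)

hypotheses = model.transcribe(["audio.wav"], batch_size=128)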

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the ASR label Nov 21, 2023
github-actions bot (Contributor) commented Dec 9, 2023

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

@github-actions github-actions bot added the stale label Dec 9, 2023
@artbataev artbataev removed the stale label Dec 13, 2023
@artbataev (Collaborator, Author) commented:

jenkins

github-actions bot (Contributor) commented:

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

@artbataev (Collaborator, Author) commented:

jenkins

GNroy previously approved these changes Jan 12, 2024

@GNroy (Collaborator) left a comment:

LGTM, but see comments.
I'd like to especially commend your tests. Thanks for improving NeMo!

nemo/collections/asr/modules/rnnt.py (outdated; resolved)
nemo/collections/asr/modules/rnnt.py (outdated; resolved)
nemo/collections/asr/modules/rnnt_abstract.py (outdated; resolved)
nemo/collections/asr/modules/rnnt_abstract.py (outdated; resolved)

# Use the following commented print statements to check
# the alignment of other algorithms compared to the default
print("Text", hyp.text)
@GNroy (Collaborator) commented:

> Use the following commented print statements

They are not commented.

@artbataev (Author) replied:

This was copied from the nearby code. I reworked the test: instead of just printing the alignment, I use non-batched greedy decoding as a reference and check that the batched version returns the same results.

@artbataev (Collaborator, Author) commented:

jenkins

@artbataev artbataev requested a review from GNroy January 15, 2024 13:54
GNroy previously approved these changes Jan 15, 2024
@GNroy (Collaborator) left a comment:

LGTM, thanks!

@titu1994 (Collaborator) left a comment:

Excellent work. Minor comments: add inline documentation for the actual decoding loop, and explain what loop labels is.

I also want to ask about the separation of the joint into three functions. It seems OK, but it allows, for example, HAT to use a less memory-efficient path, which can cause OOM.

Finally, excellent tests; much better coverage of cases than before.

@@ -138,9 +138,9 @@ def return_hat_ilm(self):
def return_hat_ilm(self, hat_subtract_ilm):
self._return_hat_ilm = hat_subtract_ilm

def joint(self, f: torch.Tensor, g: torch.Tensor) -> Union[torch.Tensor, HATJointOutput]:
def joint_after_projection(self, f: torch.Tensor, g: torch.Tensor) -> Union[torch.Tensor, HATJointOutput]:
titu1994 (Collaborator): It would be better to have a similar API name across the RNNT Joints. Is it necessary to change this?

artbataev (Author): The API is changed for all Joints, starting from AbstractRNNTJoint (see details in Slack).

artbataev (Author) commented Jan 16, 2024:
Now it is the following:

class AbstractRNNTJoint(NeuralModule, ABC):
    @abstractmethod
    def project_encoder(self, encoder_output):
        raise NotImplementedError()  # can be Linear or identity

    @abstractmethod
    def project_prednet(self, prednet_output):
        raise NotImplementedError()  # can be Linear or identity

    @abstractmethod
    def joint_after_projection(self, f, g):
        """This is the main method that one should implement for Joint."""
        raise NotImplementedError()

    def joint(self, f, g):
        """Full joint computation. Not abstract anymore!"""
        return self.joint_after_projection(self.project_encoder(f), self.project_prednet(g))
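
To illustrate the contract, here is a minimal sketch of a concrete Joint implementing this split. The linear projections, hidden sizes, and additive combination are assumptions for illustration, not the exact NeMo implementation:

import torch
import torch.nn as nn

class SimpleRNNTJoint(nn.Module):
    """Hypothetical joint: project encoder/prednet outputs, then combine."""

    def __init__(self, enc_hidden: int, pred_hidden: int, joint_hidden: int, vocab_size: int):
        super().__init__()
        self.enc = nn.Linear(enc_hidden, joint_hidden)    # encoder projection
        self.pred = nn.Linear(pred_hidden, joint_hidden)  # prediction-net projection
        self.joint_net = nn.Sequential(nn.ReLU(), nn.Linear(joint_hidden, vocab_size + 1))

    def project_encoder(self, encoder_output: torch.Tensor) -> torch.Tensor:
        return self.enc(encoder_output)

    def project_prednet(self, prednet_output: torch.Tensor) -> torch.Tensor:
        return self.pred(prednet_output)

    def joint_after_projection(self, f: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # f: (B, T, H), g: (B, U, H) -> logits: (B, T, U, V+1) via broadcasting
        return self.joint_net(f.unsqueeze(2) + g.unsqueeze(1))

    def joint(self, f: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        return self.joint_after_projection(self.project_encoder(f), self.project_prednet(g))

With this split, decoding code can call project_encoder once and reuse the projected frames across all subsequent joint_after_projection calls.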

g = self.pred(g)
g.unsqueeze_(dim=1) # (B, 1, U, H)

f = f.unsqueeze(dim=2) # (B, T, 1, H)
titu1994 (Collaborator): Why remove the preemptive enc() pred()? This is shown to be equivalent to RNNT and saves a ton of memory.

artbataev (Author):

In-place unsqueeze_ does not save memory.

Due to separating the projections, I needed to replace the in-place unsqueeze_ operation with unsqueeze. There is no memory overhead. According to the documentation (https://pytorch.org/docs/stable/generated/torch.unsqueeze.html):

> The returned tensor shares the same underlying data with this tensor.

You can check it manually:

import torch

device = torch.device('cuda:0')

def print_allocated(device, prefix=""):
    allocated_mb = torch.cuda.max_memory_allocated(device) / 1024 / 1024
    print(f"{prefix}{allocated_mb:.0f}MB")


print_allocated(device, prefix="Before: ")  # should be 0MB

# allocate memory the size of the projected encoder output
data = torch.rand([128, 30 * 1000 // 10 // 8, 640], device=device)
print_allocated(device, prefix="After projecting encoder output: ")  # 118MB

# apply unsqueeze
data2 = data.unsqueeze(-1)  # unsqueeze returns a new tensor, but the storage is shared (only metadata is new!)
print_allocated(device, prefix="After unsqueeze: ")  # same, 118MB

"""
return self.pred(prednet_output)

def joint_after_projection(self, f: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
titu1994 (Collaborator): Revert name change.

artbataev (Author): It is essential to separate the projections from the other joint computations. It introduces no memory/computational overhead. See details in Slack.

@@ -28,6 +28,45 @@ class AbstractRNNTJoint(NeuralModule, ABC):
"""

@abstractmethod
def joint_after_projection(self, f: torch.Tensor, g: torch.Tensor) -> Any:
titu1994 (Collaborator): Revert name change. It's fine to keep joint.

artbataev (Author): See the comments above.

@@ -545,6 +573,7 @@ def __init__(
preserve_alignments: bool = False,
preserve_frame_confidence: bool = False,
confidence_method_cfg: Optional[DictConfig] = None,
loop_labels: bool = True,
titu1994 (Collaborator): Explain in the docstring what this is.

artbataev (Author): Done; I had missed the class docstring before.

if self.preserve_frame_confidence
else None,
)
advance_mask = torch.logical_and(blank_mask, (time_indices[active_indices] + 1 < out_len[active_indices]))
titu1994 (Collaborator): Document this line.

artbataev (Author): Added a comment.

.squeeze(1)
.squeeze(1)
)
more_scores, more_labels = logits.max(-1)
titu1994 (Collaborator): Document.

artbataev (Author): Added a comment (above this line).


# stage 4: to avoid looping, go to next frame after max_symbols emission
if self.max_symbols is not None:
force_blank_mask = torch.logical_and(
titu1994 (Collaborator): Document.

artbataev (Author): Added a comment above.



class BatchedHyps:
"""Class to store batched hypotheses (labels, time_indices, scores) for efficient RNNT decoding"""
titu1994 (Collaborator): Very neat. Is this done so that jit compile is happy?

artbataev (Author): Yep :) There is also a test that torch.jit is fine with this structure.
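
For readers curious what such a jit-friendly structure looks like, here is a hypothetical sketch in the same spirit: plain preallocated tensor fields and batched writes, with no Python lists of hypothesis objects. The field and method names are invented, not the actual BatchedHyps API:

import torch

class BatchedHypsSketch:
    """Hypothetical container for batched hypotheses: tensors only."""

    def __init__(self, batch_size: int, init_length: int, device: torch.device):
        self.current_lengths = torch.zeros(batch_size, dtype=torch.long, device=device)
        self.labels = torch.zeros((batch_size, init_length), dtype=torch.long, device=device)
        self.timesteps = torch.zeros((batch_size, init_length), dtype=torch.long, device=device)
        self.scores = torch.zeros(batch_size, dtype=torch.float32, device=device)

    def add_results(self, active_indices: torch.Tensor, labels: torch.Tensor,
                    time_indices: torch.Tensor, scores: torch.Tensor) -> None:
        # Batched append: one new label per active hypothesis, no Python-level loop
        lengths = self.current_lengths[active_indices]
        self.labels[active_indices, lengths] = labels
        self.timesteps[active_indices, lengths] = time_indices
        self.scores[active_indices] += scores
        self.current_lengths[active_indices] += 1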

return hypotheses


def return_empty_hypotheses(
titu1994 (Collaborator): Empty hyps might be needed for beam search init and temp placeholders, now that I remember.

artbataev (Author): I removed this function; it was used only when max_symbols=0 for the new decoding algorithm.

@artbataev (Collaborator, Author) commented:

jenkins

@artbataev (Collaborator, Author) commented:

jenkins

@titu1994 (Collaborator) left a comment:

After the detailed explanation, the changes make sense design-wise. If you could put those explanations in the PR itself, it will help future discussion. Thanks again for the significant speedup!

@artbataev (Collaborator, Author) commented:

Pasting here the discussion from Slack about the Joint refactoring, joint_after_projection, and the non-in-place unsqueeze.

Main points:

  • (1) Separating the projections in Joint from the other operations is not only about a "memory vs speed" tradeoff: it also enables speed optimization without additional memory usage (so it is done not only for (2)).
  • (2) Projecting the encoder output immediately is a tiny overhead, negligible compared to the other operations in an RNN-T system.
  • (3) In-place unsqueeze_ does not save memory; consumption is the same as with unsqueeze, and the change introduces no overhead.
  • (4) For the implementation of the projection separation, I tried to preserve compatibility, readability, and usability for inheritance.

1) Separating the projections in Joint from the other operations is helpful in many cases.

Even in the original decoding algorithm, when we loop over encoder frames, we can project each frame immediately (one by one, so no memory overhead), and this saves computation: each encoder frame is fed into the Joint multiple times, so without the split we waste time recalculating the projection of the same encoder vector (see the toy sketch below).
The new algorithm is even more sensitive to the operations inside Joint, and I see a substantial speedup from separating the projections.
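
A toy illustration of the recomputed-projection point (the shapes and modules are made up for the example): in frame-looping greedy decoding, the same encoder frame enters the Joint once per emitted symbol, so projecting it once per frame saves repeated matrix multiplies.

import torch
import torch.nn as nn

enc_hidden, pred_hidden, joint_hidden, vocab = 512, 640, 640, 1024
enc_proj = nn.Linear(enc_hidden, joint_hidden)
pred_proj = nn.Linear(pred_hidden, joint_hidden)
joint_net = nn.Sequential(nn.ReLU(), nn.Linear(joint_hidden, vocab + 1))

f_t = torch.randn(1, enc_hidden)   # one encoder frame
f_proj = enc_proj(f_t)             # projected once per frame...
for _ in range(3):                 # ...and reused for every symbol emitted on this frame
    g_u = torch.randn(1, pred_hidden)  # stand-in for one prediction-network step
    logits = joint_net(f_proj + pred_proj(g_u))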

2) Projecting the encoder output immediately is a tiny overhead.

I see a speedup from projecting the encoder output immediately.
So, what is the overhead, and is it significant?

  • This could have been considered a significant overhead back when we used tiny encoders with linear memory/time complexity; for modern encoders with quadratic complexity (due to attention), it is not.
  • For batch size 128, 30-second audio, subsampling 8, joint_hidden=640, fp32, the tensor size will be ~118MB; for bf16, ~59MB.
    • To compare with the memory consumption of one piece of the encoder, I tried a MultiHeadAttention block (used by Conformer). It uses ~2129MB of memory (one block!) due to quadratic complexity. 118MB, or even 59MB, is a tiny amount compared to modern encoders (and that measurement is from Conformer Large, not X-Large!).
  • From a practical point of view, I can easily fit batch size 256 in fp32 on my desktop GPU (LibriSpeech test-other, Fast Conformer Large), and we are targeting bf16 with batch size 128 and smaller.
  • Comparison with a CTC system: there we project to the final output with vocabulary-size dimension (not frame by frame!), which is larger than the RNN-T projection, and we do not optimize this for better memory usage at the cost of speed.

Given all these facts, it is acceptable to project the encoder output immediately. If we ever need aggressive memory optimization, we can add a separate flag (preserve_memory), but I don't think it is required now.
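
A quick sanity check of the ~118MB figure, using the shapes quoted above:

# 30 s of audio, one frame per 10 ms, subsampling 8 -> 375 encoder frames
frames = 30 * 1000 // 10 // 8
batch, joint_hidden = 128, 640
size_mb = batch * frames * joint_hidden * 4 / 1024 ** 2  # 4 bytes per fp32 element
print(f"{size_mb:.0f}MB")  # ~117MB for fp32; half of that for bf16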

3) In-place unsqueeze_ does not save memory (no overhead after separating projections).

Due to separating the projections, I needed to replace the in-place unsqueeze_ operation with unsqueeze. There is no memory overhead. According to the documentation (https://pytorch.org/docs/stable/generated/torch.unsqueeze.html):

> The returned tensor shares the same underlying data with this tensor.

4) Implementation of the projection separation.

An acceptable solution should:

  • expose the projections as public API (AbstractRNNTJoint)
  • be developer-friendly
  • not lead to unnecessary code duplication
  • not break checkpoints
  • not introduce any significant overhead

I considered several possibilities.
a) Duplicate the code of joint in joint_after_projection: I do not think it is good practice to maintain the same code in two places (the implementations must stay identical except for applying the projections).

b) Turn enc() and pred() into plain functions: undesirable, since it would break checkpoints.

c) Use enc and pred with type annotations in the public API:

class AbstractRNNTJoint(NeuralModule, ABC):
    enc: Callable   # can be Identity
    pred: Callable  # can be Identity

    @abstractmethod
    def joint_after_projection(self, f, g):
        """This is the main method that one should implement for Joint."""
        raise NotImplementedError()

    def joint(self, f, g):
        # not abstract anymore!
        return self.joint_after_projection(self.enc(f), self.pred(g))

d) Separate abstract project_prednet and project_encoder methods: the current solution.
I think this is better, since if project_prednet or project_encoder is not implemented, a clear error will indicate it.

I prefer this last option because it adds no overhead to the current implementation (see (3)).
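
A small standalone illustration of why (d) fails fast (hypothetical classes; this is just standard abc behavior):

from abc import ABC, abstractmethod

class AbstractJointSketch(ABC):
    @abstractmethod
    def project_encoder(self, encoder_output): ...

    @abstractmethod
    def joint_after_projection(self, f, g): ...

class IncompleteJoint(AbstractJointSketch):
    # project_encoder is (accidentally) not implemented
    def joint_after_projection(self, f, g):
        return f + g

try:
    IncompleteJoint()
except TypeError as err:
    print(err)  # clear, early error: can't instantiate abstract class IncompleteJoint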

@artbataev artbataev merged commit 410f092 into main Jan 16, 2024
15 checks passed
@artbataev artbataev deleted the speedup_rnnt_greedy_decoding branch January 16, 2024 20:31
jubick1337 pushed a commit that referenced this pull request Jan 17, 2024

Squashed commit message (all commits Signed-off-by: Vladimir Bataev <[email protected]>; repeated per-commit sign-off lines omitted; pre-commit.ci entries are auto-fixes, see https://pre-commit.ci):

* Add structure for batched hypotheses
* Add faster decoding algo
* Simplify max_symbols support. More speedup
* Clean up
* Clean up
* Filtering only when necessary
* Move max_symbols check to the end of loop
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Support returning prediction network states
* Support preserve_alignments flag
* Support confidence
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Partial fix for jit compatibility
* Support switching between decoding algorithms
* Fix switching algorithms
* Clean up
* Clean up
* Fix max symbols per step
* Add tests. Preserve torch.jit compatibility for BatchedHyps
* Separate projection from Joint calculation in decoding
* Fix config instantiation
* Fix after main merge
* Add tests for batched hypotheses
* Speedup alignments
* Test alignments
* Fix alignments
* Fix tests for alignments
* Add more tests
* Fix confidence tests
* Avoid common package modification
* Support Stateless prediction network
* Improve stateless decoder support. Separate alignments and confidence
* Fix alignments for max_symbols_per_step
* Fix alignments for max_symbols_per_step=0
* Fix tests
* Fix test
* Add comments
* Batched Hyps/Alignments: lengths -> current_lengths
* Simplify indexing
* Improve type annotations
* Rework test for greedy decoding
* Document loop_labels
* Raise ValueError if max_symbols_per_step <= 0
* Add comments
* Fix test

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
minitu pushed a commit to minitu/NeMo that referenced this pull request Jan 19, 2024 (same squashed commit message as above)
stevehuang52 pushed a commit that referenced this pull request Jan 31, 2024 (same squashed commit message as above)
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request Feb 15, 2024 (same squashed commit message as above)
pablo-garay pushed a commit that referenced this pull request Mar 19, 2024 (same squashed commit message as above)
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024 (same squashed commit message as above)
Labels: ASR
Projects: none yet
Linked issues: none yet
3 participants