
Streaming conformer CTC export #5837

Merged: 49 commits merged into NVIDIA:main on Mar 14, 2023

Conversation

messiaen
Contributor

@messiaen messiaen commented Jan 23, 2023

What does this PR do ?

Work in progress:

  • Export a unified (encoder and decoder) ONNX model for cache-aware Conformer CTC models.
    • Truncate encoded_output and the caches where needed for easier deployment.
  • Use a consistent chunk size so that the first chunk is treated like the other chunks whenever possible. This simplifies deployment.

Collection: ASR

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

Export Conformer-CTC with cache-aware streaming:
python scripts/export.py --max-batch 32 --check-tolerance 1.0 --runtime-check --streaming_support --cache_support <path/to/model.nemo> <output.onnx>

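A minimal sketch of consuming the exported model with onnxruntime (not part of this PR). The five encoder inputs mirror the forward_for_export signature (audio_signal, length, cache_last_channel, cache_last_time, cache_last_channel_len), but the input/output ordering and every shape below are assumptions; read the real ones from the exported graph.

# Hedged sketch only: query sess.get_inputs()/get_outputs() of your export for
# the actual names, ordering, shapes, and dtypes.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("output.onnx", providers=["CPUExecutionProvider"])
in_names = [i.name for i in sess.get_inputs()]
out_names = [o.name for o in sess.get_outputs()]

batch = 1
# Hypothetical initial caches; take the true shapes/dtypes from sess.get_inputs().
cache_last_channel = np.zeros((batch, 17, 70, 512), dtype=np.float32)
cache_last_time = np.zeros((batch, 17, 512, 30), dtype=np.float32)
cache_last_channel_len = np.zeros((batch,), dtype=np.int64)

# Dummy pre-processed feature chunks (80 mel bins x 121 frames) standing in for a
# real streaming front end; the chunk size must match the streaming_cfg used at export.
chunks = [
    (np.random.randn(batch, 80, 121).astype(np.float32),
     np.array([121] * batch, dtype=np.int64))
    for _ in range(3)
]

for feats, feat_len in chunks:
    feeds = dict(zip(in_names, [feats, feat_len, cache_last_channel,
                                cache_last_time, cache_last_channel_len]))
    outputs = sess.run(out_names, feeds)
    logits = outputs[0]  # assumed: CTC log-probs for this chunk, already truncated
    # Feed the updated caches (assumed to be the last three outputs) back in for
    # the next chunk; this is what makes the single exported graph streamable.
    cache_last_channel, cache_last_time, cache_last_channel_len = outputs[-3:]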

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list the specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the ASR and core (Changes to NeMo Core) labels Jan 23, 2023
@titu1994
Collaborator

Could you add the PR details?

@messiaen messiaen changed the title Streaming conformer Streaming conformer CTC export Jan 23, 2023
messiaen and others added 19 commits February 10, 2023 11:13
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Test onnx streaming conformer ctc WER

Constant att cache width with len param

Remove some extra functions in cache_aware runner

transpose cache so that batch is first for trt

Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
@github-actions
Contributor

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

@github-actions github-actions bot added the stale label Feb 25, 2023
borisfom and others added 9 commits March 5, 2023 22:10
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
@messiaen
Contributor Author

@VahidooX Can you take another look when you get a chance?

@messiaen messiaen requested a review from VahidooX March 12, 2023 04:20

if length is None or isinstance(self.pre_encode, nn.Linear):
Collaborator

Let's move back if possible.

device=audio_chunk.device,
dtype=audio_chunk.dtype,
)
if self.pad_and_drop_preencoded:
Collaborator

How about saving the value of self.streaming_cfg.pre_encode_cache_size[X] in a variable and then using it to reduce the duplication?
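A generic illustration of that suggestion (stand-in values only, not the actual streaming-buffer code): index the config list once and let both branches reuse the local.

# pre_encode_cache_sizes stands in for self.streaming_cfg.pre_encode_cache_size.
pre_encode_cache_sizes = [16, 8]
buffer_idx = 0
pad_and_drop_preencoded = True

# Hoist the repeated lookup into a single local ...
pre_encode_cache_size = pre_encode_cache_sizes[0 if buffer_idx == 0 else 1]

# ... so the branches below use the local instead of re-indexing the config.
# (The branch bodies are placeholders.)
if pad_and_drop_preencoded:
    drop_size = pre_encode_cache_size
else:
    drop_size = 0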

@@ -1346,7 +1351,10 @@ def __iter__(self):
)

if self.buffer_idx == 0 and isinstance(self.streaming_cfg.shift_size, list):
shift_size = self.streaming_cfg.shift_size[0]
if self.pad_and_drop_preencoded:
Collaborator

How about adding the self.pad_and_drop_preencoded condition to the main if block, if it makes it easier to read?

cache_next[self._cache_id, :, -q_keep_size:, :] = q_input[:, :q_keep_size, :]

key = value = torch.cat([cache[self._cache_id], key], dim=1)
# query.shape[1] is constant, should save it at init()
Collaborator

Do we need to keep this comment?

keep_in_cache_next(cache=cache, cache_next=cache_next, keep_size=q_keep_size, cache_id=self._cache_id)
cache_next[self._cache_id, :, -q_keep_size:, :] = q_input[:, :q_keep_size, :]

key = value = torch.cat([cache[self._cache_id], key], dim=1)
Collaborator

Let's add "cache is not None and cache_next is not None" to the if block or separate cache and cache_next like before. Both ways are OK.
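A toy standalone version of the guard being discussed (the names mirror the diff context above, but the helper, shapes, and keep_size are hypothetical, not the NeMo attention code):

import torch

def attend_chunk(key, cache=None, cache_next=None, keep_size=2):
    # Only touch the caches when both are provided; the cache-free path is unchanged.
    if cache is not None and cache_next is not None:
        # Store the leading keep_size frames of this chunk in the tail of the
        # next-step cache ...
        cache_next[:, -keep_size:, :] = key[:, :keep_size, :]
        # ... and prepend the cached context from the previous step.
        key = torch.cat([cache, key], dim=1)
    return key

chunk = torch.randn(1, 4, 8)                  # (batch, time, features)
cache = torch.zeros(1, 6, 8)
cache_next = torch.zeros_like(cache)
out = attend_chunk(chunk, cache, cache_next)  # shape: (1, 10, 8)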

# todo: we should know input_x.size(-1) at config time
cache_keep_size = x.size(-1) - self.cache_drop_size
cache_next[self._cache_id, :, :, :-cache_keep_size] = cache[self._cache_id, :, :, cache_keep_size:]
# print("self._max_cache_len:", self._max_cache_len, "cache: size", cache.size(), "x:", x.size(), " new_x:", new_x.size(), ", cache_keep_size:", cache_keep_size)
Collaborator

Should this comment be dropped?

@@ -140,23 +131,16 @@ def __init__(

def update_cache(self, x, cache=None, cache_next=None):
if cache is None:
x = F.pad(x, pad=(self._left_padding, self._right_padding))
new_x = F.pad(x, pad=(self._left_padding, self._right_padding))
Collaborator

Let's add "cache is not None and cache_next is not None" to the if block or separate cache and cache_next like before. Both ways are OK.

@VahidooX VahidooX self-requested a review March 13, 2023 23:09
Collaborator

@VahidooX VahidooX left a comment

Let's make any needed update in another PR.

Comment on lines +457 to +459
def forward_for_export(
self, audio_signal, length, cache_last_channel=None, cache_last_time=None, cache_last_channel_len=None
):

Check notice

Code scanning / CodeQL

Returning tuples with varying lengths

ConformerEncoder.forward_for_export returns a tuple of size 2 and a tuple of size 5.
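For readers unfamiliar with this check, a standalone toy of the flagged pattern (simplified, not the actual ConformerEncoder code): the same function returns a 2-tuple on the cache-free path and a 5-tuple on the cache-aware path, so callers have to know which variant they will receive.

def forward(x, length, cache=None):
    y = [v * 2 for v in x]                    # stand-in for the encoder computation
    if cache is None:
        return y, length                      # 2-tuple: non-streaming path
    cache = cache + [y[-1]]                   # stand-in cache update
    return y, length, cache, cache, length    # 5-tuple: cache-aware path

print(len(forward([1, 2, 3], 3)))             # 2
print(len(forward([1, 2, 3], 3, cache=[])))   # 5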
rets[4],
)

def streaming_post_process(self, rets, keep_all_outputs=True):

Check notice

Code scanning / CodeQL

Returning tuples with varying lengths

ConformerEncoder.streaming_post_process returns a tuple of size 2 and a tuple of size 5.
Comment on lines +502 to +504
def forward(
self, audio_signal, length, cache_last_channel=None, cache_last_time=None, cache_last_channel_len=None
):

Check notice

Code scanning / CodeQL

Returning tuples with varying lengths

ConformerEncoder.forward returns a tuple of size 2 and a tuple of size 5.
Comment on lines +513 to +515
def forward_internal(
self, audio_signal, length, cache_last_channel=None, cache_last_time=None, cache_last_channel_len=None
):

Check notice

Code scanning / CodeQL

Returning tuples with varying lengths

ConformerEncoder.forward_internal returns a tuple of size 2 and a tuple of size 5.
@VahidooX VahidooX merged commit fd085dd into NVIDIA:main Mar 14, 2023
titu1994 pushed a commit to titu1994/NeMo that referenced this pull request Mar 24, 2023
* cache-aware streaming export

Test onnx streaming conformer ctc WER

Constant att cache width with len param

Remove some extra functions in cache_aware runner

transpose cache so that batch is first for trt

Signed-off-by: Greg Clark <[email protected]>

* fix export for full-context conformer

* WIP trying to improve onnx perf

Signed-off-by: Greg Clark <[email protected]>

* Adding test scripts

Signed-off-by: Greg Clark <[email protected]>

* More perf testing script

Signed-off-by: Greg Clark <[email protected]>

* Updates for jit torch_tensorrt tracing

Signed-off-by: Greg Clark <[email protected]>

* Fixed trace warnings

Signed-off-by: Boris Fomitchev <[email protected]>

* Rearranging tests

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing non-caching case

Signed-off-by: Boris Fomitchev <[email protected]>

* testing

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed channel cache length issue

Signed-off-by: Boris Fomitchev <[email protected]>

* cache-aware streaming export

Test onnx streaming conformer ctc WER

Constant att cache width with len param

Remove some extra functions in cache_aware runner

transpose cache so that batch is first for trt

Signed-off-by: Greg Clark <[email protected]>

* fix export for full-context conformer

* WIP trying to improve onnx perf

Signed-off-by: Greg Clark <[email protected]>

* Adding test scripts

Signed-off-by: Greg Clark <[email protected]>

* More perf testing script

Signed-off-by: Greg Clark <[email protected]>

* Updates for jit torch_tensorrt tracing

Signed-off-by: Greg Clark <[email protected]>

* stash

Signed-off-by: Boris Fomitchev <[email protected]>

* Reverting non-essential changes

Signed-off-by: Boris Fomitchev <[email protected]>

* Offset=None case

Signed-off-by: Boris Fomitchev <[email protected]>

* Remove test scripts

Signed-off-by: Greg Clark <[email protected]>

* Clean up speech_to_text_cache_aware_streaming_infer

Signed-off-by: Greg Clark <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert pad -> constant_pad_nd

Signed-off-by: Greg Clark <[email protected]>

* conformer-encoder set window_size from streaming_cfg

Signed-off-by: Greg Clark <[email protected]>

* Fixes for working export(), using more constants

Signed-off-by: Boris Fomitchev <[email protected]>

* Optional rand init for cahce

Signed-off-by: Greg Clark <[email protected]>

* Folding update_cache with constants

Signed-off-by: Boris Fomitchev <[email protected]>

* More folding

Signed-off-by: Boris Fomitchev <[email protected]>

* Reducing diff #1

Signed-off-by: Boris Fomitchev <[email protected]>

* Reducing diff NVIDIA#2

Signed-off-by: Boris Fomitchev <[email protected]>

* Reducing diff NVIDIA#3

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed unit tests, more reverts

Signed-off-by: Boris Fomitchev <[email protected]>

* Export fixes

Signed-off-by: Boris Fomitchev <[email protected]>

* Reverted slice changes that ruined ONNX perf

Signed-off-by: Boris Fomitchev <[email protected]>

* Adding back keep_all_outputs and drop_extra_preencoded

Signed-off-by: Greg Clark <[email protected]>

* Fix export

Signed-off-by: Greg Clark <[email protected]>

---------

Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Vahid Noroozi <[email protected]>
@messiaen messiaen mentioned this pull request Mar 28, 2023
8 tasks
hsiehjackson pushed a commit to hsiehjackson/NeMo that referenced this pull request Jun 2, 2023