
Streaming conformer CTC export #5837

Merged: 49 commits merged into NVIDIA:main on Mar 14, 2023

Conversation

messiaen
Contributor

@messiaen messiaen commented Jan 23, 2023

What does this PR do ?

Work in progress:

  • Export a unified (encoder and decoder) ONNX model for cache-aware Conformer CTC models.
    • Truncate encoded_output and the caches where needed for easier deployment.
  • Use a consistent chunk size so that the first chunk is treated like the other chunks whenever possible. This simplifies deployment.

Collection: ASR

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

Export Conformer-CTC with cache-aware streaming:
python scripts/export.py --max-batch 32 --check-tolerance 1.0 --runtime-check --streaming_support --cache_support <path/to/model.nemo> <output.onnx>

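A minimal sketch of consuming the exported model with onnxruntime (not part of this PR). The five encoder inputs mirror the forward_for_export signature (audio_signal, length, cache_last_channel, cache_last_time, cache_last_channel_len), but the input/output ordering and every shape below are assumptions; read the real ones from the exported graph.

# Hedged sketch only: query sess.get_inputs()/get_outputs() of your export for
# the actual names, ordering, shapes, and dtypes.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("output.onnx", providers=["CPUExecutionProvider"])
in_names = [i.name for i in sess.get_inputs()]
out_names = [o.name for o in sess.get_outputs()]

batch = 1
# Hypothetical initial caches; take the true shapes/dtypes from sess.get_inputs().
cache_last_channel = np.zeros((batch, 17, 70, 512), dtype=np.float32)
cache_last_time = np.zeros((batch, 17, 512, 30), dtype=np.float32)
cache_last_channel_len = np.zeros((batch,), dtype=np.int64)

# Dummy pre-processed feature chunks (80 mel bins x 121 frames) standing in for a
# real streaming front end; the chunk size must match the streaming_cfg used at export.
chunks = [
    (np.random.randn(batch, 80, 121).astype(np.float32),
     np.array([121] * batch, dtype=np.int64))
    for _ in range(3)
]

for feats, feat_len in chunks:
    feeds = dict(zip(in_names, [feats, feat_len, cache_last_channel,
                                cache_last_time, cache_last_channel_len]))
    outputs = sess.run(out_names, feeds)
    logits = outputs[0]  # assumed: CTC log-probs for this chunk, already truncated
    # Feed the updated caches (assumed to be the last three outputs) back in for
    # the next chunk; this is what makes the single exported graph streamable.
    cache_last_channel, cache_last_time, cache_last_channel_len = outputs[-3:]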

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list the specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the ASR and core (Changes to NeMo Core) labels Jan 23, 2023
@titu1994
Collaborator

Could you add the PR details?

@messiaen messiaen changed the title Streaming conformer Streaming conformer CTC export Jan 23, 2023
messiaen and others added 19 commits February 10, 2023 11:13
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Test onnx streaming conformer ctc WER

Constant att cache width with len param

Remove some extra functions in cache_aware runner

transpose cache so that batch is first for trt

Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
@github-actions
Contributor

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

@github-actions github-actions bot added the stale label Feb 25, 2023
borisfom and others added 9 commits March 5, 2023 22:10
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
@messiaen
Contributor Author

@VahidooX Can you take another look when you get a chance?

@messiaen messiaen requested a review from VahidooX March 12, 2023 04:20

if length is None or isinstance(self.pre_encode, nn.Linear):
Collaborator

Let's move back if possible.

device=audio_chunk.device,
dtype=audio_chunk.dtype,
)
if self.pad_and_drop_preencoded:
Collaborator

How about saving the value of self.streaming_cfg.pre_encode_cache_size[X] in a variable and then using it to reduce the duplication?
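A generic illustration of that suggestion (stand-in values only, not the actual streaming-buffer code): index the config list once and let both branches reuse the local.

# pre_encode_cache_sizes stands in for self.streaming_cfg.pre_encode_cache_size.
pre_encode_cache_sizes = [16, 8]
buffer_idx = 0
pad_and_drop_preencoded = True

# Hoist the repeated lookup into a single local ...
pre_encode_cache_size = pre_encode_cache_sizes[0 if buffer_idx == 0 else 1]

# ... so the branches below use the local instead of re-indexing the config.
# (The branch bodies are placeholders.)
if pad_and_drop_preencoded:
    drop_size = pre_encode_cache_size
else:
    drop_size = 0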

@@ -1346,7 +1351,10 @@ def __iter__(self):
)

if self.buffer_idx == 0 and isinstance(self.streaming_cfg.shift_size, list):
shift_size = self.streaming_cfg.shift_size[0]
if self.pad_and_drop_preencoded:
Collaborator

How about adding the self.pad_and_drop_preencoded condition to the main if block, if it makes it easier to read?

cache_next[self._cache_id, :, -q_keep_size:, :] = q_input[:, :q_keep_size, :]

key = value = torch.cat([cache[self._cache_id], key], dim=1)
# query.shape[1] is constant, should save it at init()
Collaborator

Do we need to keep this comment?

keep_in_cache_next(cache=cache, cache_next=cache_next, keep_size=q_keep_size, cache_id=self._cache_id)
cache_next[self._cache_id, :, -q_keep_size:, :] = q_input[:, :q_keep_size, :]

key = value = torch.cat([cache[self._cache_id], key], dim=1)
Collaborator

Let's add "cache is not None and cache_next is not None" to the if block or separate cache and cache_next like before. Both ways are OK.
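A toy standalone version of the guard being discussed (the names mirror the diff context above, but the helper, shapes, and keep_size are hypothetical, not the NeMo attention code):

import torch

def attend_chunk(key, cache=None, cache_next=None, keep_size=2):
    # Only touch the caches when both are provided; the cache-free path is unchanged.
    if cache is not None and cache_next is not None:
        # Store the leading keep_size frames of this chunk in the tail of the
        # next-step cache ...
        cache_next[:, -keep_size:, :] = key[:, :keep_size, :]
        # ... and prepend the cached context from the previous step.
        key = torch.cat([cache, key], dim=1)
    return key

chunk = torch.randn(1, 4, 8)                  # (batch, time, features)
cache = torch.zeros(1, 6, 8)
cache_next = torch.zeros_like(cache)
out = attend_chunk(chunk, cache, cache_next)  # shape: (1, 10, 8)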

# todo: we should know input_x.size(-1) at config time
cache_keep_size = x.size(-1) - self.cache_drop_size
cache_next[self._cache_id, :, :, :-cache_keep_size] = cache[self._cache_id, :, :, cache_keep_size:]
# print("self._max_cache_len:", self._max_cache_len, "cache: size", cache.size(), "x:", x.size(), " new_x:", new_x.size(), ", cache_keep_size:", cache_keep_size)
Collaborator

Should this comment be dropped?

@@ -140,23 +131,16 @@ def __init__(

def update_cache(self, x, cache=None, cache_next=None):
if cache is None:
x = F.pad(x, pad=(self._left_padding, self._right_padding))
new_x = F.pad(x, pad=(self._left_padding, self._right_padding))
Collaborator

Let's add "cache is not None and cache_next is not None" to the if block or separate cache and cache_next like before. Both ways are OK.

@VahidooX VahidooX self-requested a review March 13, 2023 23:09
Collaborator

@VahidooX VahidooX left a comment

Let's make any needed update in another PR.

Comment on lines +457 to +459
def forward_for_export(
self, audio_signal, length, cache_last_channel=None, cache_last_time=None, cache_last_channel_len=None
):

Check notice

Code scanning / CodeQL

Returning tuples with varying lengths

ConformerEncoder.forward_for_export returns a tuple of size 2 and a tuple of size 5.
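For readers unfamiliar with this check, a standalone toy of the flagged pattern (simplified, not the actual ConformerEncoder code): the same function returns a 2-tuple on the cache-free path and a 5-tuple on the cache-aware path, so callers have to know which variant they will receive.

def forward(x, length, cache=None):
    y = [v * 2 for v in x]                    # stand-in for the encoder computation
    if cache is None:
        return y, length                      # 2-tuple: non-streaming path
    cache = cache + [y[-1]]                   # stand-in cache update
    return y, length, cache, cache, length    # 5-tuple: cache-aware path

print(len(forward([1, 2, 3], 3)))             # 2
print(len(forward([1, 2, 3], 3, cache=[])))   # 5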
rets[4],
)

def streaming_post_process(self, rets, keep_all_outputs=True):

Check notice

Code scanning / CodeQL

Returning tuples with varying lengths

ConformerEncoder.streaming_post_process returns a tuple of size 2 and a tuple of size 5.
Comment on lines +502 to +504
def forward(
self, audio_signal, length, cache_last_channel=None, cache_last_time=None, cache_last_channel_len=None
):

Check notice

Code scanning / CodeQL

Returning tuples with varying lengths

ConformerEncoder.forward returns a tuple of size 2 and a tuple of size 5.
Comment on lines +513 to +515
def forward_internal(
self, audio_signal, length, cache_last_channel=None, cache_last_time=None, cache_last_channel_len=None
):

Check notice

Code scanning / CodeQL

Returning tuples with varying lengths

ConformerEncoder.forward_internal returns a tuple of size 2 and a tuple of size 5.
@VahidooX VahidooX merged commit fd085dd into NVIDIA:main Mar 14, 2023
titu1994 pushed a commit to titu1994/NeMo that referenced this pull request Mar 24, 2023
* cache-aware streaming export

Test onnx streaming conformer ctc WER

Constant att cache width with len param

Remove some extra functions in cache_aware runner

transpose cache so that batch is first for trt

Signed-off-by: Greg Clark <[email protected]>

* fix export for full-context conformer

* WIP trying to improve onnx perf

Signed-off-by: Greg Clark <[email protected]>

* Adding test scripts

Signed-off-by: Greg Clark <[email protected]>

* More perf testing script

Signed-off-by: Greg Clark <[email protected]>

* Updates for jit torch_tensorrt tracing

Signed-off-by: Greg Clark <[email protected]>

* Fixed trace warnings

Signed-off-by: Boris Fomitchev <[email protected]>

* Rearranging tests

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing non-caching case

Signed-off-by: Boris Fomitchev <[email protected]>

* testing

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed channel cache length issue

Signed-off-by: Boris Fomitchev <[email protected]>

* cache-aware streaming export

Test onnx streaming conformer ctc WER

Constant att cache width with len param

Remove some extra functions in cache_aware runner

transpose cache so that batch is first for trt

Signed-off-by: Greg Clark <[email protected]>

* fix export for full-context conformer

* WIP trying to improve onnx perf

Signed-off-by: Greg Clark <[email protected]>

* Adding test scripts

Signed-off-by: Greg Clark <[email protected]>

* More perf testing script

Signed-off-by: Greg Clark <[email protected]>

* Updates for jit torch_tensorrt tracing

Signed-off-by: Greg Clark <[email protected]>

* stash

Signed-off-by: Boris Fomitchev <[email protected]>

* Reverting non-essential changes

Signed-off-by: Boris Fomitchev <[email protected]>

* Offset=None case

Signed-off-by: Boris Fomitchev <[email protected]>

* Remove test scripts

Signed-off-by: Greg Clark <[email protected]>

* Clean up speech_to_text_cache_aware_streaming_infer

Signed-off-by: Greg Clark <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert pad -> constant_pad_nd

Signed-off-by: Greg Clark <[email protected]>

* conformer-encoder set window_size from streaming_cfg

Signed-off-by: Greg Clark <[email protected]>

* Fixes for working export(), using more constants

Signed-off-by: Boris Fomitchev <[email protected]>

* Optional rand init for cahce

Signed-off-by: Greg Clark <[email protected]>

* Folding update_cache with constants

Signed-off-by: Boris Fomitchev <[email protected]>

* More folding

Signed-off-by: Boris Fomitchev <[email protected]>

* Reducing diff #1

Signed-off-by: Boris Fomitchev <[email protected]>

* Reducing diff NVIDIA#2

Signed-off-by: Boris Fomitchev <[email protected]>

* Reducing diff NVIDIA#3

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed unit tests, more reverts

Signed-off-by: Boris Fomitchev <[email protected]>

* Export fixes

Signed-off-by: Boris Fomitchev <[email protected]>

* Reverted slice changes that ruined ONNX perf

Signed-off-by: Boris Fomitchev <[email protected]>

* Adding back keep_all_outputs and drop_extra_preencoded

Signed-off-by: Greg Clark <[email protected]>

* Fix export

Signed-off-by: Greg Clark <[email protected]>

---------

Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Vahid Noroozi <[email protected]>
@messiaen messiaen mentioned this pull request Mar 28, 2023
8 tasks
hsiehjackson pushed a commit to hsiehjackson/NeMo that referenced this pull request Jun 2, 2023