Streaming conformer CTC export #5837
Conversation
Force-pushed from 7b408ce to accc444.
Could you add the PR details?
Force-pushed from f06101a to 24c13ed.
Commit message:
    Test onnx streaming conformer ctc WER
    Constant att cache width with len param
    Remove some extra functions in cache_aware runner
    transpose cache so that batch is first for trt
    Signed-off-by: Greg Clark <[email protected]>
Force-pushed from bd00b57 to 3482119.
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.
Four resolved (outdated) review threads on examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py.
Force-pushed from a8b07a6 to 7630c8e.
Resolved review thread on examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py.
@VahidooX Can you take another look when you get a chance?
    if length is None or isinstance(self.pre_encode, nn.Linear):
Let's move back if possible.
        device=audio_chunk.device,
        dtype=audio_chunk.dtype,
    )
    if self.pad_and_drop_preencoded:
How about saving the value of self.streaming_cfg.pre_encode_cache_size[X] in a variable and then using it, to reduce the duplication?
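A hedged sketch of this suggestion (not NeMo's actual code): hoist the repeated indexed lookup into a local variable so every branch reuses it. `StreamingCfg` and the index choice are hypothetical stand-ins.

```python
# Hypothetical stand-in for the real streaming config object.
class StreamingCfg:
    # [cache size for the first chunk, cache size for later chunks]
    pre_encode_cache_size = [0, 16]

def effective_pre_encode_cache_size(buffer_idx, cfg=StreamingCfg()):
    # Look the value up once instead of repeating the subscript
    # expression (cfg.pre_encode_cache_size[...]) in every branch.
    cache_size = cfg.pre_encode_cache_size[0 if buffer_idx == 0 else 1]
    return cache_size
```

The point is purely about readability: one named local replaces several copies of the same subscript expression.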
    @@ -1346,7 +1351,10 @@ def __iter__(self):
        )

        if self.buffer_idx == 0 and isinstance(self.streaming_cfg.shift_size, list):
            shift_size = self.streaming_cfg.shift_size[0]
            if self.pad_and_drop_preencoded:
How about adding the self.pad_and_drop_preencoded condition to the main if block, if it makes it easier to read?
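A hypothetical sketch of the suggested restructuring: fold the nested `pad_and_drop_preencoded` check into the main `if` condition so the branch structure stays flat. Names mirror the quoted snippet, but the exact semantics here are illustrative, not NeMo's.

```python
def pick_shift_size(buffer_idx, shift_size, pad_and_drop_preencoded):
    # The flag is part of the main condition instead of a nested `if`,
    # so each branch is a single flat expression.
    if buffer_idx == 0 and isinstance(shift_size, list) and not pad_and_drop_preencoded:
        return shift_size[0]  # first chunk uses the initial shift
    return shift_size[-1] if isinstance(shift_size, list) else shift_size
```

Same behavior as nesting the check, with one fewer indentation level to track while reading.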
    cache_next[self._cache_id, :, -q_keep_size:, :] = q_input[:, :q_keep_size, :]

    key = value = torch.cat([cache[self._cache_id], key], dim=1)
    # query.shape[1] is constant, should save it at init()
Need to keep the comment?
    keep_in_cache_next(cache=cache, cache_next=cache_next, keep_size=q_keep_size, cache_id=self._cache_id)
    cache_next[self._cache_id, :, -q_keep_size:, :] = q_input[:, :q_keep_size, :]

    key = value = torch.cat([cache[self._cache_id], key], dim=1)
Let's add "cache is not None and cache_next is not None" to the if block or separate cache and cache_next like before. Both ways are OK.
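A minimal sketch of the suggested guard, with plain lists standing in for the real tensors: the caches are only touched when both `cache` and `cache_next` are provided, and the non-caching (full-context) path falls through untouched.

```python
def update_cache(x, cache=None, cache_next=None):
    # Guard both cache arguments together, as suggested in the review.
    if cache is not None and cache_next is not None:
        merged = cache + x        # list analogue of torch.cat([cache, key], dim=1)
        cache_next[:] = merged    # write-through to the next-step cache
        return merged
    return x                      # non-caching (full-context) path
```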
    # todo: we should know input_x.size(-1) at config time
    cache_keep_size = x.size(-1) - self.cache_drop_size
    cache_next[self._cache_id, :, :, :-cache_keep_size] = cache[self._cache_id, :, :, cache_keep_size:]
    # print("self._max_cache_len:", self._max_cache_len, "cache: size", cache.size(), "x:", x.size(), " new_x:", new_x.size(), ", cache_keep_size:", cache_keep_size)
Need to drop the comment?
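A list-based sketch of the sliding time-cache update quoted above: `cache_next[..., :-keep] = cache[..., keep:]` shifts the cache left by `keep` frames, and the freed tail is later overwritten with new frames (zeros stand in for them here).

```python
def slide_cache(cache, x_len, cache_drop_size):
    # Mirrors: cache_keep_size = x.size(-1) - self.cache_drop_size
    keep = x_len - cache_drop_size
    # Shift left by `keep`; placeholder zeros mark the slots that the
    # real code fills with the newest frames.
    return cache[keep:] + [0] * keep
```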
    @@ -140,23 +131,16 @@ def __init__(

    def update_cache(self, x, cache=None, cache_next=None):
        if cache is None:
            x = F.pad(x, pad=(self._left_padding, self._right_padding))
        new_x = F.pad(x, pad=(self._left_padding, self._right_padding))
Let's add "cache is not None and cache_next is not None" to the if block or separate cache and cache_next like before. Both ways are OK.
Resolved review thread on examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py.
Let's make any needed update in another PR.
    def forward_for_export(
        self, audio_signal, length, cache_last_channel=None, cache_last_time=None, cache_last_channel_len=None
    ):

Code scanning notice (CodeQL): Returning tuples with varying lengths
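A short sketch of the pattern this CodeQL notice flags, with hypothetical names: a function that returns 2-tuples on one path and 5-tuples on another forces every caller to branch on `len()`. Returning a fixed arity, with `None` for absent cache outputs, keeps call sites uniform.

```python
def forward_fixed_arity(logits, length, caches=None):
    # Always return a 5-tuple; missing cache outputs become None
    # instead of shrinking the tuple.
    if caches is None:
        return logits, length, None, None, None
    return (logits, length) + caches
```

Callers can then unpack the same five names unconditionally on every path.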
            rets[4],
        )

    def streaming_post_process(self, rets, keep_all_outputs=True):

Code scanning notice (CodeQL): Returning tuples with varying lengths
    def forward(
        self, audio_signal, length, cache_last_channel=None, cache_last_time=None, cache_last_channel_len=None
    ):

Code scanning notice (CodeQL): Returning tuples with varying lengths

    def forward_internal(
        self, audio_signal, length, cache_last_channel=None, cache_last_time=None, cache_last_channel_len=None
    ):

Code scanning notice (CodeQL): Returning tuples with varying lengths
Squashed commit messages:
* cache-aware streaming export: Test onnx streaming conformer ctc WER; Constant att cache width with len param; Remove some extra functions in cache_aware runner; transpose cache so that batch is first for trt (Greg Clark)
* fix export for full-context conformer
* WIP trying to improve onnx perf (Greg Clark)
* Adding test scripts (Greg Clark)
* More perf testing script (Greg Clark)
* Updates for jit torch_tensorrt tracing (Greg Clark)
* Fixed trace warnings (Boris Fomitchev)
* Rearranging tests (Boris Fomitchev)
* Fixing non-caching case (Boris Fomitchev)
* testing (Boris Fomitchev)
* Fixed channel cache length issue (Boris Fomitchev)
* cache-aware streaming export: Test onnx streaming conformer ctc WER; Constant att cache width with len param; Remove some extra functions in cache_aware runner; transpose cache so that batch is first for trt (Greg Clark)
* fix export for full-context conformer
* WIP trying to improve onnx perf (Greg Clark)
* Adding test scripts (Greg Clark)
* More perf testing script (Greg Clark)
* Updates for jit torch_tensorrt tracing (Greg Clark)
* stash (Boris Fomitchev)
* Reverting non-essential changes (Boris Fomitchev)
* Offset=None case (Boris Fomitchev)
* Remove test scripts (Greg Clark)
* Clean up speech_to_text_cache_aware_streaming_infer (Greg Clark)
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Revert pad -> constant_pad_nd (Greg Clark)
* conformer-encoder set window_size from streaming_cfg (Greg Clark)
* Fixes for working export(), using more constants (Boris Fomitchev)
* Optional rand init for cache (Greg Clark)
* Folding update_cache with constants (Boris Fomitchev)
* More folding (Boris Fomitchev)
* Reducing diff #1 (Boris Fomitchev)
* Reducing diff #2 (Boris Fomitchev)
* Reducing diff #3 (Boris Fomitchev)
* Fixed unit tests, more reverts (Boris Fomitchev)
* Export fixes (Boris Fomitchev)
* Reverted slice changes that ruined ONNX perf (Boris Fomitchev)
* Adding back keep_all_outputs and drop_extra_preencoded (Greg Clark)
* Fix export (Greg Clark)

Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Vahid Noroozi <[email protected]>
What does this PR do?
Work in progress:
Collection: ASR
Changelog
Usage
Export conformer-ctc with cache-aware streaming
    python scripts/export.py --max-batch 32 --check-tolerance 1.0 --runtime-check --streaming_support --cache_support <path/to/model.nemo> <output.onnx>
# Add a code snippet demonstrating how to use this
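A framework-free sketch of the cache-aware streaming loop the exported model is driven with: each audio chunk is fed together with the caches returned by the previous step. `run_model` is an illustrative stand-in for an ONNX Runtime session call returning `(output, new_caches)`; none of these names are NeMo's actual API.

```python
def stream(chunks, run_model, init_caches):
    # Drive the model chunk by chunk, threading the cache state through.
    caches = init_caches
    outputs = []
    for chunk in chunks:
        out, caches = run_model(chunk, caches)  # caches carry state forward
        outputs.append(out)
    return outputs
```

The essential property is that the caches produced at step t are the cache inputs at step t+1, which is exactly what the exported signature's `cache_last_channel` / `cache_last_time` / `cache_last_channel_len` arguments exist for.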
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information