Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Streaming conformer CTC export (#5837)
* cache-aware streaming export Test onnx streaming conformer ctc WER Constant att cache width with len param Remove some extra functions in cache_aware runner transpose cache so that batch is first for trt Signed-off-by: Greg Clark <grclark@nvidia.com> * fix export for full-context conformer * WIP trying to improve onnx perf Signed-off-by: Greg Clark <grclark@nvidia.com> * Adding test scripts Signed-off-by: Greg Clark <grclark@nvidia.com> * More perf testing script Signed-off-by: Greg Clark <grclark@nvidia.com> * Updates for jit torch_tensorrt tracing Signed-off-by: Greg Clark <grclark@nvidia.com> * Fixed trace warnings Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Rearranging tests Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Fixing non-caching case Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * testing Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Fixed channel cache length issue Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * cache-aware streaming export Test onnx streaming conformer ctc WER Constant att cache width with len param Remove some extra functions in cache_aware runner transpose cache so that batch is first for trt Signed-off-by: Greg Clark <grclark@nvidia.com> * fix export for full-context conformer * WIP trying to improve onnx perf Signed-off-by: Greg Clark <grclark@nvidia.com> * Adding test scripts Signed-off-by: Greg Clark <grclark@nvidia.com> * More perf testing script Signed-off-by: Greg Clark <grclark@nvidia.com> * Updates for jit torch_tensorrt tracing Signed-off-by: Greg Clark <grclark@nvidia.com> * stash Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Reverting non-essential changes Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Offset=None case Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Remove test scripts Signed-off-by: Greg Clark <grclark@nvidia.com> * Clean up speech_to_text_cache_aware_streaming_infer Signed-off-by: Greg Clark <grclark@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert pad -> constant_pad_nd Signed-off-by: Greg Clark <grclark@nvidia.com> * conformer-encoder set window_size from streaming_cfg Signed-off-by: Greg Clark <grclark@nvidia.com> * Fixes for working export(), using more constants Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Optional rand init for cahce Signed-off-by: Greg Clark <grclark@nvidia.com> * Folding update_cache with constants Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * More folding Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Reducing diff #1 Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Reducing diff #2 Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Reducing diff #3 Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Fixed unit tests, more reverts Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Export fixes Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Reverted slice changes that ruined ONNX perf Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Adding back keep_all_outputs and drop_extra_preencoded Signed-off-by: Greg Clark <grclark@nvidia.com> * Fix export Signed-off-by: Greg Clark <grclark@nvidia.com> --------- Signed-off-by: Greg Clark <grclark@nvidia.com> Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Co-authored-by: Boris Fomitchev <bfomitchev@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>