Script for ASR inference on long files #2373

jbalam-nv · 2021-06-18T23:00:24Z

speech_to_text_buffered_infer.py supports inference on long audio files by running inference on smaller chunks of audio and then merging the tokens for the final transcription.
This version only supports EncDecCTCModelBPE with AudioMelSpectrogramProcessor as the preprocessor.
Feature normalization is done on each buffer using "per_feature" normalization; future versions will have other methods for experimentation.

Signed-off-by: jbalam-nv <[email protected]>
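
For readers following along, the chunk-and-merge idea in the description can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the function names, the zero padding at the edges, and the assumption that the model emits a fixed number of tokens per buffer are all hypothetical simplifications.

```python
def split_into_buffers(samples, chunk_len, overlap):
    """Cut `samples` into fixed-size buffers (hypothetical helper).

    Each buffer holds one chunk of `chunk_len` samples plus `overlap`
    samples of context on both sides. Missing context at the start and
    end of the file is zero padded, so every buffer has the same length.
    """
    buffers = []
    for start in range(0, len(samples), chunk_len):
        lo = start - overlap
        hi = start + chunk_len + overlap
        left_pad = [0.0] * max(0, -lo)
        right_pad = [0.0] * max(0, hi - len(samples))
        body = list(samples[max(0, lo):min(hi, len(samples))])
        buffers.append(left_pad + body + right_pad)
    return buffers


def merge_tokens(per_buffer_tokens, tokens_per_chunk, offset):
    """Keep only the tokens belonging to the central chunk of each buffer.

    `offset` is the number of leading tokens to drop because they come
    from the left context (assumed constant per buffer in this sketch).
    """
    merged = []
    for tokens in per_buffer_tokens:
        merged.extend(tokens[offset:offset + tokens_per_chunk])
    return merged
```

As a sanity check, if a stand-in "model" simply echoes one token per input sample, then `merge_tokens(split_into_buffers(x, 4, 2), tokens_per_chunk=4, offset=2)` gives back `x` followed by the padding of the last chunk.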

lgtm-com · 2021-06-18T23:10:51Z

This pull request introduces 2 alerts when merging 4351694 into e070e04 - view on LGTM.com

new alerts:

1 for Unused local variable
1 for Unused import

lgtm-com · 2021-07-08T21:20:44Z

This pull request introduces 2 alerts when merging e00359a into e5bde15 - view on LGTM.com

new alerts:

1 for Unused local variable
1 for Unused import


titu1994

Overall looks great, minor comments

titu1994 · 2021-07-13T23:02:19Z

examples/asr/speech_to_text_buffered_infer.py

+    # Create a preprocessor to convert audio samples into raw features,
+    # Normalization will be done per buffer in frame_bufferer
+    # Do not normalize whatever the model's preprocessor setting is
+    preprocessor_cfg.normalize = "None"


String None?

jbalam-nv · 2021-07-14

It is a hack to get through without any normalization. None throws an error at NeMo/nemo/collections/asr/parts/preprocessing/features.py, line 78 in bee43e8:

elif "fixed_mean" in normalize_type and "fixed_std" in normalize_type:

because we are looking for a key in a dict. We should probably clean up this normalize_batch function and add a "no_norm" option.

titu1994 · 2021-07-14

Ah ok. no_norm sounds like a good option.
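
To make the point in this thread concrete, here is a small sketch of why the string "None" passes through while a real None fails. The functions below are hypothetical stand-ins written for illustration; they only mimic the `elif ... in normalize_type` check quoted above and are not the actual normalize_batch implementation. The "no_norm" wrapper is one possible shape for the option suggested in the thread, not something this PR adds.

```python
def normalize_sketch(features, normalize_type):
    """Hypothetical, simplified dispatch on `normalize_type`."""
    if normalize_type == "per_feature":
        mean = sum(features) / len(features)
        var = sum((v - mean) ** 2 for v in features) / len(features)
        std = var ** 0.5 + 1e-5  # small constant avoids division by zero
        return [(v - mean) / std for v in features]
    elif "fixed_mean" in normalize_type and "fixed_std" in normalize_type:
        # Membership test: works for a dict (key lookup) and for a str
        # (substring search), but raises TypeError when the value is None.
        mean = normalize_type["fixed_mean"]
        std = normalize_type["fixed_std"]
        return [(v - mean) / std for v in features]
    else:
        # The string "None" lands here: no normalization is applied.
        return list(features)


def normalize_with_no_norm(features, normalize_type):
    """Same dispatch with an explicit, assumed "no_norm" option."""
    if normalize_type is None or normalize_type == "no_norm":
        return list(features)
    return normalize_sketch(features, normalize_type)
```

With this sketch, `normalize_sketch(x, "None")` returns the features unchanged, `normalize_sketch(x, None)` raises `TypeError`, and `normalize_with_no_norm(x, "no_norm")` makes the intent explicit.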

titu1994 · 2021-07-13T23:10:06Z

nemo/collections/asr/parts/utils/streaming_utils.py

+          frame_overlap: duration of overlaps before and after current frame, seconds
+          offset: number of symbols to drop for smooth streaming
+        '''
+        self.ZERO_LEVEL_SPEC_DB_VAL = -16.635


Maybe some info can be added on how this was calculated?
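
One possible derivation, offered as an assumption rather than the author's stated method: if the log-mel front end adds a guard value of 2**-24 to the features before taking the natural log, then a zero-energy (silent) input maps to log(2**-24), which is close to the constant above. The guard value and the helper name below are assumptions for illustration; the thread does not say how -16.635 was obtained.

```python
import math

# Assumed guard value added before the log; not stated in this thread.
LOG_ZERO_GUARD = 2 ** -24


def zero_level_log_value(guard=LOG_ZERO_GUARD):
    """Natural log of a zero-energy feature after adding the guard."""
    return math.log(0.0 + guard)


# Under this assumption the value is about -16.6355.
print(round(zero_level_log_value(), 4))
```

If that assumption holds, padding a buffer with this value would be equivalent to padding the audio with silence before feature extraction.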


* First version of script for buffered inference
  Signed-off-by: jbalam-nv <[email protected]>
* Cleaned up commented code and added comments
  Signed-off-by: jbalam-nv <[email protected]>
* More clean up and simplified the call to transcribe
  Signed-off-by: jbalam-nv <[email protected]>
* Removed unused variables
  Signed-off-by: jbalam <[email protected]>
* Style fix
  Signed-off-by: jbalam <[email protected]>
* Added a comment for zero_level_spec_db constant
  Signed-off-by: jbalam-nv <[email protected]>
* style fix
  Signed-off-by: jbalam <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>

jbalam-nv added 4 commits June 17, 2021 12:21

First version of script for buffered inference

6804b9b

Signed-off-by: jbalam-nv <[email protected]>

Cleaned up commented code and added comments

bf63949

Signed-off-by: jbalam-nv <[email protected]>

More clean up and simplified the call to transcribe

c5826a2

Signed-off-by: jbalam-nv <[email protected]>

Merge branch 'main' into streaming_asr

4351694

Merge branch 'main' into streaming_asr

e00359a

jbalam-nv closed this Jul 13, 2021

jbalam-nv added 2 commits July 13, 2021 14:21

Merge branch 'main' into streaming_asr

710aa7b

Removed unused variables

399708c

Signed-off-by: jbalam <[email protected]>

jbalam-nv reopened this Jul 13, 2021

jbalam-nv assigned titu1994 Jul 13, 2021

Style fix

a741534

Signed-off-by: jbalam <[email protected]>

titu1994 approved these changes Jul 13, 2021


titu1994 and others added 5 commits July 14, 2021 09:52

Merge branch 'main' into streaming_asr

1440c59

Merge branch 'main' into streaming_asr

779c485

Added a comment for zero_level_spec_db constant

4692079

Signed-off-by: jbalam-nv <[email protected]>

style fix

00050bb

Signed-off-by: jbalam <[email protected]>

Merge branch 'main' into streaming_asr

467b207

titu1994 merged commit 1d29828 into main Jul 14, 2021

blisc deleted the streaming_asr branch January 11, 2022 16:39
