Troubles with streaming decode #1724
Would you mind posting the complete decoding command? Also, could you share part of the decoding results? We want to know the error patterns, e.g., whether there are more deletions or insertions. |
Hi, the complete decoding command for chunking mode is as follows:

When I use decode.py instead, the results are quite fine. But with this command, part of the decoding results looks as follows: %WER = 98.58 PER-UTT DETAILS: corr or (ref->hyp)

Mostly, the recognition result is empty... Hope this helps |
Could you also show the same info for decode.py? |
This is a result for modified beam search with beam=4, but the results of greedy decoding are quite similar: %WER = 25.40 PER-UTT DETAILS: corr or (ref->hyp) |
Would you mind sharing the complete decoding command? |
Please also post the first few lines of the decoding logs, both for streaming_decode.py and decode.py. The first few lines contain complete information about your environment and the command you are running. |
This is a log for the streaming_decode.py run:
|
This is a log for decode.py:
|
Could you change the chunk size? |
I tried different values but without any substantial effect. |
One of the values in --chunk-size used during training, except -1. |
This is a log of the run with chunk size 32:
|
This is a recognition result:
|
It seems I've found the reason: I use my own fbank features, which are slightly different from the torchaudio fbanks, while streaming_decode.py loads raw audio and computes fbanks on-the-fly. I've checked this hypothesis by simply replacing the fbank computation with a call to cut.load_features(), and this worked almost the same as simulated streaming does! It would be great to add a flag to streaming_decode.py to use pre-computed features when they are available in the cuts. |
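As an illustration of that fix, here is a minimal sketch (not the actual streaming_decode.py code) that prefers the pre-computed fbank matrix attached to each lhotse cut and only falls back to on-the-fly extraction when a cut has no features. The manifest path, the fbank options, and the final decoding step are illustrative assumptions, not values taken from this thread.

```python
import torch
import kaldifeat
from lhotse import CutSet

# Hypothetical manifest path; use the same cuts that decode.py reads.
cuts = CutSet.from_file("data/fbank/my_test_cuts.jsonl.gz")

# On-the-fly extractor, only used as a fallback.
opts = kaldifeat.FbankOptions()
opts.device = torch.device("cpu")
opts.frame_opts.dither = 0
opts.frame_opts.samp_freq = 16000
opts.mel_opts.num_bins = 80
fbank = kaldifeat.Fbank(opts)

for cut in cuts:
    if cut.has_features:
        # Reuse exactly the same pre-computed features that decode.py sees.
        feature = torch.from_numpy(cut.load_features())  # (num_frames, feat_dim)
    else:
        # Fall back to computing fbank from raw audio, as streaming_decode.py does now.
        audio = torch.from_numpy(cut.load_audio()).squeeze(0)  # 1-D waveform
        feature = fbank(audio)
    # ... feed `feature` chunk by chunk into the streaming decoder ...
```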
It's great to hear that you fixed it. By the way, I suggest you follow what we are doing in Icefall to extract features. It can save you a lot of time when you want to deploy your model with Sherpa. |
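For reference, a short sketch of extracting 80-dim fbank features with lhotse in the style of the icefall librispeech recipes, so that decode.py and a pre-computed-feature streaming decode read identical fbanks. The paths, sampling rate, and number of jobs are illustrative assumptions.

```python
from lhotse import CutSet, Fbank, FbankConfig, LilcomChunkyWriter

# Hypothetical manifest of raw-audio cuts for the custom data set.
cuts = CutSet.from_file("data/manifests/my_data_cuts.jsonl.gz")

# 80-dim fbank, matching what the librispeech zipformer recipes expect.
extractor = Fbank(FbankConfig(sampling_rate=16000, num_mel_bins=80))

cuts = cuts.compute_and_store_features(
    extractor=extractor,
    storage_path="data/fbank/my_data_feats",  # where the feature arrays are written
    storage_type=LilcomChunkyWriter,          # compressed storage used in icefall recipes
    num_jobs=4,
)

# Save the manifest with features attached, so every decoding script
# can load the same pre-computed fbanks via cut.load_features().
cuts.to_file("data/fbank/my_data_cuts.jsonl.gz")
```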
Hi,
I am trying to apply the librispeech streaming recipe to my data (about 4K hours). I trained a zipformer with
--causal 1
for about 30 epochs and tried to run both simulated streaming (via decode.py) and chunking (via streaming_decode.py) on my test sets. While decode.py shows reasonable results (a WER of about 30%), the chunking version doesn't work properly, showing many deletions and a WER of over 98%.
It is worth noting that at earlier epochs the WER is lower (about 80-85%), but it later increases to almost 100%.
The non-streaming version of the model (trained without --causal 1) also works well.
Please suggest what could be the source of such behaviour.