Decoding speed and accuracy on the transformed onnx model #42

yangyi0818 · 2022-08-25T08:43:20Z

Hi, thanks for you share of the espnet_onnx system!

I met two problems when I tried to inference thorough your codes. My acoustic model is trained by myself on our own dataset. The AM architecture is the typical Conformer. I downloaded this code on June.

First, the decoding speed is too slow by it. When using torch to decode, the RTF is around 2.32; however it becomes around 20 when using the transformed onnx.

Second, the CER calculated in the torch version is 7.8% while for the onnx, it becomes 10.6%. I think it is probably wrong.

I'm giving some configs here:

export.py

import sys
sys.path.append('espnet-master')
sys.path.append('espnet-master/espnet_tts_frontend-master')
sys.path.append('espnet_onnx-master/espnet_onnx/export/asr')
import torch

from export_asr import ModelExport
from espnet2.bin.asr_inference import Speech2Text

if __name__ == '__main__':
    m = ModelExport(cache_dir = sys.argv[5])

    # export from trained model
    speech2text=Speech2Text(
            asr_train_config = sys.argv[1],
            asr_model_file=sys.argv[2],
            lm_train_config=sys.argv[3],
            lm_file=sys.argv[4],
            )

    m.export(model = speech2text, tag_name = 'speech2text', quantize=True)

And I get an onnx dir structured like:

asr/onnx/speech2text/
      config.yaml
      feats_stats.npz
      full/
      quantize/

The test wav is a filelist, structured as:

bigfar_001_000001 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000001.wav
bigfar_001_000002 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000002.wav
bigfar_001_000003 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000003.wav
bigfar_001_000004 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000004.wav
bigfar_001_000005 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000005.wav
bigfar_001_000006 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000006.wav
...

The decoding process is:

decode.py

import sys
sys.path.append('espnet_onnx-master/espnet_onnx/asr')

import time
import threading
import librosa
import os
from tqdm import tqdm
from asr_model import Speech2Text

if __name__ == '__main__':
    """ step1: load onnx file """
        speech2text = Speech2Text(tag_name = 'speech2text', model_dir=sys.argv[3],)

        """ step2: ASR """
        f = open(sys.argv[1])
        lines = f.readlines()
        for line in tqdm(lines):
            with open(os.path.join(sys.argv[2], 'hyp_flush_1process.trn'),'a') as fout:
                wav_name = line.split(' ')[0].strip()
                processing_wav = line.split(' ')[1].strip()

                start = time.time()
                y, sr = librosa.load(processing_wav, sr=16000)
                nbest = speech2text(y)
                asr_result = nbest[0][0]
                end = time.time()

                for j in range (len(asr_result)):
                    fout.write(asr_result[j])
                    if j != len(asr_result) - 1:
                        fout.write(' ')
                fout.write('\t')
                fout.write('(')
                fout.write(wav_name)
                fout.write('-')
                fout.write(wav_name)
                fout.write(')')
                fout.write('\n')

                print('processing:  ', processing_wav)
                print('Result:         ', asr_result)
                print('Time:           ', end-start, 's')

Furthermore, I noticed that you have mentioned there may be some problems for Conformer AM considering ASR in latest issue, has it been fixed?

Looking forward for your reply!

The text was updated successfully, but these errors were encountered:

Masao-Someki · 2022-08-27T10:57:54Z

Hi @yangyi0818, Thank you for reporting the issue!
About the first point, I would like to know the following information:

What is your device? CPU or GPU?
Am I right that your model was constructed with Conformer encoder and Transformer decoder?
Did you use LM for the inference?
There are two Conformer blocks in ESPnet, the legacy and the latest versions. Which block did you use?
I see quantization is applied to your model. Did you execute your quantized model on GPU?

And about the second point, I would like to know the following information:

Did you check the weights for ctc and decoder? It is defined in asr/onnx/speech2text/config.yaml

The latest Conformer-related issue is not yet fixed, and I'm trying to solve it!

yangyi0818 · 2022-08-28T14:17:57Z

Hi @Masao-Someki ! Thank you for your kind reply!
Here are my answers.

About the first point:

What is your device? CPU or GPU?
CPU

Am I right that your model was constructed with Conformer encoder and Transformer decoder?
Yes.

Did you use LM for the inference?
Yes. It is a transformer structured LM.

There are two Conformer blocks in ESPnet, the legacy and the latest versions. Which block did you use?
Our AM was trained last year, maybe it is a legacy one?

I see quantization is applied to your model. Did you execute your quantized model on GPU?
It is true that I set 'quantize=True' in 'export.py'. But I have only tried the unquantized model on CPU.

About the second point:
Yes , I checked the weights and I also tried different configurations. It seems that it didn't help much. Here are the results:
weights: {ctc: 0.3, decoder: 0.7, length_bonus: 0.0, lm: 0.3} # cer=10.8% (This is the same configuration as inferencing on torch)
weights: {ctc: 0.3, decoder: 0.7, length_bonus: 0.0, lm: 1.0} # cer=10.8%
weights: {ctc: 0.3, decoder: 1.0, length_bonus: 0.0, lm: 0.1} # cer=11.6%
weights: {ctc: 0.5, decoder: 0.5, length_bonus: 0.0, lm: 1.0} # cer=10.7%

Masao-Someki · 2022-08-28T15:03:16Z

Thank you!
About the RTF, it may be a problem with the frontend process.
If you are using the default frontend, which contains stft and logmel, is it possible to check the performance difference between the torch frontend and the onnx frontend?
I recently found a little speed down in espnet_onnx's frontend compared to the ESPnet version. Now I'm considering converting this whole process into onnx. If the frontend causes this problem, I think I have to do this quickly..

rookie0607 · 2022-08-30T02:29:29Z

Hi, thanks for you share of the espnet_onnx system!

I met two problems when I tried to inference thorough your codes. My acoustic model is trained by myself on our own dataset. The AM architecture is the typical Conformer. I downloaded this code on June.

First, the decoding speed is too slow by it. When using torch to decode, the RTF is around 2.32; however it becomes around 20 when using the transformed onnx.

Second, the CER calculated in the torch version is 7.8% while for the onnx, it becomes 10.6%. I think it is probably wrong.

I'm giving some configs here:

export.py

import sys
sys.path.append('espnet-master')
sys.path.append('espnet-master/espnet_tts_frontend-master')
sys.path.append('espnet_onnx-master/espnet_onnx/export/asr')
import torch

from export_asr import ModelExport
from espnet2.bin.asr_inference import Speech2Text

if __name__ == '__main__':
    m = ModelExport(cache_dir = sys.argv[5])

    # export from trained model
    speech2text=Speech2Text(
            asr_train_config = sys.argv[1],
            asr_model_file=sys.argv[2],
            lm_train_config=sys.argv[3],
            lm_file=sys.argv[4],
            )

    m.export(model = speech2text, tag_name = 'speech2text', quantize=True)

And I get an onnx dir structured like:

asr/onnx/speech2text/ config.yaml feats_stats.npz full/ quantize/

The test wav is a filelist, structured as:

bigfar_001_000001 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000001.wav
bigfar_001_000002 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000002.wav
bigfar_001_000003 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000003.wav
bigfar_001_000004 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000004.wav
bigfar_001_000005 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000005.wav
bigfar_001_000006 /home/dangfeng/exp_xiandao/for_xiandao/onnx_enh/output_0703/enh/bigfar_001_000006.wav
...

The decoding process is:

decode.py

import sys
sys.path.append('espnet_onnx-master/espnet_onnx/asr')

import time
import threading
import librosa
import os
from tqdm import tqdm
from asr_model import Speech2Text

if __name__ == '__main__':
    """ step1: load onnx file """
        speech2text = Speech2Text(tag_name = 'speech2text', model_dir=sys.argv[3],)

        """ step2: ASR """
        f = open(sys.argv[1])
        lines = f.readlines()
        for line in tqdm(lines):
            with open(os.path.join(sys.argv[2], 'hyp_flush_1process.trn'),'a') as fout:
                wav_name = line.split(' ')[0].strip()
                processing_wav = line.split(' ')[1].strip()

                start = time.time()
                y, sr = librosa.load(processing_wav, sr=16000)
                nbest = speech2text(y)
                asr_result = nbest[0][0]
                end = time.time()

                for j in range (len(asr_result)):
                    fout.write(asr_result[j])
                    if j != len(asr_result) - 1:
                        fout.write(' ')
                fout.write('\t')
                fout.write('(')
                fout.write(wav_name)
                fout.write('-')
                fout.write(wav_name)
                fout.write(')')
                fout.write('\n')

                print('processing:  ', processing_wav)
                print('Result:         ', asr_result)
                print('Time:           ', end-start, 's')

Furthermore, I noticed that you have mentioned there may be some problems for Conformer AM considering ASR in latest issue, has it been fixed?

Looking forward for your reply!

What is your torch version?

yangyi0818 · 2022-08-30T02:36:03Z

HI @rookie0607
my torch version is 1.7.1 and onnx version is 1.7.0

joazoa · 2022-08-31T09:01:06Z

In relation to the slow speed, can you check how many cores are loaded when you try to inference with onnx as i suspect it could be related?
@Masao-Someki I notice that all cpu cores are in use when i try to do cpu inference. Is there a way to avoid this other than setting tasksel 1 ?
I tried export OMP_NUM_THREADS=1 but no luck.

Masao-Someki · 2022-08-31T12:05:04Z

@joazoa
You can limit the number of threads with the following options:

inter_op_num_threads = 1
intra_op_num_threads = 1

Currently, there is no script to limit the number of threads in espnet_onnx, so you may need to modify inference codes like this:

import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.inter_op_num_threads = 1
sess_options.intra_op_num_threads = 1

self.encoder = onnxruntime.InferenceSession(
                self.config.quantized_model_path,
                providers=providers,
                sess_options=sess_options
            )

joazoa · 2022-08-31T12:10:24Z

@Masao-Someki thank you!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decoding speed and accuracy on the transformed onnx model #42

Decoding speed and accuracy on the transformed onnx model #42

yangyi0818 commented Aug 25, 2022

Masao-Someki commented Aug 27, 2022

yangyi0818 commented Aug 28, 2022

Masao-Someki commented Aug 28, 2022

rookie0607 commented Aug 30, 2022

yangyi0818 commented Aug 30, 2022

joazoa commented Aug 31, 2022

Masao-Someki commented Aug 31, 2022

joazoa commented Aug 31, 2022

Decoding speed and accuracy on the transformed onnx model #42

Decoding speed and accuracy on the transformed onnx model #42

Comments

yangyi0818 commented Aug 25, 2022

Masao-Someki commented Aug 27, 2022

yangyi0818 commented Aug 28, 2022

Masao-Someki commented Aug 28, 2022

rookie0607 commented Aug 30, 2022

yangyi0818 commented Aug 30, 2022

joazoa commented Aug 31, 2022

Masao-Someki commented Aug 31, 2022

joazoa commented Aug 31, 2022