Excessive Input Values in streaming zipformer encoder After Conversion to onnx #679

Open
renadnasser1 opened this issue Dec 10, 2024 · 4 comments

Comments

@renadnasser1

Hello @csukuangfj,

First, thank you for all your hard work on icefall and sherpa—they've been incredible resources!
We encountered an issue after converting a trained checkpoint for a streaming Zipformer-based ASR model to ONNX format using the conversion script export-onnx-streaming.py. The script successfully generated three ONNX files (encoder, decoder, and joiner). However, the generated encoder has 99 input_values, including x and x_lens.
During deployment to Triton, we faced the following challenge:
We needed to write the config.pbtxt file. To streamline this, we referred to the scripts available in sherpa/triton/scripts for building configs. Unfortunately, there doesn't appear to be a script specifically for a streaming Zipformer-based model.

To proceed, we used the sherpa/triton/model_repo_streaming_zipformer directory as a reference for all components (feature_extractor, encoder, decoder, joiner, scorer). However, when running Triton, the model configuration expects 2 input_values, while the ONNX model provides 99 input_values.
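
The mismatch described above can be made explicit by comparing the input names declared in config.pbtxt with the input names of the exported encoder. The following is a minimal sketch, not part of icefall or sherpa: `diff_inputs` is a hypothetical helper, and the `state_{i}` names are placeholders for the 97 non-feature inputs, since the real names depend on the export script.

```python
from typing import Dict, Iterable, List


def diff_inputs(
    config_inputs: Iterable[str], model_inputs: Iterable[str]
) -> Dict[str, List[str]]:
    """Compare input names declared in a Triton config with the model's inputs."""
    declared, actual = set(config_inputs), set(model_inputs)
    return {
        # Inputs the ONNX model requires but the config does not declare.
        "missing_from_config": sorted(actual - declared),
        # Inputs the config declares but the model does not have.
        "unknown_to_model": sorted(declared - actual),
    }


# Placeholder names: x and x_lens come from the issue, state_{i} are invented.
model_inputs = ["x", "x_lens"] + [f"state_{i}" for i in range(97)]
report = diff_inputs(["x", "x_lens"], model_inputs)
print(len(report["missing_from_config"]))  # prints 97
```

Every name listed under `missing_from_config` would need its own entry in the encoder's config.pbtxt before Triton can load the model.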

Could you clarify the following:

  1. Is there an existing script to generate the config.pbtxt for a streaming Zipformer-based model?
  2. If not, could you provide guidance or share a sample configuration that matches the expected input/output structure for this model?

Your insights would be immensely helpful, and I'd be happy to provide additional details if needed.
Thanks in advance for your support!

@csukuangfj
Collaborator

@yuekaizhang could you have a look at this issue?


the model configuration expects 2 input_values, while the ONNX model provides 99 input_values.

I think it is easy to use a script to update the config.pbtxt when exporting the model to onnx to include the inputs for model states.
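
A rough sketch of such a script follows. It is an illustration under several assumptions rather than an existing sherpa tool: `read_onnx_inputs` and `render_inputs` are hypothetical helper names, the dtype table covers only a few common types, and the full shape (including any batch dimension) is written into `dims`, which matches a config that sets `max_batch_size: 0`. The example shapes are made up.

```python
from typing import List, Sequence, Tuple, Union

# ONNX TensorProto elem_type codes mapped to Triton data types (small subset).
ONNX_TO_TRITON = {1: "TYPE_FP32", 6: "TYPE_INT32", 7: "TYPE_INT64", 10: "TYPE_FP16"}

Dim = Union[int, str]
TensorSpec = Tuple[str, int, Sequence[Dim]]


def read_onnx_inputs(path: str) -> List[TensorSpec]:
    """Return (name, elem_type, shape) for every graph input of an ONNX file."""
    import onnx  # imported lazily so render_inputs works without onnx installed

    specs = []
    for inp in onnx.load(path).graph.input:
        tensor_type = inp.type.tensor_type
        shape = [
            d.dim_value if d.HasField("dim_value") else d.dim_param
            for d in tensor_type.shape.dim
        ]
        specs.append((inp.name, tensor_type.elem_type, shape))
    return specs


def render_inputs(specs: Sequence[TensorSpec]) -> str:
    """Render the `input [...]` section of a Triton config.pbtxt."""
    entries = []
    for name, elem_type, shape in specs:
        # Symbolic or unknown dimensions become -1 (variable size).
        dims = ", ".join(
            str(d) if isinstance(d, int) and d > 0 else "-1" for d in shape
        )
        entries.append(
            "  {\n"
            f'    name: "{name}"\n'
            f"    data_type: {ONNX_TO_TRITON[elem_type]}\n"
            f"    dims: [ {dims} ]\n"
            "  }"
        )
    return "input [\n" + ",\n".join(entries) + "\n]"


# Illustrative shapes only; real ones come from read_onnx_inputs("encoder.onnx").
example = [("x", 1, ["N", 45, 80]), ("x_lens", 7, ["N"])]
print(render_inputs(example))
```

The same approach would apply to `graph.output` for the `output [...]` section. The rendered text still has to be merged with the rest of the config (backend, batching, and sequence settings) by hand.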

@yuekaizhang
Collaborator

@renadnasser1
I'm sorry for the confusion about the ONNX models.
The ONNX models used by sherpa-onnx and sherpa/triton are not always compatible, primarily due to minor differences in input and output shapes.
For zipformer streaming, we currently do not have a one-click Triton deployment example, but you can refer to the deployment scripts for conformer streaming or pruned stateless 7 streaming. Please check the scripts under sherpa/triton/scripts that have _streaming.sh in their names. Basically, we manually wrapped the state tensor as seen in this line: https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/export_onnx.py#L274.
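
To illustrate the general idea of wrapping many state tensors into a single one, here is a minimal pure-Python sketch. It is not the code in export_onnx.py: the helper names, the state names, and the shapes are all hypothetical, and it assumes every state shares one dtype and is already flattened to a list of numbers. A real export would do the equivalent with tensor operations inside the exported graph.

```python
from math import prod
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]
Layout = Dict[str, Tuple[int, int]]


def build_layout(shapes: Dict[str, Shape]) -> Layout:
    """Map each state name to its (start, end) slice in the packed buffer."""
    layout, offset = {}, 0
    for name, shape in shapes.items():
        size = prod(shape)
        layout[name] = (offset, offset + size)
        offset += size
    return layout


def pack(states: Dict[str, List[float]], layout: Layout) -> List[float]:
    """Concatenate flattened states into one buffer following the layout."""
    total = max((end for _, end in layout.values()), default=0)
    packed = [0.0] * total
    for name, (start, end) in layout.items():
        values = states[name]
        if len(values) != end - start:
            raise ValueError(f"{name}: expected {end - start} values, got {len(values)}")
        packed[start:end] = values
    return packed


def unpack(packed: List[float], layout: Layout) -> Dict[str, List[float]]:
    """Split the packed buffer back into per-state flattened lists."""
    return {name: packed[start:end] for name, (start, end) in layout.items()}


# Hypothetical state names and shapes, for illustration only.
layout = build_layout({"cache_a": (2, 3), "cache_b": (4,)})
states = {"cache_a": [1.0] * 6, "cache_b": [2.0] * 4}
assert unpack(pack(states, layout), layout) == states
```

With a layout like this, the Triton config only needs to declare one state input and one state output, instead of one entry per cached tensor.
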
Another approach, as @csukuangfj suggested, is to manually modify the config.pbtxt using a script. If you understand how to use the Triton ONNX backend, either of these methods will work.
If you have any questions, please feel free to ask me.

@vasistalodagala

Hi @yuekaizhang and @csukuangfj ,

The model I've built is a streaming zipformer model. The goal is to deploy it using triton.

These are the methods I've used to export it to the .onnx format:

  1. Using: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/export-onnx-streaming.py
  2. Using: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/export-onnx.py
  3. Using: https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/export_onnx.py

The outcomes of these trials are:

  1. The exported .onnx models could be used successfully for inference using https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/onnx_pretrained-streaming.py
  2. The export to .onnx failed when using --causal True. With --causal set to False, the export worked, but inference gives empty output when using https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/onnx_pretrained.py. This is probably because the streaming model expects cached inputs that aren't available. As far as I understand, this is the expected behaviour anyway.
  3. The export to .onnx fails due to a mismatch between the classes in the scaling.py files used by zipformer and pruned_transducer_stateless3.
  • Could you please provide the exact way to export the streaming zipformer to .onnx so that it can be used in the Triton deployment?
  • Also, could you provide the config.pbtxt file for a Triton deployment of the streaming zipformer model? Generating the configuration for streaming zipformer from the scripts under sherpa/triton/scripts that have _streaming.sh in their names wasn't very intuitive or direct. Thanks.

@yuekaizhang
Collaborator

  • Could you please provide the exact way to export the streaming zipformer to .onnx so that it can be used in the Triton deployment?

You could try build_librispeech_pruned_transducer_stateless7_streaming.sh first, since that model is similar to the streaming zipformer. It worked in #681.

  • Also, could you provide the config.pbtxt file for a Triton deployment of the streaming zipformer model? Generating the configuration for streaming zipformer from the scripts under sherpa/triton/scripts that have _streaming.sh in their names wasn't very intuitive or direct. Thanks.

Sorry, I don't have the bandwidth to support this at the moment; I would be very grateful if someone could contribute build_librispeech_zipformer_streaming.sh.
My suggestion is to run build_librispeech_pruned_transducer_stateless7_streaming.sh first and then adapt it for the streaming zipformer.
