Hi all,

I would like to get a recommendation on using sherpa/triton vs. sherpa-onnx.

My understanding is that sherpa/triton can serve models in either ONNX or TensorRT format, and according to this benchmark, TensorRT may be faster than ONNX: https://github.com/k2-fsa/sherpa/tree/master/triton#benchmark-for-conformer-trt-encoder-vs-onnx

However, I have tried to follow several different pieces of documentation for sherpa/triton and have not succeeded so far: either the models were not available, or they could not be exported to ONNX because of changes in torch (and even with older versions of torch I hit other issues), or the referenced scripts no longer exist.

On the other hand, my impression is that sherpa-onnx is more actively supported, easier to use, and has more up-to-date documentation.

So, which would you recommend between sherpa/triton and sherpa-onnx? My main use case would be streaming ASR behind a websocket server.

A second question, which I could not confirm experimentally on my laptop: is it possible to run a websocket server with streaming models (for example a streaming zipformer) on GPU? (I could not test this myself, most likely because my CUDA setup is not complete enough to compile sherpa-onnx with GPU support.) If so, does this work both with sherpa/triton + TensorRT and with ONNX, or only with sherpa-onnx?
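For context, here is roughly what I attempted for the GPU case, based on my reading of the sherpa-onnx docs. The CMake option name and the server flags below are my assumptions from the documentation and may not match the current code; the model paths are placeholders:

```shell
# Build sherpa-onnx with GPU support (assumed CMake option name).
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DSHERPA_ONNX_ENABLE_GPU=ON ..
make -j"$(nproc)"

# Start the streaming (online) websocket server with a streaming
# zipformer transducer, requesting the CUDA execution provider.
./bin/sherpa-onnx-online-websocket-server \
  --port=6006 \
  --provider=cuda \
  --tokens=/path/to/tokens.txt \
  --encoder=/path/to/encoder.onnx \
  --decoder=/path/to/decoder.onnx \
  --joiner=/path/to/joiner.onnx
```

This is where I got stuck at the cmake step, which is why I cannot tell whether `--provider=cuda` would actually work on the websocket-server side.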
Thanks!