Hi all,

I would like to get a recommendation on using sherpa/triton vs. sherpa-onnx.

My understanding is that sherpa/triton can serve models in either ONNX or TensorRT format, and according to this benchmark, TensorRT may be faster than ONNX: https://github.com/k2-fsa/sherpa/tree/master/triton#benchmark-for-conformer-trt-encoder-vs-onnx

However, I have tried to follow several different pieces of documentation for sherpa/triton and have not succeeded so far: either the models were not available, or they could not be exported to ONNX because of changes in torch (and even with older versions of torch I hit other issues), or the referenced scripts no longer exist.

On the other hand, my impression is that sherpa-onnx is more actively supported, easier to use, and has more up-to-date documentation.

So, which would you recommend between sherpa/triton and sherpa-onnx? My main use case would be streaming ASR behind a websocket server.

A second question, which I could not confirm experimentally on my laptop: is it possible to run a websocket server with streaming models (for example a streaming zipformer) on GPU? (I could not test this myself, most likely because my CUDA setup is not complete enough to compile sherpa-onnx with GPU support.) If so, does this work both with sherpa/triton + TensorRT and with ONNX, or only with sherpa-onnx?
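For context, here is roughly what I attempted for the GPU case, based on my reading of the sherpa-onnx docs. The CMake option name and the server flags below are my assumptions from the documentation and may not match the current code; the model paths are placeholders:

```shell
# Build sherpa-onnx with GPU support (assumed CMake option name).
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DSHERPA_ONNX_ENABLE_GPU=ON ..
make -j"$(nproc)"

# Start the streaming (online) websocket server with a streaming
# zipformer transducer, requesting the CUDA execution provider.
./bin/sherpa-onnx-online-websocket-server \
  --port=6006 \
  --provider=cuda \
  --tokens=/path/to/tokens.txt \
  --encoder=/path/to/encoder.onnx \
  --decoder=/path/to/decoder.onnx \
  --joiner=/path/to/joiner.onnx
```

This is where I got stuck at the cmake step, which is why I cannot tell whether `--provider=cuda` would actually work on the websocket-server side.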
Thanks!