I have seen the minor accuracy impact of 8 kHz vs 16 kHz. I wanted to know the impact of sample rate on RTF. My audio is originally in 8 kHz. Would resampling to 16 kHz let me gain some accuracy, or, given that the data is already lost, would I just be processing 2x the samples without any gain in accuracy? Also, why does the ONNX-based inference, which runs faster than PyTorch, not support the 8 kHz sample rate? Any plan to support that?
Replies: 1 comment
As it should be according to our tests.
There should be very little impact, because the JIT model actually contains separate, but very similar models for 8 and 16 kHz (for different scenarios different model splits were optimal, but for 30ms+ models we ended up with this).
You may try, but most likely no, the information does not just appear. It may be estimated or hallucinated, but probably it will not help either. In any case the difference in quality is not that big, really.
The fact that it runs faster is due to ONNX itself, or its compiler, or the static graph, idk. I personally observed this only for very small models on very short inputs (i.e. exactly like this one). For longer inputs on STT models this difference was not meaningful or stable. When we are exporting an ONNX model via … So, since we decided to have only one model in the future to avoid chaos and complexity, we opted for 16 kHz. Since all of these sampling rates (8, 16, 32, 48 kHz) are integer multiples of 8 kHz, simple resampling strategies can be half-assed (e.g. for 48 kHz to 16 kHz, just average each group of 3 values).
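To make the "average each group of 3 values" idea concrete, here is a minimal sketch in plain Python. The function name `downsample_by_averaging` and the sample values are made up for illustration; this is not code from the library, just one way the crude approach could look, assuming the audio is a flat list of float samples.

```python
def downsample_by_averaging(samples, factor):
    """Average each consecutive group of `factor` samples into one output sample.

    Hypothetical helper for illustration only, e.g. factor=3 for 48 kHz -> 16 kHz.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    # Drop the incomplete group at the end, if any.
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


# Toy 48 kHz signal (made-up values); the trailing 9.0 does not fill a group of 3.
audio_48k = [0.0, 3.0, 6.0, 1.0, 1.0, 1.0, 9.0]
audio_16k = downsample_by_averaging(audio_48k, 3)
print(audio_16k)  # [3.0, 1.0]
```

Block averaging acts as a rough low-pass filter before decimation, so it is cheap but not as clean as a proper resampler. It only applies when going down by an integer factor; going from 8 kHz up to 16 kHz this way is not possible, which matches the point above that the missing information does not just appear.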