Replies: 1 comment
-
Hi, we have of course thought about this, but our consensus is that the task is ill-posed.
Some edge case questions:
I believe it is much easier to build a model that takes the whole audio and classifies it. But for rap, songs, and plain everyday-life recordings, such a model ends up making judgements like "is this more noise or music?", which is not very scientific and will result in low accuracy. So the only real solution is to build a model that can predict music and speech at the same time, as two independent outputs. Noise can then be defined as the absence of both music and speech.

This can be done, but we lack the resources and focus: we are concentrating on making our VAD much simpler and more accurate. If this is something you urgently need, we can discuss adding such models as a commercial project.
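To make the "music and speech at the same time" idea concrete, here is a minimal sketch of how per-frame outputs of such a model could be turned into labels. It is only an illustration under stated assumptions: no such model exists in this project, and the function name `label_frames`, the 0.5 threshold, and the example probabilities are all hypothetical.

```python
from typing import List, Set


def label_frames(
    speech_probs: List[float],
    music_probs: List[float],
    threshold: float = 0.5,  # hypothetical cut-off, not a tuned value
) -> List[Set[str]]:
    """Turn independent per-frame speech/music probabilities into label sets.

    Speech and music are scored separately, so a frame can carry both
    labels (e.g. singing or rap over a beat). "noise" is assigned only
    when neither probability reaches the threshold.
    """
    if len(speech_probs) != len(music_probs):
        raise ValueError("speech_probs and music_probs must have equal length")

    labels: List[Set[str]] = []
    for speech_p, music_p in zip(speech_probs, music_probs):
        active: Set[str] = set()
        if speech_p >= threshold:
            active.add("speech")
        if music_p >= threshold:
            active.add("music")
        if not active:
            # Noise is defined as the absence of both speech and music.
            active.add("noise")
        labels.append(active)
    return labels


if __name__ == "__main__":
    # Made-up probabilities for four frames, for illustration only.
    speech = [0.9, 0.1, 0.8, 0.2]
    music = [0.1, 0.7, 0.9, 0.3]
    for index, frame_labels in enumerate(label_frames(speech, music)):
        print(index, sorted(frame_labels))
```

The point of the design is that the two outputs are not forced to compete, so the model never has to answer "is this more noise or music?" for a frame that contains both speech and music.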
-
🚀 Feature
Extend VAD to speech, music, and noise
Motivation
Since music is so common these days, a VAD that only distinguishes speech from noise is not enough.
Pitch
The model can detect speech, music, and noise in an audio stream.