Feature request - [speech, music, noise] #122

deyituo · 2021-11-16T02:32:32Z


🚀 Feature

Extend the VAD to distinguish speech, music, and noise.

Motivation

Music is common in audio these days, so a VAD that only separates speech from noise is not enough.

Pitch

Detect speech, music, and noise in an audio stream.

snakers4 · 2021-11-16T06:09:10Z

Maintainer

Hi,

We have of course thought about this, but our consensus is that the task is ill-posed:

Our VAD works even on small chunks (e.g. 100 ms), and we plan to make them even smaller;
Music detection would work on "long" audio (can you tell from a 2 s clip whether it is a song, or just noise or speech?);
Noise is hard to define. Some music is literally just curated noise;
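
To put the chunk sizes above in perspective, here is a rough sketch of how many samples a 100 ms chunk and a 2 s clip contain. The 16 kHz sample rate and the helper name `samples_in` are assumptions for illustration; neither is specified in this thread.

```python
def samples_in(duration_ms: int, sample_rate_hz: int = 16_000) -> int:
    """Number of samples in a chunk of the given duration.

    The 16 kHz default is an assumed sample rate, not a documented one.
    """
    return sample_rate_hz * duration_ms // 1000


vad_chunk = samples_in(100)     # short chunk of the kind the VAD works on
music_clip = samples_in(2000)   # the 2 s clip from the question above

# How many VAD-sized chunks fit into the 2 s clip
chunks_per_clip = music_clip // vad_chunk
print(vad_chunk, music_clip, chunks_per_clip)  # 1600 32000 20
```

Under that assumption, a single VAD decision sees a twentieth of the audio that is already arguably too short for judging music.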

Some edge case questions:

When a parrot speaks, is it speech?
Is grindcore or noisecore noise or music?
When a TV hisses in the background, is it noise or speech?
The sound of a crowd, market, or street: is it noise or speech?
Is a person singing along with music speech? Is a person just singing speech? Is a person rapping speech?
In a classic Pink Floyd song, is the part where only the music plays music, and the part where Gilmour sings speech?
You can go on and invent many more;

I believe it is much easier to build a model that takes the whole audio and classifies it. But for rap, songs, and plain everyday life, it would be making judgements like "is this more noise or music?", which is not very scientific (and will result in low accuracy and poor performance).

So the only real solution is to build a multi-class model that can predict music and speech at the same time. Noise can then be defined as the absence of both music and speech. This can be done, but we lack the resources, and our focus is on making our VAD much simpler and more accurate.
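
As a minimal sketch of that idea, the decision step could look roughly like the following, assuming a hypothetical model that emits two independent probabilities per chunk (one for speech, one for music). The names `ChunkLabels` and `label_chunk` and the 0.5 threshold are illustrative assumptions, not part of any existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkLabels:
    speech: bool
    music: bool

    @property
    def noise(self) -> bool:
        # Noise is defined as the absence of both speech and music.
        return not (self.speech or self.music)


def label_chunk(p_speech: float, p_music: float, threshold: float = 0.5) -> ChunkLabels:
    """Turn two independent probabilities into labels for one chunk.

    The probabilities are assumed to come from separate outputs of a
    hypothetical model, so they need not sum to 1 and both can be high.
    """
    return ChunkLabels(speech=p_speech >= threshold, music=p_music >= threshold)


singing_over_music = label_chunk(p_speech=0.9, p_music=0.8)
background_hiss = label_chunk(p_speech=0.1, p_music=0.2)
print(singing_over_music.speech, singing_over_music.music, singing_over_music.noise)
print(background_hiss.noise)
```

Because the two outputs are thresholded independently, a chunk with singing over music can be labelled as both, which sidesteps the "is this more noise or music?" judgement a single-label classifier would be forced to make.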

If this is something that you urgently need, we can discuss adding such models commercially as a project.
