Feature request - API to return the raw speech probabilities #274

ryanheise · 2022-11-21T02:27:47Z

🚀 Feature

get_speech_timestamps may be a useful function, but more advanced scenarios need an API that returns the raw speech probabilities.

Motivation

As a very simple example, I may want to select the threshold parameters iteratively, adjusting them until they produce the desired number of segments per minute. For that, I don't want to rerun get_speech_timestamps on every iteration, regenerating the same speech probabilities each time. I'd rather generate the speech probabilities once and then make multiple passes over them.

But since that is just one simplified example, I am not proposing an API tailored to it. Rather, I'm asking for direct access to the raw speech probabilities, so that apps have the flexibility to use them however they like.
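The iterative threshold search from the motivation could look roughly like this once the probabilities are available as a plain list. This is a minimal sketch: count_segments and pick_threshold are hypothetical helpers, not part of silero-vad, and a single fixed threshold is a simplification (get_speech_timestamps also applies rules such as minimum speech and silence durations).

```python
# Hypothetical post-processing helpers (NOT part of silero-vad). They operate on
# a plain list of per-window speech probabilities that was computed only once.


def count_segments(probs, threshold):
    """Count runs of consecutive windows whose probability is >= threshold."""
    segments = 0
    in_speech = False
    for p in probs:
        if p >= threshold:
            if not in_speech:
                segments += 1  # a new speech run starts at this window
            in_speech = True
        else:
            in_speech = False
    return segments


def pick_threshold(probs, window_seconds, target_per_minute, candidates=None):
    """Scan candidate thresholds and return (threshold, segments_per_minute)
    for the candidate whose rate is closest to target_per_minute."""
    if not probs:
        raise ValueError("probs must not be empty")
    if candidates is None:
        candidates = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95
    minutes = len(probs) * window_seconds / 60.0
    best = None  # (error, threshold, rate)
    for t in candidates:
        # Each pass reuses the same probabilities; no model inference here.
        rate = count_segments(probs, t) / minutes
        error = abs(rate - target_per_minute)
        if best is None or error < best[0]:
            best = (error, t, rate)
    return best[1], best[2]
```

With 512-sample windows at 16 kHz, window_seconds would be 512 / 16000 = 0.032. The sketch scans a grid of thresholds instead of bisecting, because the segment count is not necessarily monotonic in the threshold (raising it can split one long segment into several before segments start disappearing).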

Pitch

Have torch.hub return an additional function that computes the raw speech probabilities.

Alternatives

The alternative is to create a fork of silero-vad.

Additional context

Answered by adamnsandle

snakers4 · 2022-11-21T02:56:04Z · Maintainer

Hi,

You can look at how the models are invoked inside get_speech_timestamps and call them the same way.

If you believe your use case would be useful to a significant proportion of users, please open a PR.

Typically, users do not want to deal with raw probabilities.
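Following that suggestion, the core of get_speech_timestamps boils down to a loop like the one below. This is a sketch based on the v4 implementation, assuming the model exposes reset_states() and is called as model(chunk, sampling_rate) returning a one-element tensor; windowed_speech_probs is a hypothetical name, and details may differ between releases.

```python
import torch


def windowed_speech_probs(model, audio, sampling_rate=16000, window_size_samples=512):
    """Return one speech probability per window of `audio` (a 1-D float tensor).

    Hypothetical helper mirroring the loop in get_speech_timestamps (v4): the
    audio is split into consecutive, non-overlapping windows, the last window is
    zero-padded, and the model is called on each window in order.
    """
    model.reset_states()  # clear the recurrent state before processing a new stream
    speech_probs = []
    for start in range(0, len(audio), window_size_samples):
        chunk = audio[start:start + window_size_samples]
        if len(chunk) < window_size_samples:
            # Zero-pad the final, shorter window up to the full window size.
            chunk = torch.nn.functional.pad(chunk, (0, window_size_samples - len(chunk)))
        speech_probs.append(model(chunk, sampling_rate).item())
    return speech_probs
```

With a model loaded via torch.hub and audio from read_audio, this would be called as probs = windowed_speech_probs(model, wav). The windows must be fed in order because the model carries state between calls.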

adamnsandle · 2022-11-21T08:23:59Z · Collaborator

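You can use the .audio_forward method for that purpose. Both models gained this method in the latest v4 release.

import torch
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad')
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils
wav = read_audio('files/en.wav')
raw_probs = model.audio_forward(wav, sr=16000, num_samples=512)
# tensor([[0.0948, 0.1472, 0.1674,  ..., 0.9034, 0.9971, 0.9988]])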
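To relate the output of audio_forward back to the audio, each value can be paired with the time span of its window. This is a minimal sketch, assuming (not confirmed in this thread) that audio_forward scores consecutive, non-overlapping windows of num_samples samples, the same way get_speech_timestamps does; probs_with_times is a hypothetical helper. With sr=16000 and num_samples=512, each value covers 512 / 16000 = 0.032 s.

```python
def probs_with_times(probs, sr=16000, num_samples=512):
    """Pair each probability with the (start_s, end_s) span of its window.

    Hypothetical helper. Assumes window i covers samples
    [i * num_samples, (i + 1) * num_samples), i.e. consecutive,
    non-overlapping windows.
    """
    return [
        (i * num_samples / sr, (i + 1) * num_samples / sr, p)
        for i, p in enumerate(probs)
    ]
```

Since the printed raw_probs has shape [1, N], it would first be flattened to a list, e.g. probs_with_times(raw_probs.squeeze(0).tolist()).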
