Feature request - API to return the raw speech probabilities #274
🚀 Feature
Motivation

As a very simple example, I may want to take an iterative approach to selecting the threshold parameters until I find the ones that produce the desired number of segments per minute. For that, we don't want to keep running But since that is just one very simplified example, I am not proposing a specific API for that one case, but rather direct access to the raw speech probabilities, so that apps have the flexibility to use them however they like.

Pitch

Have torch hub return another function that gets the raw speech probabilities.

Alternatives

The alternative is to create a fork of silero-vad.
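To make the motivating example concrete, here is a minimal sketch of how a threshold could be picked once the per-frame speech probabilities are available as a plain list of floats. The helpers `count_segments` and `pick_threshold` are hypothetical names invented for this illustration and are not part of silero-vad. The sketch sweeps a fixed grid of candidates instead of using binary search, because the number of segments does not necessarily change monotonically with the threshold.

```python
from typing import Optional, Sequence


def count_segments(probs: Sequence[float], threshold: float) -> int:
    """Count runs of consecutive frames whose probability is >= threshold."""
    count = 0
    in_speech = False
    for p in probs:
        is_speech = p >= threshold
        if is_speech and not in_speech:
            count += 1  # a new speech run starts at this frame
        in_speech = is_speech
    return count


def pick_threshold(
    probs: Sequence[float],
    target_segments: int,
    candidates: Optional[Sequence[float]] = None,
) -> float:
    """Return the candidate whose segment count is closest to the target.

    Hypothetical helper: ties are resolved in favour of the earliest candidate.
    """
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return min(
        candidates,
        key=lambda t: abs(count_segments(probs, t) - target_segments),
    )


# Made-up probabilities, used only to exercise the helpers.
example_probs = [0.1, 0.6, 0.9, 0.6, 0.9, 0.2, 0.1, 0.7, 0.3]
best = pick_threshold(example_probs, target_segments=3)
print(best, count_segments(example_probs, best))
```

The probabilities only have to be computed once; every candidate threshold is then evaluated on the cached list, which is the point of asking for raw access.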
Replies: 2 comments
Hi, you can just look up how models are invoked in If you believe that your use case may be useful for a significant proportion of users, please open a PR. Typically users do not want to mess with probabilities.
You can use the `.audio_forward` method for that purpose. Both models got this method in the latest v4 release.

```python
wav = read_audio('files/en.wav')
raw_probs = model.audio_forward(wav, sr=16000, num_samples=512)
# tensor([[0.0948, 0.1472, 0.1674, ..., 0.9034, 0.9971, 0.9988]])
```
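As a rough illustration of what an app could then do with these values, the sketch below turns per-frame probabilities into speech segments expressed in seconds. It assumes that each probability covers one window of `num_samples` samples at sample rate `sr`, matching the call above, and that the output has shape `(1, n_frames)` as the printed tensor suggests, so that `raw_probs[0].tolist()` would give the input list. `probs_to_segments` is a hypothetical helper, not a silero-vad function, and it applies none of the smoothing, minimum-duration or padding logic that a production pipeline would likely want.

```python
from typing import List, Sequence, Tuple


def probs_to_segments(
    probs: Sequence[float],
    threshold: float = 0.5,
    sr: int = 16000,
    num_samples: int = 512,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) for each run of frames with prob >= threshold.

    Assumption: frame i covers samples [i * num_samples, (i + 1) * num_samples).
    """
    frame_sec = num_samples / sr  # 512 / 16000 = 0.032 s per frame
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current speech run
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            segments.append((start * frame_sec, i * frame_sec))
            start = None
    if start is not None:
        # The audio ended while still inside a speech run.
        segments.append((start * frame_sec, len(probs) * frame_sec))
    return segments


# Made-up probabilities; with a real model this list would come from
# something like raw_probs[0].tolist() (an assumption about the output shape).
for start_sec, end_sec in probs_to_segments([0.1, 0.9, 0.9, 0.2, 0.8]):
    print(f"{start_sec:.3f}s - {end_sec:.3f}s")
```

Because the function only needs a list of floats, different thresholds can be tried on the same cached probabilities without running the model again.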