Tuning the v4 model hyper-params #268
-
Is the v4 model more muted? The same speech segment (which includes a few vocals) produces output through the v3 model, but through the v4 model there is no output: `speech_timestamps` is an empty list.
Replies: 5 comments
-
Please provide your audio files and probability graphs.
-
The audio path is https://github.com/JJ-Guo1996/AMR-code/blob/main/audio.wav
-
What is your test result?
-
Why, with the same parameter settings (v3 with `min_speech_duration_ms=250`), does v3 produce output while v4 produces none? Did v4 introduce any optimizations?
v4 was trained not to respond to background voices.
As I can see, v4 does find speech in your example; you may need to tune the `min_speech_duration_ms` parameter of `get_speech_timestamps` (the default value is 250 ms):
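As a hedged illustration of why lowering `min_speech_duration_ms` can recover segments that are otherwise dropped: the sketch below filters a list of hypothetical speech timestamps (made-up values, in samples at an assumed 16 kHz sampling rate) by a minimum duration, mimicking the effect of that parameter inside `get_speech_timestamps`. This is not the library's internal code, just a self-contained demonstration of the duration cutoff:

```python
SR = 16_000  # assumed sampling rate in Hz

def filter_by_min_duration(timestamps, min_speech_duration_ms=250, sampling_rate=SR):
    """Keep only segments at least min_speech_duration_ms long.

    `timestamps` follows the {"start": ..., "end": ...} sample-index format
    that get_speech_timestamps returns.
    """
    min_samples = sampling_rate * min_speech_duration_ms / 1000
    return [t for t in timestamps if t["end"] - t["start"] >= min_samples]

# Hypothetical segments for illustration only:
segments = [
    {"start": 0, "end": 2400},      # 150 ms: shorter than the 250 ms default
    {"start": 8000, "end": 16000},  # 500 ms: kept at either setting
]

print(filter_by_min_duration(segments))
# → [{'start': 8000, 'end': 16000}]  (the 150 ms segment is dropped)
print(filter_by_min_duration(segments, min_speech_duration_ms=100))
# → both segments survive the lowered 100 ms cutoff
```

So if v4's probabilities cross the threshold only briefly on your clip, a 250 ms minimum can leave `speech_timestamps` empty while a smaller value (e.g. 100 ms) still yields segments.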