Enhanced speaker counting for short audio recordings #2729
Signed-off-by: Taejin Park <[email protected]>
This pull request introduces 1 alert when merging a3f70b9 into 132a829 (view on LGTM.com).
Add randomly generated synthetic embeddings to make eigen analysis more stable.
We refer to these embeddings as anchor embeddings.
anchor_sample_n (int):
Add a docstring for emb as well. Is it possible to add recommended default values for the remaining arguments?
Added a docstring for emb and also added recommended values for the arguments.
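The "eigen analysis" mentioned in the anchor-embedding docstring usually means reading the speaker count off the spectrum of an affinity matrix built from the embeddings. The sketch below assumes the common eigengap heuristic from spectral clustering (cosine affinity, unnormalized Laplacian L = D - A, largest gap among the first max_num_speaker + 1 eigenvalues); the function name estimate_num_speakers_eigengap is hypothetical and this is not claimed to be the exact procedure used in this PR. With only a handful of embeddings from a short recording, these gaps can be small and noisy, which is the instability the anchor embeddings are meant to reduce.

```python
import numpy as np


def estimate_num_speakers_eigengap(emb, max_num_speaker=8):
    """Estimate a speaker count from the largest eigengap of a graph Laplacian.

    Hypothetical sketch of a standard spectral-clustering heuristic, not the
    exact method from this pull request.
    """
    if emb.shape[0] < 2:
        return 1  # one embedding cannot reveal more than one speaker
    # Cosine similarity between every pair of embeddings.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    affinity = np.clip(unit @ unit.T, 0.0, None)  # keep non-negative similarities
    np.fill_diagonal(affinity, 0.0)  # no self-loops
    # Unnormalized graph Laplacian L = D - A.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))  # ascending eigenvalues
    # Only look at the smallest max_num_speaker + 1 eigenvalues.
    k_max = min(max_num_speaker, len(eigvals) - 1)
    gaps = np.diff(eigvals[: k_max + 1])
    # The number of near-zero eigenvalues before the biggest jump is the estimate.
    return int(np.argmax(gaps)) + 1
```

For three well-separated clusters the Laplacian has three near-zero eigenvalues followed by a large jump, so the function returns 3.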
emb_dim = emb.shape[1]
mean, std_org = np.mean(emb, axis=0), np.std(emb, axis=0)
new_emb_list = []
for _ in range(anchor_spk_n):
Can we use torch functions instead of numpy? All of these operations can be done in torch.
Changing np to torch is on hold.
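As a rough illustration of the anchor-embedding hunk (emb_dim, std_org, new_emb_list, anchor_spk_n), here is a minimal numpy sketch that appends anchor_spk_n synthetic speaker clusters, each with anchor_sample_n samples, to the real embeddings. The random centroid per anchor speaker, the noise amplitude sigma, the per-dimension scaling by std_org, and the function name add_anchor_embeddings are all assumptions for illustration, not the PR's code; it also stays in numpy, in line with the torch migration being on hold.

```python
import numpy as np


def add_anchor_embeddings(emb, anchor_sample_n, anchor_spk_n, sigma, rng=None):
    """Append synthetic "anchor" speaker clusters to an embedding matrix.

    Hypothetical sketch: each anchor speaker gets a random centroid, and its
    samples are scattered around it with noise scaled by sigma and by the
    per-dimension standard deviation of the real embeddings.
    """
    rng = np.random.default_rng() if rng is None else rng
    emb_dim = emb.shape[1]
    std_org = np.std(emb, axis=0)  # per-dimension spread of the real embeddings
    new_emb_list = []
    for _ in range(anchor_spk_n):
        # One random centroid, repeated for every sample of this anchor speaker.
        centroid = np.tile(rng.standard_normal((1, emb_dim)), (anchor_sample_n, 1))
        noise = rng.standard_normal((anchor_sample_n, emb_dim))
        noise = noise / np.max(np.abs(noise))  # rescale noise into [-1, 1]
        new_emb_list.append(centroid + sigma * noise * std_org)
    new_emb_list.append(emb)  # the real embeddings go last
    return np.vstack(new_emb_list)
```

Keeping the real embeddings as the last rows makes it easy to slice the anchors back off after clustering.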
""" | ||
est_num_of_spk_list = [] | ||
for seed in range(random_test_count): | ||
np.random.seed(seed) |
Same here, move to torch.
Changing np to torch is on hold.
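The seeded loop (random_test_count, est_num_of_spk_list) suggests the speaker count is estimated several times with different random anchors and the results are then aggregated. The sketch below assumes aggregation by majority vote and assumes anchor_spk_n is subtracted from each trial's total; estimate_fn and vote_speaker_count are hypothetical names. It also swaps the global np.random.seed call from the quoted code for a per-trial numpy Generator, which keeps each trial reproducible without touching global random state.

```python
from collections import Counter

import numpy as np


def vote_speaker_count(emb, estimate_fn, random_test_count, anchor_spk_n):
    """Repeat a speaker-count estimate under different seeds and take the mode.

    Hypothetical sketch: estimate_fn(emb, rng) is an assumed callable that adds
    anchor_spk_n synthetic speakers using rng and returns the total count found.
    """
    est_num_of_spk_list = []
    for seed in range(random_test_count):
        rng = np.random.default_rng(seed)  # independent, reproducible trial
        total = estimate_fn(emb, rng)
        est_num_of_spk_list.append(total - anchor_spk_n)  # drop the anchor speakers
    # Most frequent estimate wins; ties go to the value seen first.
    return Counter(est_num_of_spk_list).most_common(1)[0][0]
```

Voting over several random anchor draws is one way to keep a single unlucky draw from deciding the count.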
oracle_num_speakers=None,
max_num_speaker=8,
min_samples=6,
enhanced_count_thres=80,
The docstring for enhanced_count_thres is missing.
Added a docstring for enhanced_count_thres.
if emb.shape[0] == 1:
    return np.array([0])
elif emb.shape[0] < enhanced_count_thres and oracle_num_speakers == None:
    oracle_num_speakers = getEnhancedSpeakerCount(key, emb, cuda)
Why is this oracle_num_speakers and not est_num_speakers?
Also, when comparing to None, use is None:
oracle_num_speakers is None
Introduced a new variable est_num_of_spk_enhanced to make this clear.
If neither oracle_num_speakers nor est_num_of_spk_enhanced exists, we use est_num_of_spk.
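The priority described in this thread (oracle count first, then the enhanced estimate, then est_num_of_spk), together with the is None comparison the reviewer asked for, can be sketched as follows. pick_num_speakers and enhanced_count_fn are hypothetical names; enhanced_count_fn stands in for getEnhancedSpeakerCount(), and the threshold default mirrors enhanced_count_thres=80 from the diff. Unlike the quoted code, which returns the label array np.array([0]) for a single embedding, this sketch returns a count of 1.

```python
def pick_num_speakers(num_embs, est_num_of_spk, enhanced_count_fn,
                      oracle_num_speakers=None, enhanced_count_thres=80):
    """Choose which speaker count to use for clustering.

    Hypothetical sketch: the enhanced estimate is only computed for short
    inputs (fewer than enhanced_count_thres embeddings) and only when no
    oracle count is given.
    """
    if num_embs == 1:
        return 1  # a single segment can only belong to one speaker
    est_num_of_spk_enhanced = None
    if num_embs < enhanced_count_thres and oracle_num_speakers is None:
        est_num_of_spk_enhanced = enhanced_count_fn()
    if oracle_num_speakers is not None:
        return oracle_num_speakers
    if est_num_of_spk_enhanced is not None:
        return est_num_of_spk_enhanced
    return est_num_of_spk
```

Passing the enhanced estimator as a callable means the extra work is skipped entirely for long recordings or when the oracle count is known.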
Looks good to me, thanks
* Update enhanced speaker counting for short samples
* Update and doc string change
* Reflected PR review comments
* Ran style fix again to fix it

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>
This pull request adds the function getEnhancedSpeakerCount(), which performs enhanced speaker counting for the speaker diarization module.
It improves speaker counting accuracy from 50% to 80%, especially for short recordings (less than 1 minute).