Enhanced speaker counting for short audio recordings #2729

Merged (21 commits) on Aug 30, 2021

tango4j (Collaborator) commented Aug 25, 2021

This pull request adds the function getEnhancedSpeakerCount(), which performs enhanced speaker counting for the speaker diarization module.

It improves speaker counting accuracy from 50% to 80%, especially for short (less than 1 min) recordings.

lgtm-com bot commented Aug 25, 2021

This pull request introduces 1 alert when merging a3f70b9 into 132a829 - view on LGTM.com

new alerts:

  • 1 for Testing equality to None

Add randomly generated synthetic embeddings to make eigen analysis more stable.
We refer to these embeddings as anchor embeddings.

anchor_sample_n (int):
Collaborator

Add a docstring for emb as well. Is it possible to add recommended default values for the remaining arguments?

Collaborator Author

Added a docstring for emb and recommended values for the arguments.

emb_dim = emb.shape[1]
mean, std_org = np.mean(emb, axis=0), np.std(emb, axis=0)
new_emb_list = []
for _ in range(anchor_spk_n):
Collaborator

Can we use torch functions instead of numpy? All of these operations can be performed with torch.

Collaborator Author

Changing np to torch is on hold.
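
For readers following this thread, here is a minimal NumPy sketch of the anchor-embedding idea described in the docstring. It is not the code merged in this PR. The function name `add_anchor_embeddings`, the `sigma` and `seed` parameters, and the way each synthetic cluster is drawn (a random center plus noise scaled by the per-dimension standard deviation of the real embeddings) are assumptions made for illustration. Only `emb`, `anchor_sample_n`, `anchor_spk_n`, and the `np.std(emb, axis=0)` call come from the excerpt under review.

```python
import numpy as np


def add_anchor_embeddings(emb, anchor_sample_n, anchor_spk_n, sigma=1.0, seed=0):
    """Stack synthetic 'anchor' clusters on top of real embeddings (illustrative only).

    emb: (n_segments, emb_dim) array of speaker embeddings.
    anchor_sample_n: synthetic embeddings generated per anchor speaker.
    anchor_spk_n: number of synthetic anchor speakers.
    sigma: hypothetical noise scale; not taken from the PR.
    seed: hypothetical seed for reproducibility; not taken from the PR.
    """
    rng = np.random.default_rng(seed)
    emb_dim = emb.shape[1]
    std_org = np.std(emb, axis=0)  # per-dimension spread of the real data

    blocks = []
    for _ in range(anchor_spk_n):
        # One random center per synthetic speaker, shared by all its samples.
        center = rng.standard_normal((1, emb_dim))
        # Noise around the center, scaled to the spread of the real embeddings.
        noise = rng.standard_normal((anchor_sample_n, emb_dim)) * std_org
        blocks.append(center + sigma * noise)
    blocks.append(emb)  # real embeddings go last, unchanged
    return np.vstack(blocks)


emb = np.random.default_rng(1).standard_normal((10, 4))
augmented = add_anchor_embeddings(emb, anchor_sample_n=5, anchor_spk_n=3)
print(augmented.shape)  # (25, 4): 3 * 5 anchors + 10 real embeddings
```

Under this scheme, a speaker count estimated on the augmented matrix includes the synthetic speakers, so `anchor_spk_n` would have to be subtracted afterwards.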

"""
est_num_of_spk_list = []
for seed in range(random_test_count):
    np.random.seed(seed)
Collaborator

Same here: move to torch.

Collaborator Author

Changing np to torch is on hold.
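
The loop under review in this thread reseeds the random generator once per trial and collects one estimate per seed. A minimal sketch of that pattern follows, using only the standard library, so `random.seed` plays the role of `np.random.seed`. How the collected estimates are combined is not shown in the excerpt. Taking the most common value is an assumption here, and `vote_speaker_count` and `noisy_estimator` are hypothetical names, the latter standing in for the real eigen-analysis step.

```python
import random
from collections import Counter


def vote_speaker_count(estimate_fn, random_test_count=5):
    """Run a randomized estimator once per seed and return the most common result."""
    est_num_of_spk_list = []
    for seed in range(random_test_count):
        random.seed(seed)  # make each trial reproducible
        est_num_of_spk_list.append(estimate_fn())
    # Assumption: aggregate by majority vote; ties go to the value seen first.
    return Counter(est_num_of_spk_list).most_common(1)[0][0]


def noisy_estimator(true_count=2):
    """Hypothetical estimator: usually right, occasionally off by one."""
    return true_count + random.choice([0, 0, 0, 0, 1])


print(vote_speaker_count(noisy_estimator, random_test_count=5))
```

Seeding inside the loop makes every trial reproducible on its own, which helps when a single seed produces an outlier that needs to be inspected.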

oracle_num_speakers=None,
max_num_speaker=8,
min_samples=6,
enhanced_count_thres=80,
Collaborator

The docstring for enhanced_count_thres is missing.

Collaborator Author

Added a docstring for enhanced_count_thres.

if emb.shape[0] == 1:
    return np.array([0])
elif emb.shape[0] < enhanced_count_thres and oracle_num_speakers == None:
    oracle_num_speakers = getEnhancedSpeakerCount(key, emb, cuda)
Collaborator

Why is this oracle_num_speakers and not est_num_speakers?

Collaborator

Also, when comparing to None, use is None:
oracle_num_speakers is None
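
The reason for preferring `is None` can be shown with a short, self-contained example. `==` dispatches to the left operand's `__eq__`, which a type may override, whereas `is` checks identity and cannot be overridden. The `AlwaysEqual` class is hypothetical and exists only to make the difference visible.

```python
class AlwaysEqual:
    """Hypothetical type whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()

print(value == None)  # True: the overridden __eq__ answers, misleadingly
print(value is None)  # False: identity check, value is not the None singleton

oracle_num_speakers = None
print(oracle_num_speakers is None)  # True
```

NumPy arrays are a practical case of the same problem: comparing an array with `== None` is evaluated element by element rather than yielding a single bool.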

Collaborator Author

Introduced a new variable, est_num_of_spk_enhanced, to make this clear.
If neither oracle_num_speakers nor est_num_of_spk_enhanced exists, we use est_num_of_spk.
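
One way to read the resolution described in this reply is as a precedence rule between three candidate counts. The sketch below is an interpretation, not the merged code: the function `resolve_num_speakers`, the `estimate_enhanced` callable, and the exact branching are assumptions, while the names `oracle_num_speakers`, `est_num_of_spk_enhanced`, `est_num_of_spk`, and `enhanced_count_thres` come from the thread.

```python
def resolve_num_speakers(
    n_embeddings,
    est_num_of_spk,
    oracle_num_speakers=None,
    estimate_enhanced=None,
    enhanced_count_thres=80,
):
    """Pick the speaker count to cluster with (hypothetical helper).

    estimate_enhanced: optional zero-argument callable returning the
    enhanced estimate; only called for short inputs with no oracle count.
    """
    est_num_of_spk_enhanced = None
    if (
        oracle_num_speakers is None
        and estimate_enhanced is not None
        and n_embeddings < enhanced_count_thres
    ):
        est_num_of_spk_enhanced = estimate_enhanced()

    if oracle_num_speakers is not None:
        return oracle_num_speakers       # a user-supplied count always wins
    if est_num_of_spk_enhanced is not None:
        return est_num_of_spk_enhanced   # short recording: enhanced estimate
    return est_num_of_spk                # fallback: standard estimate


print(resolve_num_speakers(40, est_num_of_spk=3, estimate_enhanced=lambda: 2))   # 2
print(resolve_num_speakers(200, est_num_of_spk=3, estimate_enhanced=lambda: 2))  # 3
print(resolve_num_speakers(40, est_num_of_spk=3, oracle_num_speakers=4))         # 4
```

Keeping the enhanced estimate in its own variable means `oracle_num_speakers` continues to mean "supplied by the caller", which is the naming concern raised in the review.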

nithinraok (Collaborator) left a comment

Looks good to me, thanks

Signed-off-by: Taejin Park <[email protected]>
@nithinraok nithinraok merged commit b2ace62 into main Aug 30, 2021
@nithinraok nithinraok deleted the nmesc_update branch August 30, 2021 23:56
paarthneekhara pushed a commit to paarthneekhara/NeMo that referenced this pull request Sep 17, 2021
* Update enhanced speaker counting for short samples

Signed-off-by: Taejin Park <[email protected]>

* Update and doc string change

Signed-off-by: Taejin Park <[email protected]>

* Reflected PR review comments

Signed-off-by: Taejin Park <[email protected]>

* Ran style fix again to fix it

Signed-off-by: Taejin Park <[email protected]>

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>
jfsantos pushed a commit to jfsantos/NeMo that referenced this pull request Nov 19, 2021