Enhanced speaker counting for short audio recordings #2729

Merged

nithinraok merged 21 commits into main from nmesc_update

Aug 30, 2021

Collaborator

tango4j commented Aug 25, 2021

This pull request adds the function getEnhancedSpeakerCount(), which performs enhanced speaker counting for the speaker diarization module.

It improves speaker counting accuracy from 50% to 80%, especially for short (less than 1 min) recordings.

tango4j added 2 commits

August 20, 2021 17:59


          Update enhanced speaker counting for short samples

29f6750

Signed-off-by: Taejin Park <[email protected]>


          Update and doc string change

a3f70b9

Signed-off-by: Taejin Park <[email protected]>

tango4j requested a review from nithinraok

August 25, 2021 22:57

lgtm-com bot commented Aug 25, 2021

This pull request introduces 1 alert when merging a3f70b9 into 132a829 - view on LGTM.com

new alerts:

1 for Testing equality to None

nithinraok requested changes

nemo/collections/asr/parts/utils/nmse_clustering.py

+                  Add randomly generated synthetic embeddings to make eigen analysis more stable.
+                  We refer to these embeddings as anchor embeddings.
+                  anchor_sample_n (int):

Collaborator

nithinraok Aug 26, 2021

Add a doc string for emb as well. Is it possible to add recommended default values for the remaining arguments?

Collaborator Author

tango4j Aug 27, 2021

Added doc string for emb and also added recommended values for the arguments.

nemo/collections/asr/parts/utils/nmse_clustering.py

+                  emb_dim = emb.shape[1]
+                  mean, std_org = np.mean(emb, axis=0), np.std(emb, axis=0)
+                  new_emb_list = []
+                  for _ in range(anchor_spk_n):

Collaborator

nithinraok Aug 26, 2021

Can we use torch functions instead of numpy? All of these operations can be performed with torch.

Collaborator Author

tango4j Aug 27, 2021

Changing np to torch is on hold.
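
For readers following the snippet in this thread, here is a minimal sketch of how anchor embeddings might be generated from the per-dimension mean and standard deviation of emb. It is an illustration only, not the code merged in this PR: the function name add_anchor_embeddings, the sigma spread factor, the default values, and the use of np.random.default_rng are assumptions. Only emb, anchor_spk_n, anchor_sample_n, mean, std_org, and new_emb_list appear in the diff.

```python
import numpy as np


def add_anchor_embeddings(emb, anchor_sample_n=10, anchor_spk_n=3, sigma=50.0, seed=0):
    """Append synthetic anchor clusters to emb (hypothetical sketch)."""
    # Assumption: a local generator is used here instead of np.random.seed.
    rng = np.random.default_rng(seed)
    emb_dim = emb.shape[1]
    mean, std_org = np.mean(emb, axis=0), np.std(emb, axis=0)
    new_emb_list = []
    for _ in range(anchor_spk_n):
        # Place one synthetic speaker center away from the data mean.
        center = mean + sigma * std_org * rng.standard_normal(emb_dim)
        # Draw anchor_sample_n points around that center with the original spread.
        samples = center + std_org * rng.standard_normal((anchor_sample_n, emb_dim))
        new_emb_list.append(samples)
    # Real embeddings come first, followed by the anchor clusters.
    return np.vstack([emb] + new_emb_list)


emb = np.random.default_rng(1).standard_normal((20, 4))
emb_aug = add_anchor_embeddings(emb, anchor_sample_n=10, anchor_spk_n=3)
print(emb_aug.shape)  # (50, 4)
```

Because the anchors are appended after the real rows, a caller could drop anything computed for them by slicing back to the first emb.shape[0] rows.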

nemo/collections/asr/parts/utils/nmse_clustering.py

+                  """
+                  est_num_of_spk_list = []
+                  for seed in range(random_test_count):
+                      np.random.seed(seed)

Collaborator

nithinraok Aug 26, 2021

same here, move to torch

Collaborator Author

tango4j Aug 27, 2021

Changing np to torch is on hold.
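
The loop in this snippet runs the estimate once per seed and collects the results in est_num_of_spk_list. How the list is reduced to a single count is not shown in the diff; the sketch below assumes a simple majority vote. vote_speaker_count, estimate_fn, and noisy_estimate are hypothetical names, and the standard-library random module stands in for np.random.

```python
import random
from collections import Counter


def vote_speaker_count(estimate_fn, random_test_count=5):
    """Run estimate_fn once per seed and return the most frequent estimate."""
    est_num_of_spk_list = []
    for seed in range(random_test_count):
        random.seed(seed)  # make each trial reproducible
        est_num_of_spk_list.append(estimate_fn())
    # Assumption: the trials are reduced by majority vote.
    return Counter(est_num_of_spk_list).most_common(1)[0][0]


def noisy_estimate():
    """Toy estimator that usually returns 2 and sometimes 3."""
    return 2 if random.random() < 0.8 else 3


count = vote_speaker_count(noisy_estimate)
print(count)
```

Seeding inside the loop keeps each trial reproducible while still letting the trials differ from one another, which is what makes the vote meaningful.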

nemo/collections/asr/parts/utils/nmse_clustering.py

+                  oracle_num_speakers=None,
+                  max_num_speaker=8,
+                  min_samples=6,
+                  enhanced_count_thres=80,

Collaborator

nithinraok Aug 26, 2021

The doc string for enhanced_count_thres is missing.

Collaborator Author

tango4j Aug 27, 2021

Added doc string for enhanced_count_thres

nemo/collections/asr/parts/utils/nmse_clustering.py Outdated

+                  if emb.shape[0] == 1:
+                      return np.array([0])
+                  elif emb.shape[0] < enhanced_count_thres and oracle_num_speakers == None:
+                      oracle_num_speakers = getEnhancedSpeakerCount(key, emb, cuda)

Collaborator

nithinraok Aug 26, 2021

Why is this oracle_num_speakers and not est_num_speakers?

Collaborator

nithinraok Aug 26, 2021

Also, when comparing to None, use is None:
oracle_num_speakers is None
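
The distinction matters because == calls the object's __eq__ method, which a class can override, whereas is checks identity. NumPy arrays are a well-known case: in recent NumPy versions, comparing an array to None with == yields an element-wise result rather than a single bool. The toy class below (AlwaysEqual, a made-up name) shows the same pitfall using only the standard library.

```python
class AlwaysEqual:
    """Toy class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()
print(value == None)  # True: the overridden __eq__ was consulted
print(value is None)  # False: an identity check cannot be overridden
```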

Collaborator Author

tango4j Aug 27, 2021

Introduced a new variable, est_num_of_spk_enhanced, to make this clear.
If neither oracle_num_speakers nor est_num_of_spk_enhanced exists, we use est_num_of_spk.
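
A minimal sketch of the fallback order described in this reply, assuming oracle_num_speakers takes priority, then est_num_of_spk_enhanced, then est_num_of_spk. The function name resolve_num_speakers and the priority between the first two values are assumptions; the reply only states the final fallback.

```python
def resolve_num_speakers(oracle_num_speakers=None, est_num_of_spk_enhanced=None, est_num_of_spk=None):
    """Pick the speaker count to use (hypothetical helper)."""
    if oracle_num_speakers is not None:
        return oracle_num_speakers
    if est_num_of_spk_enhanced is not None:
        return est_num_of_spk_enhanced
    # Neither a user-supplied nor an enhanced count exists.
    return est_num_of_spk


print(resolve_num_speakers(est_num_of_spk=4))  # 4
print(resolve_num_speakers(est_num_of_spk_enhanced=2, est_num_of_spk=4))  # 2
print(resolve_num_speakers(oracle_num_speakers=3, est_num_of_spk_enhanced=2, est_num_of_spk=4))  # 3
```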

tango4j and others added 2 commits

August 27, 2021 17:26


          Reflected PR review comments

45fbba5

Signed-off-by: Taejin Park <[email protected]>


          Merge branch 'main' into nmesc_update

ffdc0f3

nithinraok approved these changes

Collaborator

nithinraok left a comment

Looks good to me, thanks

tango4j added 17 commits

August 30, 2021 14:30


          Ran style fix again to fix it

0959b5c

Signed-off-by: Taejin Park <[email protected]>


          Ran style fix again to fix it

4fdffe0

Signed-off-by: Taejin Park <[email protected]>


          Merge branch 'nmesc_update' of https://github.com/NVIDIA/NeMo into nmesc_update

d0e0d58


          Ran style fix again to fix it

cb6a147

Signed-off-by: Taejin Park <[email protected]>


          Ran style fix again to fix it

54c25ab

Signed-off-by: Taejin Park <[email protected]>


          Update enhanced speaker counting for short samples

b72e703

Signed-off-by: Taejin Park <[email protected]>


          Update and doc string change

a57605d

Signed-off-by: Taejin Park <[email protected]>


          Reflected PR review comments

456abb8

Signed-off-by: Taejin Park <[email protected]>


          Ran style fix again to fix it

7dcfa50

Signed-off-by: Taejin Park <[email protected]>


          Ran style fix again to fix it

31647d1

Signed-off-by: Taejin Park <[email protected]>


          Ran style fix again to fix it

adefaf5

Signed-off-by: Taejin Park <[email protected]>


          Ran style fix again to fix it

4474b43

Signed-off-by: Taejin Park <[email protected]>


          Ran style fix again to fix it

14f7d83

Signed-off-by: Taejin Park <[email protected]>


          Ran style fix again to fix it

1e61209

Signed-off-by: Taejin Park <[email protected]>


          Ran style fix again to fix it

f28f3bb

Signed-off-by: Taejin Park <[email protected]>


          Ran style fix again to fix it

2aead89

Signed-off-by: Taejin Park <[email protected]>


          Merge branch 'nmesc_update' of https://github.com/NVIDIA/NeMo into nmesc_update

4483cc7

Signed-off-by: Taejin Park <[email protected]>

tango4j force-pushed the nmesc_update branch from 2e51cfe to 4483cc7

August 30, 2021 22:58

nithinraok merged commit b2ace62 into main

nithinraok deleted the nmesc_update branch

August 30, 2021 23:56

paarthneekhara pushed a commit to paarthneekhara/NeMo that referenced this pull request


          Enhanced speaker counting for short audio recordings (NVIDIA#2729)

7c8a7da

* Update enhanced speaker counting for short samples

Signed-off-by: Taejin Park <[email protected]>

* Update and doc string change

Signed-off-by: Taejin Park <[email protected]>

* Reflected PR review comments

Signed-off-by: Taejin Park <[email protected]>

* Ran style fix again to fix it

Signed-off-by: Taejin Park <[email protected]>

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>

jfsantos pushed a commit to jfsantos/NeMo that referenced this pull request


          Enhanced speaker counting for short audio recordings (NVIDIA#2729)

6dab5e7

Co-authored-by: Nithin Rao <[email protected]>
