[ASR] GSS-based mask estimator #7849

anteju · 2023-11-02T06:32:53Z

What does this PR do ?

Adds a GSS (guided source separation) mask estimator for use in multispeaker scenarios.

Collection: ASR

Changelog

Added an implementation of a mask estimator using directional statistics clustering and activity guidance
Added a unit test that checks the module can be initialized and that the output has the expected shape

Usage

import torch
from nemo.collections.asr.modules.audio_modules import MaskEstimatorGSS

batch_size, num_channels, num_subbands, num_frames = 1, 4, 257, 100
num_outputs = 3

# input (mixture) spectrogram
spec = torch.randn(batch_size, num_channels, num_subbands, num_frames, dtype=torch.cfloat)
# estimated source activity
source_activity = torch.randn(batch_size, num_outputs, num_frames) > 0.5
# mask estimator
me = MaskEstimatorGSS(num_iterations=5)
# estimated source masks
masks = me(input=spec, activity=source_activity)

assert masks.shape == (batch_size, num_outputs, num_subbands, num_frames)
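The usage above draws `source_activity` at random just to exercise the shapes. In practice the activity would presumably come from speaker segments (for example, diarization output). The following is a minimal sketch of that conversion in plain Python; the helper name `segments_to_activity`, the `frame_shift_sec` parameter, and the segment format are assumptions made for illustration and are not part of this PR or of NeMo.

```python
from typing import List, Tuple


def segments_to_activity(
    segments: List[List[Tuple[float, float]]],
    num_frames: int,
    frame_shift_sec: float,
) -> List[List[bool]]:
    """Convert per-source (start, end) segments in seconds to per-frame flags.

    Hypothetical helper: frame t is assumed to start at t * frame_shift_sec,
    and a segment marks frames from its start (inclusive) to its end (exclusive).
    """
    activity = [[False] * num_frames for _ in segments]
    for src, src_segments in enumerate(segments):
        for start, end in src_segments:
            first = max(0, round(start / frame_shift_sec))
            last = min(num_frames, round(end / frame_shift_sec))
            for t in range(first, last):
                activity[src][t] = True
    return activity


# Two sources, six frames, 0.5 s per frame (illustrative values only)
activity = segments_to_activity(
    segments=[[(1.0, 2.0)], [(0.0, 0.5), (2.5, 10.0)]],
    num_frames=6,
    frame_shift_sec=0.5,
)
# With torch available, torch.tensor(activity).unsqueeze(0) would give a
# boolean tensor shaped (1, num_outputs, num_frames), matching the usage above.
```

Segments that extend past the last frame are clipped, so the result always has exactly `num_frames` entries per source.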

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to # (issue)


anteju · 2023-11-02T23:24:21Z

jenkins

anteju · 2023-11-08T01:35:20Z

jenkins

anteju · 2023-11-14T02:08:37Z

jenkins

tango4j

I ran the example code in the PR description without a problem.
I also ran tests/collections/asr/test_audio_modules.py without a problem on both CPU and GPU.
I think the test runs could be faster if we used pytest.mark.parametrize() for TestMaskEstimator. This is the only major comment I would like to make.
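For reference, a parametrized shape test could look roughly like the sketch below. It does not use `MaskEstimatorGSS` itself: `uniform_masks` and `shape_of` are hypothetical stand-ins written in plain Python so the example stays self-contained, and the parameter values are arbitrary. The point is only the `pytest.mark.parametrize` pattern, which runs and reports each combination as a separate test case.

```python
import pytest


def uniform_masks(batch_size, num_outputs, num_subbands, num_frames):
    """Hypothetical stand-in for a mask estimator: equal mask for every source."""
    value = 1.0 / num_outputs
    return [
        [[[value] * num_frames for _ in range(num_subbands)] for _ in range(num_outputs)]
        for _ in range(batch_size)
    ]


def shape_of(nested):
    """Return the shape of a regular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# Stacked decorators produce the cross product: 3 x 2 = 6 test cases
@pytest.mark.parametrize("num_outputs", [1, 2, 3])
@pytest.mark.parametrize("num_frames", [10, 100])
def test_mask_shape(num_outputs, num_frames):
    batch_size, num_subbands = 1, 5
    masks = uniform_masks(batch_size, num_outputs, num_subbands, num_frames)
    assert shape_of(masks) == (batch_size, num_outputs, num_subbands, num_frames)
```

In the real test the stand-in would be replaced by the module under test, with the tensor's `.shape` compared against the expected tuple.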


tango4j

Seems like the comments are well addressed. The new test is passing without a problem.
Approving.

tango4j · 2023-11-15T01:26:41Z

ImportError while loading conftest '/home/taejinp/projects/gss_mask/NeMo/tests/conftest.py'.
tests/conftest.py:28: in <module>
    from tests.fixtures.tts import *
tests/fixtures/tts.py:21: in <module>
    from nemo.collections.asr.parts.utils.manifest_utils import read_manifest
nemo/collections/asr/__init__.py:15: in <module>
    from nemo.collections.asr import data, losses, models, modules
nemo/collections/asr/models/__init__.py:17: in <module>
    from nemo.collections.asr.models.classification_models import EncDecClassificationModel, EncDecFrameClassificationModel
nemo/collections/asr/models/classification_models.py:29: in <module>
    from nemo.collections.asr.data import audio_to_label_dataset, feature_to_label_dataset
nemo/collections/asr/data/audio_to_label_dataset.py:19: in <module>
    from nemo.collections.asr.data.audio_to_text_dataset import convert_to_config_list, get_chain_dataset
nemo/collections/asr/data/audio_to_text_dataset.py:28: in <module>
    from nemo.collections.asr.data.huggingface.hf_audio_to_text_dataset import (
nemo/collections/asr/data/huggingface/hf_audio_to_text_dataset.py:17: in <module>
    from nemo.collections.asr.data.huggingface.hf_audio_to_text import (
nemo/collections/asr/data/huggingface/hf_audio_to_text.py:34: in <module>
    class HFTextProcessor:
nemo/collections/asr/data/huggingface/hf_audio_to_text.py:55: in HFTextProcessor
    symbols_to_keep: Optional[str | List[str]] = None,
E   TypeError: unsupported operand type(s) for |: 'type' and '_GenericAlias'

This error keeps appearing regardless of the commit, so I commented it out when running the tests.
It might take some time for the tests to pass. Note that this error has nothing to do with this PR (#7849).
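For context, this TypeError is what Python versions before 3.10 raise when `str | List[str]` is evaluated, because the `|` operator on types was only added in Python 3.10 (PEP 604). The tests were presumably run on an older interpreter. A minimal sketch of an annotation that works on older versions as well is shown below; `normalize_symbols` is a hypothetical function used only to carry the annotation and is not the actual `HFTextProcessor` code.

```python
from typing import List, Optional, Union


def normalize_symbols(symbols_to_keep: Optional[Union[str, List[str]]] = None) -> List[str]:
    """Hypothetical helper: always return a list of symbols.

    Union[str, List[str]] is equivalent to `str | List[str]`, but it is
    evaluated without error on Python versions older than 3.10.
    """
    if symbols_to_keep is None:
        return []
    if isinstance(symbols_to_keep, str):
        return [symbols_to_keep]
    return list(symbols_to_keep)
```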

anteju · 2023-11-15T22:01:44Z

jenkins

* Added GSS-based mask estimator for multispeaker scenarios
  Signed-off-by: Ante Jukić <[email protected]>
* Addressed PR comments
  Signed-off-by: Ante Jukić <[email protected]>
---------
Signed-off-by: Ante Jukić <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

anteju requested a review from tango4j November 2, 2023 06:33

github-actions bot added the ASR label Nov 2, 2023

github-advanced-security bot found potential problems Nov 2, 2023


anteju force-pushed the pr/gss-mask-estimator branch from 3f3d6e6 to 0544d58 November 2, 2023 23:24

anteju marked this pull request as ready for review November 2, 2023 23:24

anteju force-pushed the pr/gss-mask-estimator branch from 0544d58 to 848f1fd November 8, 2023 01:35

anteju force-pushed the pr/gss-mask-estimator branch from 848f1fd to 83e3434 November 9, 2023 23:39

tango4j requested changes Nov 14, 2023


Added GSS-based mask estimator for multispeaker scenarios

a00a409

Signed-off-by: Ante Jukić <[email protected]>

anteju force-pushed the pr/gss-mask-estimator branch from b1f803e to a00a409 November 14, 2023 23:14

Addressed PR comments

3c98856

Signed-off-by: Ante Jukić <[email protected]>

tango4j approved these changes Nov 15, 2023


Merge branch 'main' into pr/gss-mask-estimator

e46e29d

anteju merged commit 0d3d8fa into NVIDIA:main Nov 16, 2023
11 checks passed
