[ASR] GSS-based mask estimator #7849

Merged: 3 commits, Nov 16, 2023

@anteju anteju commented Nov 2, 2023

What does this PR do?

Adds a mask estimator based on guided source separation (GSS) for use in multi-speaker scenarios.

Collection: ASR

Changelog

  • Added a mask estimator implementation based on directional-statistics clustering with activity guidance
  • Added a unit test that checks the module can be initialized and that the output has the expected shape

Usage

import torch
from nemo.collections.asr.modules.audio_modules import MaskEstimatorGSS

batch_size, num_channels, num_subbands, num_frames = 1, 4, 257, 100
num_outputs = 3

# input (mixture) spectrogram
spec = torch.randn(batch_size, num_channels, num_subbands, num_frames, dtype=torch.cfloat)
# estimated source activity
source_activity = torch.randn(batch_size, num_outputs, num_frames) > 0.5
# mask estimator
me = MaskEstimatorGSS(num_iterations=5)
# estimated source masks
masks = me(input=spec, activity=source_activity)

assert masks.shape == (batch_size, num_outputs, num_subbands, num_frames)
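
To illustrate what "activity guidance" means in the changelog, here is a minimal sketch in plain Python for a single time-frequency bin. It assumes the common guided-source-separation idea that a source marked inactive in a frame receives zero mask weight and that the remaining values are renormalized across sources. The helper name `guide_posteriors` and its inputs are hypothetical; this is not the code in MaskEstimatorGSS, which operates on tensors as shown in the usage example.

```python
from typing import List, Sequence


def guide_posteriors(likelihoods: Sequence[float], activity: Sequence[bool]) -> List[float]:
    """Turn per-source likelihoods for one time-frequency bin into masks.

    Hypothetical helper: inactive sources get zero weight, and the remaining
    values are normalized so that the masks sum to one across sources.
    """
    weighted = [lik if active else 0.0 for lik, active in zip(likelihoods, activity)]
    total = sum(weighted)
    if total == 0.0:
        # No active source in this frame: return all-zero masks.
        return [0.0] * len(weighted)
    return [w / total for w in weighted]


# Three sources; the second one is marked inactive in this frame.
masks_for_bin = guide_posteriors([0.2, 0.6, 0.2], [True, False, True])
print(masks_for_bin)
```

In this sketch the inactive source always ends up with a mask of zero, no matter how large its likelihood is, which is the effect the activity input is meant to have.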

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

@anteju anteju requested a review from tango4j November 2, 2023 06:33
@github-actions github-actions bot added the ASR label Nov 2, 2023
@anteju anteju marked this pull request as ready for review November 2, 2023 23:24

anteju commented Nov 2, 2023

jenkins


anteju commented Nov 8, 2023

jenkins


anteju commented Nov 14, 2023

jenkins

@tango4j tango4j left a comment

I ran the example code in the PR description without a problem.
I also ran tests/collections/asr/test_audio_modules.py without a problem on both CPU and GPU.
I think the tests could run faster if we use pytest.mark.parametrize() for TestMaskEstimator. This is the only major comment I would like to make.
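
For readers unfamiliar with the decorator mentioned in this comment, here is a minimal sketch of how `pytest.mark.parametrize` expands one test function into several cases. The stand-in `fake_mask_estimator` is hypothetical and only mimics the output shape from the usage example, so the sketch runs without NeMo or PyTorch; the actual test in the PR exercises MaskEstimatorGSS and may be structured differently.

```python
import pytest


def fake_mask_estimator(input_shape, num_outputs):
    """Stand-in for the real module: returns only the output shape.

    input_shape is (batch, channels, subbands, frames).
    """
    batch_size, _num_channels, num_subbands, num_frames = input_shape
    return (batch_size, num_outputs, num_subbands, num_frames)


# Stacked decorators produce one test case per combination (2 x 2 = 4 here).
@pytest.mark.parametrize("num_channels", [1, 4])
@pytest.mark.parametrize("num_outputs", [2, 3])
def test_output_shape(num_channels, num_outputs):
    batch_size, num_subbands, num_frames = 1, 257, 100
    input_shape = (batch_size, num_channels, num_subbands, num_frames)
    output_shape = fake_mask_estimator(input_shape, num_outputs)
    assert output_shape == (batch_size, num_outputs, num_subbands, num_frames)
```

Each combination is reported as a separate test, so a failure points directly at the parameter values that caused it.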

Signed-off-by: Ante Jukić <[email protected]>

@tango4j tango4j left a comment

Seems like the comments are well addressed. The new test is passing without a problem.
Approving.


tango4j commented Nov 15, 2023

ImportError while loading conftest '/home/taejinp/projects/gss_mask/NeMo/tests/conftest.py'.
tests/conftest.py:28: in <module>
    from tests.fixtures.tts import *
tests/fixtures/tts.py:21: in <module>
    from nemo.collections.asr.parts.utils.manifest_utils import read_manifest
nemo/collections/asr/__init__.py:15: in <module>
    from nemo.collections.asr import data, losses, models, modules
nemo/collections/asr/models/__init__.py:17: in <module>
    from nemo.collections.asr.models.classification_models import EncDecClassificationModel, EncDecFrameClassificationModel
nemo/collections/asr/models/classification_models.py:29: in <module>
    from nemo.collections.asr.data import audio_to_label_dataset, feature_to_label_dataset
nemo/collections/asr/data/audio_to_label_dataset.py:19: in <module>
    from nemo.collections.asr.data.audio_to_text_dataset import convert_to_config_list, get_chain_dataset
nemo/collections/asr/data/audio_to_text_dataset.py:28: in <module>
    from nemo.collections.asr.data.huggingface.hf_audio_to_text_dataset import (
nemo/collections/asr/data/huggingface/hf_audio_to_text_dataset.py:17: in <module>
    from nemo.collections.asr.data.huggingface.hf_audio_to_text import (
nemo/collections/asr/data/huggingface/hf_audio_to_text.py:34: in <module>
    class HFTextProcessor:
nemo/collections/asr/data/huggingface/hf_audio_to_text.py:55: in HFTextProcessor
    symbols_to_keep: Optional[str | List[str]] = None,
E   TypeError: unsupported operand type(s) for |: 'type' and '_GenericAlias'

This error keeps appearing regardless of commits so commented when I was running the tests.
It might take some time for the tests to pass. Note that this error has nothing to do with this PR (#7849).
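
The TypeError at the end of the traceback above is what Python versions before 3.10 raise when `|` is applied to types in an annotation that is evaluated at definition time, as in `Optional[str | List[str]]`. Below is a minimal sketch of a spelling that also works on older interpreters, assuming the annotation is the only problem. The class name `TextProcessorSketch` is hypothetical and only mirrors the parameter shown in the traceback; it is not the NeMo class.

```python
from typing import List, Optional, Union


class TextProcessorSketch:
    """Hypothetical stand-in that only mirrors the failing annotation."""

    def __init__(self, symbols_to_keep: Optional[Union[str, List[str]]] = None):
        # `str | List[str]` needs Python 3.10+ when evaluated at definition time;
        # Union[str, List[str]] expresses the same type on older interpreters.
        self.symbols_to_keep = symbols_to_keep


processor = TextProcessorSketch(symbols_to_keep=["-", "'"])
print(processor.symbols_to_keep)
```

Another option on Python 3.7 to 3.9 is `from __future__ import annotations`, which defers evaluation of annotations, although that does not help code that inspects the annotations at runtime.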


anteju commented Nov 15, 2023

jenkins

@anteju anteju merged commit 0d3d8fa into NVIDIA:main Nov 16, 2023
11 checks passed
pzelasko pushed a commit to pzelasko/NeMo that referenced this pull request Jan 3, 2024
* Added GSS-based mask estimator for multispeaker scenarios

Signed-off-by: Ante Jukić <[email protected]>

* Addressed PR comments

Signed-off-by: Ante Jukić <[email protected]>

---------

Signed-off-by: Ante Jukić <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* Added GSS-based mask estimator for multispeaker scenarios

Signed-off-by: Ante Jukić <[email protected]>

* Addressed PR comments

Signed-off-by: Ante Jukić <[email protected]>

---------

Signed-off-by: Ante Jukić <[email protected]>
Co-authored-by: Taejin Park <[email protected]>