Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS #5982

Merged
merged 26 commits into from
Feb 15, 2023
Merged

Conversation

ekmb
Copy link
Collaborator

@ekmb ekmb commented Feb 10, 2023

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Collection: TTS/TN

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@ekmb ekmb marked this pull request as draft February 10, 2023 21:14
@ekmb ekmb marked this pull request as ready for review February 10, 2023 21:45
@@ -14,4 +14,4 @@

# TODO @xueyang: deprecate this file since no other places import modules from here anymore. However,
# all checkpoints uploaded in ngc used this path. So it requires to update all ngc checkpoints g2p path as well.
from nemo_text_processing.g2p.modules import IPAG2P, BaseG2p, EnglishG2p
from nemo.collections.tts.g2p.modules import IPAG2P, BaseG2p, EnglishG2p

Check notice

Code scanning / CodeQL

Unused import

Import of 'IPAG2P' is not used. Import of 'BaseG2p' is not used. Import of 'EnglishG2p' is not used.
Signed-off-by: ekmb <[email protected]>
Signed-off-by: ekmb <[email protected]>
Copy link
Collaborator

@XuesongYang XuesongYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if all directories are removed as expected. I am still seeing below directories in my local copy although they are empty now. Maybe there are cached files in my local. But please double check it empty directories are removed.

  • nemo_text_processing
  • tests/nemo_text_processing/g2p

@ekmb
Copy link
Collaborator Author

ekmb commented Feb 13, 2023

Not sure if all directories are removed as expected. I am still seeing below directories in my local copy although they are empty now. Maybe there are cached files in my local. But please double check it empty directories are removed.

  • nemo_text_processing
  • tests/nemo_text_processing/g2p

removed. I might be easier to browse files in the branch https://github.com/NVIDIA/NeMo/tree/g2p_to_tts

docs/source/tts/configs.rst Outdated Show resolved Hide resolved
docs/source/tts/datasets.rst Outdated Show resolved Hide resolved
examples/tts/conf/aligner.yaml Outdated Show resolved Hide resolved
examples/tts/conf/fastpitch_align_44100.yaml Outdated Show resolved Hide resolved
examples/tts/conf/fastpitch_align_ipa.yaml Outdated Show resolved Hide resolved
examples/tts/conf/rad-tts_feature_pred.yaml Outdated Show resolved Hide resolved
examples/tts/conf/rad-tts_feature_pred_ipa.yaml Outdated Show resolved Hide resolved
examples/tts/conf/tacotron2.yaml Outdated Show resolved Hide resolved
examples/tts/conf/vits.yaml Outdated Show resolved Hide resolved
examples/tts/conf/vits_44100.yaml Outdated Show resolved Hide resolved
@XuesongYang XuesongYang self-requested a review February 15, 2023 03:08
@XuesongYang XuesongYang dismissed their stale review February 15, 2023 03:09

update based on offline discussion

@github-actions github-actions bot added the NLP label Feb 15, 2023
XuesongYang
XuesongYang previously approved these changes Feb 15, 2023
Copy link
Collaborator

@XuesongYang XuesongYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

redoctopus
redoctopus previously approved these changes Feb 15, 2023
@@ -96,7 +96,6 @@ Text normalization (TN) converts text from written form into its verbalized form
_target_: nemo_text_processing.text_normalization.normalize.Normalizer
lang: en
input_case: cased
whitelist: "nemo_text_processing/text_normalization/en/data/whitelist/lj_speech.tsv"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline: please add a paragraph in the documentation describing how default/non-default settings for whitelist work in a future PR.

Comment on lines +100 to +109
try:
import nemo_text_processing

self.normalizer = instantiate(cfg.text_normalizer, **normalizer_kwargs)
self.text_normalizer_call = self.normalizer.normalize
except Exception as e:
logging.error(e)
raise ImportError(
"`nemo_text_processing` not installed, see https://github.com/NVIDIA/NeMo-text-processing for more details"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use the same import guard like in other areas?

try:
    import nemo_text_processing
    NEMO_TEXT_INSTALLED = True
except ModuleNotFoundError:
    NEMO_TEXT_INSTALLED = False

if not NEMO_TEXT_INSTALLED:
  raise()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use the same import guard like in other areas?

try:
    import nemo_text_processing
    NEMO_TEXT_INSTALLED = True
except ModuleNotFoundError:
    NEMO_TEXT_INSTALLED = False

if not NEMO_TEXT_INSTALLED:
  raise()

I like this suggestion! And how about putting this to base.py so that every class can just import NEMO_TEXT_INSTALLED from base.py?

Copy link
Collaborator

@XuesongYang XuesongYang Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code style is good, but are we really wanna unify it in this PR? This is a known issue that is not only applied for nemo_text_processing. We have other lib import needs to be updated as well. I wonder if we could file a separate PR to fix the import guard altogether.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do a separate PR for that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can leave it for a separate PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it to JIRA backlog. Please feel free to pick it up.

Dockerfile Show resolved Hide resolved
Dockerfile Show resolved Hide resolved
Signed-off-by: ekmb <[email protected]>
@ekmb ekmb dismissed stale reviews from redoctopus and XuesongYang via e304858 February 15, 2023 19:20
Copy link
Collaborator

@rlangman rlangman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@XuesongYang XuesongYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating!

  • As discussed offline: please add a paragraph in the documentation describing how default/non-default settings for whitelist work in a future PR.
  • rebase to the lastest main to avoid conflicts.

@@ -731,11 +731,11 @@ def encode(self, text):

def encode_from_g2p(self, g2p_text: List[str], raw_text: Optional[str] = None):
"""
Encodes text that has already been run through G2P_paper.
Called for encoding to tokens after text preprocessing and G2P_paper.
Encodes text that has already been run through G2Pr.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/G2Pr/G2P/

Comment on lines +100 to +109
try:
import nemo_text_processing

self.normalizer = instantiate(cfg.text_normalizer, **normalizer_kwargs)
self.text_normalizer_call = self.normalizer.normalize
except Exception as e:
logging.error(e)
raise ImportError(
"`nemo_text_processing` not installed, see https://github.com/NVIDIA/NeMo-text-processing for more details"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it to JIRA backlog. Please feel free to pick it up.

@XuesongYang
Copy link
Collaborator

XuesongYang commented Feb 15, 2023

I observed that the TN related item in Tool section is not removed.

[Tools](https://github.com/NVIDIA/NeMo/tree/stable/tools)
[Text Processing (text normalization and inverse text normalization)](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/text_normalization/intro.html)

@ekmb ekmb merged commit 5b71d4d into main Feb 15, 2023
@ekmb ekmb deleted the g2p_to_tts branch February 15, 2023 22:23
@ekmb ekmb restored the g2p_to_tts branch February 16, 2023 20:54
@XuesongYang XuesongYang deleted the g2p_to_tts branch February 21, 2023 08:33
titu1994 pushed a commit to titu1994/NeMo that referenced this pull request Mar 24, 2023
…A#5982)

* remove TN

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix imports

Signed-off-by: ekmb <[email protected]>

* fix import

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing init

Signed-off-by: ekmb <[email protected]>

* fix import

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename unit test

Signed-off-by: ekmb <[email protected]>

* fix import

Signed-off-by: ekmb <[email protected]>

* fix modules test

Signed-off-by: ekmb <[email protected]>

* fix imports

Signed-off-by: ekmb <[email protected]>

* remove whitelist from config

Signed-off-by: ekmb <[email protected]>

* delete wordid file

Signed-off-by: ekmb <[email protected]>

* remove pynini_install from tutorials

Signed-off-by: ekmb <[email protected]>

* update requirements

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support warning

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants