[TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS #5982
Signed-off-by: ekmb <[email protected]>
@@ -14,4 +14,4 @@

 # TODO @xueyang: deprecate this file since no other places import modules from here anymore. However,
 # all checkpoints uploaded in ngc used this path. So it requires to update all ngc checkpoints g2p path as well.
-from nemo_text_processing.g2p.modules import IPAG2P, BaseG2p, EnglishG2p
+from nemo.collections.tts.g2p.modules import IPAG2P, BaseG2p, EnglishG2p
Check notice
Code scanning / CodeQL
Unused import
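The TODO in the hunk above notes that existing NGC checkpoints still reference the old G2P path. One possible way to handle that without keeping the shim file forever is to rewrite the stored `_target_` strings when a checkpoint config is loaded. The sketch below is only an illustration under assumptions: `OLD_PREFIX` is a placeholder (the legacy path is not spelled out in this thread), `remap_targets` is a hypothetical helper and not part of NeMo, and the config is modelled as plain nested dicts and lists.

```python
from typing import Any

# Placeholder for the legacy module path stored in old checkpoints
# (an assumption; the real path is not given in this thread).
OLD_PREFIX = "legacy_package.g2ps."
# New location of the G2P classes, taken from the diff above.
NEW_PREFIX = "nemo.collections.tts.g2p.modules."


def remap_targets(node: Any) -> Any:
    """Return a copy of a nested config with legacy `_target_` paths rewritten."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "_target_" and isinstance(value, str) and value.startswith(OLD_PREFIX):
                # Swap only the module prefix; keep the class name untouched.
                out[key] = NEW_PREFIX + value[len(OLD_PREFIX):]
            else:
                out[key] = remap_targets(value)
        return out
    if isinstance(node, list):
        return [remap_targets(item) for item in node]
    # Scalars (str, int, None, ...) are returned unchanged.
    return node
```

With such a remapping step in the loading path, the re-export file (and the unused-import notice it triggers) could eventually be dropped once no caller imports from it.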
Not sure if all directories were removed as expected. I still see the directories below in my local copy, although they are empty now. Maybe there are cached files in my local copy, but please double-check that the empty directories are removed.
nemo_text_processing
tests/nemo_text_processing/g2p
Removed. It might be easier to browse the files in the branch https://github.com/NVIDIA/NeMo/tree/g2p_to_tts
scripts/dataset_processing/tts/sfbilingual/ds_conf/ds_for_fastpitch_align.yaml
LGTM. Thank you!
@@ -96,7 +96,6 @@ Text normalization (TN) converts text from written form into its verbalized form
 _target_: nemo_text_processing.text_normalization.normalize.Normalizer
 lang: en
 input_case: cased
-whitelist: "nemo_text_processing/text_normalization/en/data/whitelist/lj_speech.tsv"
As discussed offline: in a future PR, please add a paragraph to the documentation describing how the default and non-default settings for whitelist work.
nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py
try:
    import nemo_text_processing

    self.normalizer = instantiate(cfg.text_normalizer, **normalizer_kwargs)
    self.text_normalizer_call = self.normalizer.normalize
except Exception as e:
    logging.error(e)
    raise ImportError(
        "`nemo_text_processing` not installed, see https://github.com/NVIDIA/NeMo-text-processing for more details"
    )
Should we use the same import guard as in other areas?
try:
    import nemo_text_processing
    NEMO_TEXT_INSTALLED = True
except ModuleNotFoundError:
    NEMO_TEXT_INSTALLED = False
if not NEMO_TEXT_INSTALLED:
    raise ImportError("`nemo_text_processing` not installed")
I like this suggestion! How about putting this in base.py so that every class can just import NEMO_TEXT_INSTALLED from base.py?
This code style is good, but do we really want to unify it in this PR? This is a known issue that does not apply only to nemo_text_processing. We have other library imports that need to be updated as well. I wonder if we could file a separate PR to fix the import guards altogether.
Let's do a separate PR for that.
Sure, we can leave it for a separate PR.
Added it to JIRA backlog. Please feel free to pick it up.
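For whoever picks up that backlog item, the shared import guard discussed in this thread could look roughly like the sketch below. It is not the agreed design: `is_installed` and `require` are hypothetical helper names, and placing them in a common module such as base.py is only the suggestion made earlier in the thread. The check uses `importlib.util.find_spec`, so the optional package is not imported just to test for its presence.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Check whether a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Flag that other modules could import instead of repeating try/except blocks.
NEMO_TEXT_INSTALLED = is_installed("nemo_text_processing")


def require(module_name: str, hint: str = "") -> None:
    """Raise ImportError with an installation hint if the module is missing."""
    if not is_installed(module_name):
        message = f"`{module_name}` not installed"
        if hint:
            message += f", see {hint} for more details"
        raise ImportError(message)
```

A caller would then run `require("nemo_text_processing", "https://github.com/NVIDIA/NeMo-text-processing")` before instantiating the normalizer. Keeping the guard separate from the `instantiate` call also means that a genuine configuration error is not reported as a missing package.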
LGTM
Thanks for updating!
- As discussed offline: in a future PR, please add a paragraph to the documentation describing how the default and non-default settings for whitelist work.
- Rebase onto the latest main to avoid conflicts.
@@ -731,11 +731,11 @@ def encode(self, text):

 def encode_from_g2p(self, g2p_text: List[str], raw_text: Optional[str] = None):
     """
-    Encodes text that has already been run through G2P_paper.
-    Called for encoding to tokens after text preprocessing and G2P_paper.
+    Encodes text that has already been run through G2Pr.
nit: s/G2Pr/G2P/
I observed that the TN related item in
…A#5982)
* remove TN
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix imports
* fix import
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add missing init
* fix import
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* rename unit test
* fix import
* fix modules test
* fix imports
* remove whitelist from config
* delete wordid file
* remove pynini_install from tutorials
* update requirements
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add support warning
* review
---------
Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
What does this PR do?
Add a one line overview of what this PR aims to accomplish.
Collection: TTS/TN
Changelog
Usage
# Add a code snippet demonstrating how to use this