[Chinese text normalization]Chinese TN part in text_normalization #4826

Merged
11 commits merged into NVIDIA:main on Aug 29, 2022

Conversation

Contributor

@mzxcpp mzxcpp commented Aug 27, 2022

This PR continues #4543, #4638, and #4683.
What does this PR do?
Adds Chinese text normalization (TN) tools to NeMo.

Collection:
[NeMo/norm_text_processing/text_normalization]
[NeMo/tools/text_processing_deployment]
[NeMo/tests/nemo_text_processing/zh]

Contributor Author

mzxcpp commented Aug 27, 2022

@yzhang123 I opened a new PR as a clean version.

Comment on lines 31 to 39
score = (
    pynutil.insert("score: \"")
    + Cardinal().graph_cardinal
    + pynini.cross(":", "比")
    + Cardinal().graph_cardinal
    + pynutil.insert("\"")
)
Contributor

Please use the symbols from the TSV file.

Contributor Author

Done.
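
The change requested in this thread amounts to moving the hard-coded ":" to "比" pair out of the grammar and into a data file. The sketch below illustrates that idea in plain Python, without pynini. The TSV contents, the two-column layout, and the helper names load_symbol_map and verbalize_score are assumptions made for illustration, not the PR's actual code.

```python
import csv
import io
import re

# Hypothetical contents of a two-column TSV: written form <TAB> spoken form.
# The real data/math/score.tsv in the PR may contain different rows.
SCORE_TSV = ":\t比\n"


def load_symbol_map(tsv_text: str) -> dict:
    """Parse tab-separated 'written<TAB>spoken' rows into a dict."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def verbalize_score(text: str, symbols: dict) -> str:
    """Replace a separator found between two digit runs with its spoken form.

    Digits are left untouched here; in the PR they pass through Cardinal().
    """
    separators = "|".join(re.escape(sym) for sym in symbols)
    pattern = re.compile(r"(\d+)(" + separators + r")(\d+)")
    return pattern.sub(
        lambda m: m.group(1) + symbols[m.group(2)] + m.group(3), text
    )


symbols = load_symbol_map(SCORE_TSV)
print(verbalize_score("3:2", symbols))
```

With the mapping kept in data, supporting another separator would only need a new TSV row rather than a grammar change.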

Add your sign in data/math/symbol.tsv. This graph just converts signs to characters; you can add more
detailed cases.
'''
score_sign = pynini.string_file(get_abs_path("data/math/score.tsv"))
Contributor

get_abs_path is not defined here, and it is also undefined in other files.

Contributor Author

Sorry about this. It was introduced by the last change, which tried to add the TSV in math. I checked the other files and they are correct now.
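
For readers unfamiliar with the helper discussed above, a path helper of this kind usually resolves a data file relative to the module that loads it, so the grammar works regardless of the current working directory. The following is a minimal sketch under that assumption. The base_file parameter and the example module path are hypothetical, and the helper actually shipped in NeMo may have a different signature.

```python
import os


def get_abs_path(rel_path: str, base_file: str) -> str:
    """Resolve rel_path against the directory that contains base_file."""
    base_dir = os.path.dirname(os.path.abspath(base_file))
    return os.path.join(base_dir, rel_path)


# A grammar module would typically pass its own __file__ as base_file.
# The module path below is a made-up example.
print(get_abs_path("data/math/score.tsv", "/opt/zh/taggers/math_symbol.py"))
```

Whichever form the helper takes, the module that calls it needs a matching import, which is the omission the reviewer flagged.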

mzxcpp and others added 10 commits August 29, 2022 17:22
Signed-off-by: Ubuntu <[email protected]>
Signed-off-by: Ubuntu <[email protected]>
Signed-off-by: Ubuntu <[email protected]>
Signed-off-by: Ubuntu <[email protected]>
Signed-off-by: Ubuntu <[email protected]>
Signed-off-by: Ubuntu <[email protected]>
Signed-off-by: nithinraok <[email protected]>

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Ubuntu <[email protected]>
Signed-off-by: Ubuntu <[email protected]>
Contributor

yzhang123 commented Aug 29, 2022

@pengzhendong thanks for your review on this and previous PRs. If there is nothing more from your side, I'm OK to merge.
@mzxcpp thanks for your work!!

FYI: all pytest and SH tests passed

BuyuanCui commented Aug 29, 2022

I hope this is not too late: last week we received documentation from the ASR team in Shanghai on how to handle numbers. Here is the link: http://www.moe.gov.cn/ewebeditor/uploadfile/2015/01/13/20150113091154536.pdf

@yzhang123 yzhang123 merged commit d969162 into NVIDIA:main Aug 29, 2022
jubick1337 pushed a commit to jubick1337/NeMo that referenced this pull request Oct 3, 2022
…IDIA#4826)

* add zh in normalize

Signed-off-by: Ubuntu <[email protected]>

* add "zh" in normalize.py

Signed-off-by: Ubuntu <[email protected]>

* add zh in tools

Signed-off-by: Ubuntu <[email protected]>

* add zh in test

Signed-off-by: Ubuntu <[email protected]>

* fix bug in en/graph_utils.py

Signed-off-by: Ubuntu <[email protected]>

* Update README.md

Signed-off-by: Ubuntu <[email protected]>

* add score.tsv

Signed-off-by: Ubuntu <[email protected]>

* update math

Signed-off-by: Ubuntu <[email protected]>

* add kab language asr models (NVIDIA#4819)

Signed-off-by: nithinraok <[email protected]>

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Ubuntu <[email protected]>

* add import in math

Signed-off-by: Ubuntu <[email protected]>

Signed-off-by: Ubuntu <[email protected]>
Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Signed-off-by: Matvei Novikov <[email protected]>
jubick1337 pushed a commit to jubick1337/NeMo that referenced this pull request Oct 4, 2022
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
5 participants