Making gender assignment random for cardinals, fractions, and decimal… #3759
…s. Also making fixes to measure so sounds more fluent. Signed-off-by: Bonham79 <[email protected]>
Signed-off-by: Bonham79 <[email protected]>
This pull request introduces 1 alert when merging 245adb8 into bc6215f - view on LGTM.com new alerts:
Signed-off-by: bonham79 <[email protected]>
Signed-off-by: Bonham79 <[email protected]>
@bonham79 thanks a lot for this!
Hi Travis @bonham79, thank you very much for this PR. I think there is a bug somewhere causing only masculine cardinals and decimals to be produced?
@erastorgueva-nv Was this for just through |
@bonham79 Hi Travis, when I wrote that comment I had only seen it in |
Ah! I will have time to check tomorrow to see what's up. |
@yzhang123 @erastorgueva-nv I ran Looking at the rewrite function outputs, the only difference I'm seeing is that it switches list order, e.g.
For Also for decimals you'll see preference for masculines on
Thoughts? |
…or multiples of hundred thousand. Signed-off-by: Bonham79 <[email protected]>
This pull request introduces 1 alert when merging ce9c30f into ad2a730 - view on LGTM.com new alerts:
Signed-off-by: Bonham79 <[email protected]>
Hey @bonham79, I'm not sure I understand correctly, since I do not know Spanish. But if there is only one correct gender, please use that! Don't abstract away from it and introduce mistakes in the grammar. I do not know if you know German, but for ordinals the suffix needs to match the gender of the following noun. E.g. in "third[suffix] woman", the suffix needs to match the feminine noun following it. Since we don't know the context when we write the grammar, I just randomly choose a [suffix], so in the end it could choose "der dritte Frau" instead of "die dritte Frau". Does that make sense? If you know the context, please use the correct gender!
@yzhang123 Sorry the examples were for @erastorgueva-nv so didn't fully explain. This is for the case of when it needs to randomly assign gender just like the case you gave for German. It appears that For cases where the gender can be assumed (time, money) I didn't make any changes - save for a small bug I found in the measure tagger last night. |
Hi @bonham79, as long as the weights are the same, it's fine. If you get mismatches between pytest and Sparrowhawk due to the different paths they pick, just adjust or disable that test. We assume that if the weights are the same, the output is not well defined. If you still want to test that all genders are considered equally, you can do so by adding all gender options here https://github.com/NVIDIA/NeMo/blob/main/tests/nemo_text_processing/en/test_normalization_with_audio.py#L29 for English.
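The suggestion above (accept every gender option rather than one fixed string) can be sketched as follows. This is a minimal illustration, not NeMo's actual test code: `matches_any_gender` and `ACCEPTED_200` are hypothetical names, and the Spanish forms for 200 are used only as an example of two equally valid outputs.

```python
from typing import Set


def matches_any_gender(predicted: str, accepted: Set[str]) -> bool:
    """Return True if the normalized text is one of the accepted variants.

    Hypothetical helper: when all gendered paths carry the same weight,
    the backend may return any of them, so the test accepts the whole set.
    """
    return predicted.strip().lower() in accepted


# Both gendered readings of "200" are valid when no context fixes the gender.
ACCEPTED_200 = {"doscientos", "doscientas"}

assert matches_any_gender("doscientos", ACCEPTED_200)
assert matches_any_gender("doscientas", ACCEPTED_200)
assert not matches_any_gender("dos cientos", ACCEPTED_200)
```

A test written this way passes regardless of which equally weighted path pytest or Sparrowhawk happens to pick.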
Hi all, it seems to me that currently the TN code does not meet our requirement of the gender being random when we do normalization. For Specific results that I get are listed below. Note that I always get the same results each time I run
@erastorgueva-nv It was my understanding that by 'random' we were referring to making all valid outputs have equivalent weights and letting |
If you can do truly random easily, that is what we want. But it seems like a platform implementation detail rather than something we can define. Is it possible in Sparrowhawk? Is true randomization a feature built into the FST graph? If not, Sparrowhawk won't be able to do true randomization.
It wouldn't be random in the FST since (I believe) Sparrowhawk's shortest path algorithm is deterministic. What I can do is simply randomize per FST instantiation by a random number generator. So each time the FST is built there would be a chance decision to go one gender or another. |
that's overkill, let's not do that. |
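The trade-off discussed in the comments above can be sketched with a toy model. This is plain Python, not pynini or Sparrowhawk code: `shortest_path` and `build_with_random_order` are hypothetical stand-ins, and the tie-breaking rule (first listed path wins) is an assumption chosen to mirror the "switches list order" observation earlier in the thread.

```python
import random
from typing import List, Tuple

Path = Tuple[str, float]  # (output string, path weight); lower weight wins


def shortest_path(paths: List[Path]) -> str:
    """Deterministic selection: on equal weights, min() keeps the first path listed."""
    return min(paths, key=lambda p: p[1])[0]


def build_with_random_order(paths: List[Path], rng: random.Random) -> List[Path]:
    """Shuffle once at build time, as the per-instantiation proposal would."""
    shuffled = list(paths)
    rng.shuffle(shuffled)
    return shuffled


masc_first = [("doscientos", 1.0), ("doscientas", 1.0)]
fem_first = list(reversed(masc_first))

print(shortest_path(masc_first))  # doscientos
print(shortest_path(fem_first))   # doscientas
```

With equal weights, the result depends only on how ties are broken, which is why two backends can disagree while both remain correct. Shuffling at build time would vary the result per build, but never per sentence within one build.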
Looks good to me! Thank you very much for your hard work Travis @bonham79.
I spoke to Yang yesterday about our requirements for this PR and realised that the code meets them. The important thing is that pynini weighs the different options the same (as you have said it does). The fact that nemo_text_processing prefers masculines and sparrowhawk prefers feminines is a quirk of implementation (as you say).
#3759)
* Making gender assignment random for cardinals, fractions, and decimals. Also making fixes to measure so sounds more fluent. Signed-off-by: Bonham79 <[email protected]>
* Style fixes Signed-off-by: Bonham79 <[email protected]>
* Import fix. Signed-off-by: bonham79 <[email protected]>
* Files missed style check after import fix Signed-off-by: Bonham79 <[email protected]>
* Fixing bug in measure .py which was preventing gender carrying over for multiples of hundred thousand. Signed-off-by: Bonham79 <[email protected]>
* Missed an import Signed-off-by: Bonham79 <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
…s. Also making fixes to measure so sounds more fluent.
Signed-off-by: Bonham79 [email protected]
What does this PR do ?
Updates tn_es to make gender randomized.
This will update some minor aspects of tn_es to lower error rate.
Collection: [Nemo Text Processing]
Changelog
Usage