
Add g2pW to Chinese frontend #2230

Merged
merged 11 commits into from
Aug 15, 2022
Merged

Conversation

@BarryKCL (Contributor) commented Aug 7, 2022

PR types

Performance optimization

PR changes

APIs

Describe

Add g2pW, with inference via ONNX Runtime.

@CLAassistant commented Aug 7, 2022

CLA assistant check: all committers have signed the CLA.

initials.append(sub_initials)
finals.append(sub_finals)
# assert len(sub_initials) == len(sub_finals) == len(word)
if self.g2p_model == "g2pW":
Collaborator:

For consistency, should the `if self.g2p_model == "g2pW"` check and its logic also move into _get_initials_finals, so that this spot stays unchanged?

Contributor (author):

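_get_initials_finals converts the segmented words to pinyin one word at a time, which works poorly for polyphonic words. So with g2pW I switched to predicting pinyin for the whole sentence and then mapping the result back onto the segmented words. If the `if self.g2p_model == "g2pW"` branch moved into _get_initials_finals, the unsegmented sentence would also have to be passed into _get_initials_finals. At least one of the two places has to change either way, so isn't the current approach about equivalent?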
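The approach described above (predict pinyin for the whole sentence, then map it back onto the segmented words) can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes the sentence-level predictor returns exactly one pinyin per character, and `map_sentence_pinyins_to_words` is a hypothetical name.

```python
from typing import List


def map_sentence_pinyins_to_words(
    words: List[str], sentence_pinyins: List[str]
) -> List[List[str]]:
    """Hypothetical helper: split per-character pinyin predicted on the
    whole sentence back into one pinyin list per segmented word.

    Assumes len(sentence_pinyins) == number of characters in the sentence.
    """
    sentence = "".join(words)
    if len(sentence_pinyins) != len(sentence):
        raise ValueError("expected exactly one pinyin per character")
    per_word = []
    offset = 0
    for word in words:
        # Each word consumes as many pinyin entries as it has characters.
        per_word.append(sentence_pinyins[offset:offset + len(word)])
        offset += len(word)
    return per_word


# Usage: the sentence is predicted as a whole, then regrouped by word.
print(map_sentence_pinyins_to_words(["银行", "行长"], ["yin2", "hang2", "hang2", "zhang3"]))
```

Because the predictor sees the full sentence, the context needed to disambiguate a polyphonic character is available even when the segmenter splits it into a separate word.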

Collaborator:

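OK, I understand the difference now, so this can stay as it is. But please add a comment in the code explaining that g2pW takes the unsegmented sentence as input to handle polyphonic characters better.
I wonder, though, whether pypinyin would also do better on polyphonic characters if given the unsegmented sentence directly? My guess is it still wouldn't?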

Contributor (author):

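I already showed pypinyin's results on whole-sentence input in the bad cases you sent earlier; all three comparison results there were predicted on the whole sentence.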

@@ -53,9 +56,24 @@ def insert_after_character(lst, item):
return result


class Polyphonic():
Collaborator:

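If this class is used only by g2pW, could it be renamed G2PWPolyphonic()? If it also works for G2PM, it could be added to the G2PM logic as well (although with the current pypinyin-based code, this correction logic probably cannot be added yet).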

Contributor (author):

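I added this based on some bad cases from g2pW's predictions; in principle it could also be used with G2PM. However, G2PM's bad cases may differ from g2pW's, so the two may not need to share one polyphonic.yaml. That can be decided based on actual needs.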

Collaborator:

OK, we'll consider changing this after this PR is merged.
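The Polyphonic class discussed above is a correction layer built from bad cases collected in polyphonic.yaml. A minimal sketch of that idea follows, assuming the YAML has already been loaded into a dict that maps a word to its corrected per-character pinyin list. The real file layout and class interface may differ, and `correct_polyphonic` is a hypothetical name.

```python
from typing import Dict, List


def correct_polyphonic(
    words: List[str],
    word_pinyins: List[List[str]],
    overrides: Dict[str, List[str]],
) -> List[List[str]]:
    """Hypothetical sketch: replace a word's predicted pinyin with a
    hand-curated correction when the word appears in `overrides`.

    `overrides` stands in for the contents of polyphonic.yaml (assumed
    layout: word -> list of pinyin, one per character).
    """
    corrected = []
    for word, pinyins in zip(words, word_pinyins):
        fix = overrides.get(word)
        # Only apply a correction whose length matches the word, so a
        # malformed entry cannot misalign the per-character output.
        if fix is not None and len(fix) == len(word):
            corrected.append(list(fix))
        else:
            corrected.append(list(pinyins))
    return corrected


# Usage with a single illustrative override entry.
overrides = {"行长": ["hang2", "zhang3"]}
print(correct_polyphonic(["银行", "行长"], [["yin2", "hang2"], ["xing2", "chang2"]], overrides))
```

Keeping the overrides in data rather than code is what lets different backends (g2pW, G2PM) maintain separate correction files, as discussed in the thread.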

import sys


class RunningAverage:
Collaborator:

This class can be deleted if it isn't used.

Contributor (author):

done


from paddlenlp.transformers import BertTokenizer

from paddlespeech.t2s.frontend.g2pw.dataset import prepare_data, prepare_onnx_input, get_phoneme_labels, get_char_phoneme_labels
Collaborator:

Please split this import across multiple lines; ideally, run pre-commit to format the code.

@@ -0,0 +1,161 @@
import re
Collaborator:

If this code is based on g2pW's code, please add a copyright notice or a line such as "This code is copied/modified from {link to g2pw}".

Contributor (author):

done

}
return outputs

def _truncate_texts(window_size, texts, query_ids):
Contributor:

The functions in this file would be easier to follow with a few more comments.
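As an illustration of the kind of commenting requested, here is a commented sketch of what a function with the signature `_truncate_texts(window_size, texts, query_ids)` plausibly does: cut each text to a window around the query character and re-index the query position. The body below is an assumption written for this example, not the code from the PR, and `truncate_texts` is a hypothetical name.

```python
from typing import List, Tuple


def truncate_texts(
    window_size: int, texts: List[str], query_ids: List[int]
) -> Tuple[List[str], List[int]]:
    """Hypothetical sketch: keep at most `window_size` characters of each
    text around its query character (the polyphonic character to predict).

    Returns the truncated texts and the query positions re-indexed into
    the truncated texts. Assumes window_size >= 1 and 0 <= query_id < len(text).
    """
    truncated_texts = []
    truncated_query_ids = []
    half = window_size // 2
    for text, query_id in zip(texts, query_ids):
        # Start up to `half` characters before the query, clamped at 0.
        start = max(0, query_id - half)
        # Take at most `window_size` characters; the query always falls
        # inside because query_id - start <= half < window_size.
        end = min(len(text), start + window_size)
        truncated_texts.append(text[start:end])
        # Shift the query index so it points into the truncated text.
        truncated_query_ids.append(query_id - start)
    return truncated_texts, truncated_query_ids


texts, ids = truncate_texts(4, ["abcdefghij", "ab"], [5, 1])
print(texts, ids)
```

Truncating around the query keeps BERT-style inputs within a fixed length while preserving the local context that matters most for disambiguation.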

@GitYCC commented Aug 10, 2022

@yt605155624 (Collaborator) commented Aug 11, 2022

Original WER of g2p:
The avg WER of g2p is: 0.026014352515701198

     ,--------------------------------------------------------------------.
     |        | # Snt    # Wrd  | Corr    Sub    Del    Ins    Err  S.Err |
     |--------+-----------------+-----------------------------------------|
     | Sum/Avg|  9996   299181  | 97.3    2.7    0.0    0.0    2.7   52.2 |
     `--------------------------------------------------------------------'

WER after adding g2pW:
The avg WER of g2p is: 0.028952373312476395

     ,--------------------------------------------------------------------.
     |                         ./exp/g2p/text.g2p                         |
     |--------------------------------------------------------------------|
     | SPKR   | # Snt    # Wrd  | Corr    Sub    Del    Ins    Err  S.Err |
     |--------+-----------------+-----------------------------------------|
     | Sum/Avg|  9996   299181  | 97.2    2.8    0.0    0.1    2.9   53.3 |
     `--------------------------------------------------------------------'
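In both tables, the Err column equals Sub + Del + Ins as a percentage of # Wrd (for example, 2.8 + 0.0 + 0.1 = 2.9). A minimal sketch of a corpus-level WER computed that way from a Levenshtein alignment is shown below. It treats each phone token as a "word", which is an assumption about how the g2p example scores its output; the script's "avg WER" may be aggregated differently (for instance, averaged per sentence). The function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions
    turning `ref` into `hyp` (standard Levenshtein DP, two rows)."""
    prev = list(range(len(hyp) + 1))  # cost of inserting hyp[:j]
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)    # cost of deleting ref[:i]
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            )
        prev = cur
    return prev[-1]


def corpus_wer(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """(S + D + I) summed over all sentences, divided by total reference tokens."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference is empty")
    return errors / total


print(corpus_wer([["yin2", "hang2"]], [["yin2", "xing2"]]))
```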

@yt605155624 (Collaborator):

This issue still needs to be resolved: GitYCC/g2pW#9

@yt605155624 merged commit a75b2a5 into PaddlePaddle:develop on Aug 15, 2022
This was referenced Aug 16, 2022
@sixyang commented Sep 1, 2022

Regarding the WER comparison above: does this mean g2pW offers little improvement over g2p?

@yt605155624 (Collaborator):

See example/other/g2p for the latest numbers; the WER is currently 0.024.
