Add g2pW to Chinese frontend #2230
initials.append(sub_initials)
finals.append(sub_finals)
# assert len(sub_initials) == len(sub_finals) == len(word)
if self.g2p_model == "g2pW":
For consistency, could the `if self.g2p_model == "g2pW"` check and its logic also be moved into `_get_initials_finals`, leaving this spot unchanged?
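`_get_initials_finals` converts each segmented word to pinyin, which works poorly for polyphonic words. So with g2pW I switched to predicting pinyin for the whole sentence and then mapping the result back onto the segmented words. If the `if self.g2p_model == "g2pW"` check were moved into `_get_initials_finals`, the unsegmented sentence would also have to be passed into `_get_initials_finals`. At least one of the two places has to change, so the current approach is roughly equivalent?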
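OK, I understand the difference in detail now. This can stay as it is for now, but please add a comment in the code explaining that g2pW takes the unsegmented sentence as input in order to handle polyphonic characters better.
That said, I wonder whether pypinyin would also do better on polyphonic characters if it were given the unsegmented sentence directly? My guess is that it still wouldn't?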
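I already showed what pypinyin does with whole-sentence input in the bad cases you posted earlier; all three results in that comparison were predicted on whole sentences.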
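The approach discussed in this thread (predict pinyin on the unsegmented sentence, then map the result back onto the segmented words) can be sketched roughly as follows. This is only an illustration, not the code in this PR: `pinyin_per_word`, `toy_predictor`, and the `TOY` lookup table are hypothetical names, and the toy predictor is a context-free stand-in for a real sentence-level model such as g2pW.

```python
from typing import Callable, List


def pinyin_per_word(words: List[str],
                    predict_sentence: Callable[[str], List[str]]
                    ) -> List[List[str]]:
    """Predict pinyin on the whole sentence, then regroup it by word.

    `predict_sentence` is assumed to return exactly one pinyin string per
    character of its input (a hypothetical interface for this sketch).
    """
    sentence = "".join(words)
    char_pinyins = predict_sentence(sentence)
    if len(char_pinyins) != len(sentence):
        raise ValueError("predictor must return one pinyin per character")

    grouped = []
    offset = 0
    for word in words:
        # Slice out the pinyin belonging to this word's characters.
        grouped.append(char_pinyins[offset:offset + len(word)])
        offset += len(word)
    return grouped


# Toy stand-in for a sentence-level predictor (NOT g2pW): a fixed lookup.
TOY = {"银": "yin2", "行": "hang2", "很": "hen3", "近": "jin4"}


def toy_predictor(sentence: str) -> List[str]:
    return [TOY[ch] for ch in sentence]


result = pinyin_per_word(["银行", "很", "近"], toy_predictor)
print(result)
```

Because the predictor sees the full sentence, the segmentation only affects how the output is grouped, not what is predicted, which is the point made in the reply above.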
@@ -53,9 +56,24 @@ def insert_after_character(lst, item):
return result


class Polyphonic():
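If this class is used only by G2PW, should it be renamed to `G2PWPolyphonic()`? If it can also be used with G2PM, it could be added to the G2PM logic as well (with the way the pypinyin path is currently written, this correction logic probably can't be added there for now).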
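I added this based on some bad cases from G2PW predictions; in theory it can also be used with G2PM. However, G2PM's bad cases and G2PW's may not need to share a single polyphonic.yaml, so that can be decided based on your actual situation.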
OK, we'll consider changing this after this PR is merged.
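A bad-case correction layer of the kind discussed in this thread could look something like the sketch below. It is not the `Polyphonic` class from this PR: the class name `PolyphonicOverrides`, its `correct` method, and the table entry are all hypothetical, and the table is an inline dict here only to keep the example self-contained (the thread mentions a polyphonic.yaml file as the real source).

```python
from typing import Dict, List


class PolyphonicOverrides:
    """Hypothetical word-level pinyin overrides applied after prediction."""

    def __init__(self, table: Dict[str, List[str]]):
        # Maps a word to the pinyin it should always receive.
        self.table = table

    def correct(self, word: str, predicted: List[str]) -> List[str]:
        override = self.table.get(word)
        # Only apply an override whose length matches the prediction,
        # so downstream per-character alignment is preserved.
        if override is not None and len(override) == len(predicted):
            return list(override)
        return predicted


# Illustrative entry only; real entries would come from observed bad cases.
overrides = PolyphonicOverrides({"银行": ["yin2", "hang2"]})
print(overrides.correct("银行", ["yin2", "xing2"]))
print(overrides.correct("很", ["hen3"]))
```

Keeping one table per model, as suggested above, would just mean constructing one such object per G2P backend.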
import sys


class RunningAverage:
If this class isn't used, it can be removed.
done
from paddlenlp.transformers import BertTokenizer

from paddlespeech.t2s.frontend.g2pw.dataset import prepare_data, prepare_onnx_input, get_phoneme_labels, get_char_phoneme_labels
Please split this import across multiple lines; it's best to run pre-commit to fix the code formatting.
@@ -0,0 +1,161 @@
import re
If you referred to the g2pW code, please add a copyright notice or "This code is copied/modified from {link to g2pW}".
done
}
return outputs

def _truncate_texts(window_size, texts, query_ids):
The functions in this file could use a few more comments; that would make them easier to follow.
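As an illustration of the kind of commenting requested here, a window-truncation helper with the same signature might be documented as below. Only the signature `(window_size, texts, query_ids)` comes from the diff; the body is an assumption about what such a function does (crop each text to a window around the queried character), and `truncate_around_query` is a hypothetical name, so it should not be read as the PR's actual implementation.

```python
from typing import List, Tuple


def truncate_around_query(window_size: int, texts: List[str],
                          query_ids: List[int]
                          ) -> Tuple[List[str], List[int]]:
    """Crop each text to roughly `window_size` characters around its query.

    Args:
        window_size: maximum number of characters to keep per text.
        texts: the input sentences.
        query_ids: for each sentence, the index of the character whose
            pronunciation is being predicted.

    Returns:
        The cropped sentences and the query indices re-based onto them.
    """
    truncated_texts = []
    truncated_query_ids = []
    for text, query_id in zip(texts, query_ids):
        # Keep half a window on each side, clamped to the text bounds.
        start = max(0, query_id - window_size // 2)
        end = min(len(text), query_id + window_size // 2)
        truncated_texts.append(text[start:end])
        # The query character shifts left by the number of dropped chars.
        truncated_query_ids.append(query_id - start)
    return truncated_texts, truncated_query_ids


cropped, ids = truncate_around_query(4, ["abcdefghij"], [5])
print(cropped, ids)
```

The re-based index still points at the same character, which is the invariant a caller would rely on.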
@BarryKCL Thanks for your effort! And please reference GitYCC/g2pW in |
Original WER of g2p:
WER after adding g2pW:
need to solve this issue: GitYCC/g2pW#9
Does this mean g2pW brings little improvement over g2p?
You can check example/other/g2p for the latest numbers; it is currently 0.024.
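For readers unfamiliar with the metric quoted above, the sketch below computes an error rate in the usual way: total edit distance between reference and predicted token sequences, divided by the total number of reference tokens. This assumes the "wer" reported here follows that standard definition over pinyin or phoneme tokens; the scoring script actually used in the repository may differ, and `edit_distance` and `error_rate` are hypothetical helper names.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Total edit distance divided by total reference length."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_tokens = sum(len(r) for r in refs)
    return total_errors / total_tokens


# Made-up example data: one wrong reading out of four tokens.
refs = [["yin2", "hang2", "hen3", "jin4"]]
hyps = [["yin2", "xing2", "hen3", "jin4"]]
print(error_rate(refs, hyps))
```

Under this definition a value such as 0.024 would correspond to about 2.4 token errors per 100 reference tokens.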
PR types
Performance optimization
PR changes
APIs
Describe
add g2pW onnxruntime