[fix] fix runtime context graph #1936

Merged on Aug 6, 2023 (3 commits)

kaixunhuang0
Collaborator

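Fixes the runtime context graph adding "▁" to every BPE subword when English hotwords are used. In addition, the hotword score for English hotwords is no longer multiplied by the token length.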
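The intended behavior for English hotwords can be sketched as follows. This is a simplified illustration, not the code in this PR: `SplitEnglishWord`, its greedy longest-match strategy, and the toy vocabulary in the usage note are assumptions made for the example. Only the first piece of a word carries "▁"; later pieces are looked up without it.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// "▁" (U+2581) encoded as UTF-8; defined locally for this sketch.
const std::string kSpaceSymbol = "\xe2\x96\x81";

// Hypothetical helper: greedy longest-match split of an ASCII word into
// pieces found in `vocab`. Only the first piece is prefixed with "▁".
// Returns false if some part of the word cannot be covered (OOV).
bool SplitEnglishWord(const std::string& word,
                      const std::unordered_set<std::string>& vocab,
                      std::vector<std::string>* pieces) {
  assert(pieces != nullptr);
  pieces->clear();
  bool beginning = true;
  size_t start = 0;
  while (start < word.size()) {
    bool matched = false;
    for (size_t end = word.size(); end > start; --end) {
      std::string piece = word.substr(start, end - start);
      if (beginning) piece = kSpaceSymbol + piece;
      if (vocab.count(piece) > 0) {
        pieces->push_back(piece);
        start = end;
        beginning = false;
        matched = true;
        break;
      }
    }
    if (matched) continue;
    // No "▁"-prefixed piece exists: emit "▁" on its own once, then
    // retry the same position without the prefix.
    if (beginning && vocab.count(kSpaceSymbol) > 0) {
      pieces->push_back(kSpaceSymbol);
      beginning = false;
      continue;
    }
    return false;  // OOV
  }
  return true;
}
```

With a vocabulary containing "▁HE", "LLO" and "▁LLO", the word "HELLO" splits into "▁HE" followed by "LLO". Prefixing every piece, as the old behavior did, would have produced "▁LLO" for the second piece.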

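Fixes an issue in decoding where a mismatch partway through a hotword match consumed one token. Because of this, the input "唯品唯品会" could not match the hotword "唯品会". The impact is larger with big hotword lists and when recognizing English, since in both cases the decoder enters hotword matching more often, which produces more mismatches and interferes with normal hotword boosting.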
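The mismatch problem can be reproduced with a toy matcher. The sketch below is an assumption-laden illustration, not the runtime implementation: `HotwordTrie`, `Step`, and `CountMatches` are hypothetical names, and the matcher is a plain trie rather than the FST-based graph. When `retry_from_root` is false, a failed partial match drops the current token; when it is true, the same token is tried again from the root.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal trie over tokens; state 0 is the root.
struct HotwordTrie {
  std::vector<std::map<std::string, int>> next;
  std::vector<bool> is_final;

  HotwordTrie() : next(1), is_final(1, false) {}

  void Add(const std::vector<std::string>& tokens) {
    int state = 0;
    for (const auto& token : tokens) {
      auto it = next[state].find(token);
      if (it != next[state].end()) {
        state = it->second;
        continue;
      }
      next.emplace_back();
      is_final.push_back(false);
      const int new_state = static_cast<int>(next.size()) - 1;
      next[state][token] = new_state;
      state = new_state;
    }
    is_final[state] = true;
  }

  // Returns the state reached after reading `token` in `state`.
  int Step(int state, const std::string& token, bool retry_from_root) const {
    auto it = next[state].find(token);
    if (it != next[state].end()) return it->second;
    if (state != 0 && retry_from_root) {
      // The partial match failed: try the same token again from the root
      // instead of throwing it away.
      it = next[0].find(token);
      if (it != next[0].end()) return it->second;
    }
    return 0;
  }
};

// Counts completed hotword matches in a token sequence.
int CountMatches(const HotwordTrie& trie,
                 const std::vector<std::string>& tokens,
                 bool retry_from_root) {
  int state = 0;
  int matches = 0;
  for (const auto& token : tokens) {
    state = trie.Step(state, token, retry_from_root);
    if (trie.is_final[state]) {
      ++matches;
      state = 0;
    }
  }
  return matches;
}
```

For the hotword 唯/品/会 and the input 唯/品/唯/品/会, `CountMatches` returns 0 without the retry and 1 with it. Retrying from the root is the simplest possible recovery; a matcher with full failure links, as in Aho-Corasick, would also cover overlaps that this sketch misses.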

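Results on the Librispeech test-other set with attention rescoring decoding, before and after applying the context graph. The runtime context graph splits hotwords into BPE pieces directly against the vocabulary, so its results are somewhat worse than those of the python context graph: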

| method | WER | U-WER | B-WER |
|---|---|---|---|
| baseline | 8.77 | 5.58 | 36.84 |
| python context graph | 7.9 | 5.61 | 28.02 |
| old runtime context graph | 8.42 | 5.71 | 32.19 |
| new runtime context graph | 8.14 | 5.63 | 30.19 |

The hotword list has 3838 entries, obtained by merging the hotwords attached to each test-other utterance; context score = 2.0.
Hotword list path: https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias
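To compare the B-WER numbers reported in this description on a common scale, the relative reduction against the baseline can be computed as below. `RelativeReductionPercent` is a hypothetical helper written for this note, and the figures are only the ones listed in the results.

```cpp
#include <cassert>

// Relative reduction, in percent, of `system` with respect to `baseline`.
double RelativeReductionPercent(double baseline, double system) {
  assert(baseline > 0.0);
  return (baseline - system) / baseline * 100.0;
}
```

Applied to the B-WER figures (baseline 36.84), this gives about 18% for the new runtime context graph (30.19), about 13% for the old runtime context graph (32.19), and about 24% for the python context graph (28.02).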

kaixunhuang0 marked this pull request as ready for review on August 3, 2023, 16:27
        continue;
      }
      // Add '▁' at the beginning of English word.
-     if (IsAlpha(word)) {
+     if (IsAlpha(word) && beginning == true) {
Collaborator

Use `IsAlpha(word) && beginning`.

For boolean variables, we can use the variable directly; comparing it with `true` is unnecessary.

continue;
}

// Matching using '▁' separately for English
Collaborator

Can we combine the following logic?

      // Matching using '▁' separately for English
      if (end == start + 1 && word[0] == kSpaceSymbol[0]) {
        words->emplace_back(string(kSpaceSymbol));
        beginning = false;
        break;
      }

      if (end == start + 1) {
        ++start;
        no_oov = false;
        LOG(WARNING) << word << " is oov.";
      }
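One possible way to merge the two `end == start + 1` checks quoted above is to test that condition once and branch inside it. The sketch below is only an illustration of that shape: `MissAction` and `HandleSingleCharMiss` are hypothetical names, the surrounding loop is not reproduced, and the `LOG(WARNING)` call is left to the caller so the example needs no logging library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// "▁" (U+2581) encoded as UTF-8; defined locally for this sketch.
const std::string kSpaceSymbol = "\xe2\x96\x81";

// Hypothetical result of handling a candidate that is not in the vocabulary.
enum class MissAction { kNone, kEmittedSpaceSymbol, kSkippedOov };

// Handles a failed lookup of `word`, the candidate spanning [*start, end).
// Only single-character candidates (end == *start + 1) are handled here.
MissAction HandleSingleCharMiss(const std::string& word, size_t* start,
                                size_t end, bool* beginning, bool* no_oov,
                                std::vector<std::string>* words) {
  assert(start && beginning && no_oov && words);
  if (end != *start + 1) return MissAction::kNone;
  if (!word.empty() && word[0] == kSpaceSymbol[0]) {
    // Match "▁" separately for English; the caller then breaks out of the
    // inner loop and retries the same position without the prefix.
    words->emplace_back(kSpaceSymbol);
    *beginning = false;
    return MissAction::kEmittedSpaceSymbol;
  }
  // Genuine OOV: skip the character. The caller can log the warning.
  ++*start;
  *no_oov = false;
  return MissAction::kSkippedOov;
}
```

The behavior matches the two original branches: a "▁"-prefixed single character emits "▁" and clears `beginning`, while any other single character advances `start` and marks the input as containing an OOV.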

pengzhendong merged commit 9c2f774 into wenet-e2e:main on Aug 6, 2023