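Fix the runtime context graph adding "▁" to every BPE subword when English hotwords are used. In addition, the score assigned to an English hotword is no longer multiplied by its token length.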
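Fix hotword matching during decoding consuming a token when a match fails partway through. Because of this bug, the input "唯品唯品会" could not match the hotword "唯品会". The impact is larger with big hotword lists and when recognizing English, because both cases enter hotword matching more often, produce more mismatches, and so disrupt normal hotword boosting.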
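The mismatch behavior can be illustrated with a minimal sketch. It is not the runtime implementation: it assumes a plain trie instead of the actual context graph, treats each character as one token, and uses hypothetical names (`build_trie`, `find_hotwords`, `retry_on_mismatch`). When a partial match fails, the matcher returns to the start state; the flag controls whether the failing token is retried from the start state or simply consumed.

```python
END = "<end>"  # hypothetical end-of-hotword marker; assumed not to be a real token


def build_trie(hotwords):
    """Build a nested-dict trie over the tokens of each hotword."""
    root = {}
    for word in hotwords:
        node = root
        for tok in word:
            node = node.setdefault(tok, {})
        node[END] = True
    return root


def find_hotwords(tokens, hotwords, retry_on_mismatch=True):
    """Return the hotwords matched while scanning tokens left to right."""
    root = build_trie(hotwords)
    node, start, matched = root, 0, []
    for i, tok in enumerate(tokens):
        if tok not in node:
            node = root  # mismatch: go back to the start state
            if not retry_on_mismatch:
                continue  # old behavior: the failing token is consumed
        if tok in node:
            if node is root:
                start = i  # a new candidate match begins here
            node = node[tok]
            if END in node:
                matched.append("".join(tokens[start:i + 1]))
                node = root
    return matched


tokens = list("唯品唯品会")
print(find_hotwords(tokens, ["唯品会"], retry_on_mismatch=True))
print(find_hotwords(tokens, ["唯品会"], retry_on_mismatch=False))
```

With the retry enabled, the second "唯" restarts a match and "唯品会" is found; with it disabled, that token is dropped and nothing matches. The sketch only falls back to the start state, so it does not handle every overlapping-prefix case the way a full failure-link automaton would.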
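Results on the Librispeech test-other set, decoded with attention rescoring, before and after applying the hotword graph. Because the runtime context graph tokenizes hotwords into BPE pieces directly from the vocabulary, its results are somewhat worse than those of the python context graph: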
The hotword list has 3838 entries, merged from the hotwords of every test-other utterance; context score=2.0.
Hotword list path: https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias
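As a rough illustration of tokenizing an English hotword directly against the vocabulary, the sketch below prefixes "▁" once per word rather than once per BPE piece. The greedy longest-match strategy, the function name `tokenize_hotword`, and the toy vocabulary are assumptions made for this example, not the runtime's actual code.

```python
def tokenize_hotword(phrase, vocab):
    """Split a hotword phrase into BPE pieces by greedy longest match.

    Returns None if some part of the phrase cannot be covered by vocab
    (a hypothetical fallback chosen for this sketch).
    """
    pieces = []
    for word in phrase.split():
        text = "▁" + word  # word-start marker added once per word
        start = 0
        while start < len(text):
            end = len(text)
            # Shrink the candidate until it is a known vocabulary piece.
            while end > start and text[start:end] not in vocab:
                end -= 1
            if end == start:
                return None  # no vocabulary piece covers this position
            pieces.append(text[start:end])
            start = end
    return pieces


# Toy vocabulary, assumed for illustration only.
vocab = {"▁hel", "lo", "▁wor", "ld"}
print(tokenize_hotword("hello world", vocab))
```

Only the first piece of each word carries "▁" here, which is the behavior the fix restores. A vocabulary-driven split like this can differ from the merges a trained BPE model would apply, which is one plausible reason for the gap to the python context graph noted above.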