fix(whisper): support arbitrary ctc blank id #2157

Merged
merged 1 commit into from
Nov 24, 2023

Conversation

xingchensong (Member) commented Nov 23, 2023

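  1. Whisper has no dedicated blank id (the vocabularies of the LLMs we plan to introduce later presumably have none either), so CTC must be able to use a non-zero blank id. In Whisper, id=0 corresponds to an exclamation mark.
  2. For now, 50362 is used as blank_id, because in Whisper this id corresponds to the special token <nospeech>. During ASR fine-tuning this token never appears in the decoder input (it only shows up in VAD tasks), and its meaning is also fairly close to the CTC blank.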
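
To illustrate why the blank id has to be configurable, here is a minimal sketch of the standard CTC collapse rule (merge repeated frames, then drop blanks) with the blank id passed in as a parameter. The function name `greedy_ctc_collapse` and the frame ids are hypothetical and are not taken from this PR's diff; only the value 50362 comes from the description above.

```python
from typing import List, Optional, Sequence


def greedy_ctc_collapse(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Collapse per-frame argmax ids into tokens using the usual CTC rule."""
    tokens: List[int] = []
    prev: Optional[int] = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank_id:
            tokens.append(idx)
        prev = idx
    return tokens


# Value from the PR description: the id of <nospeech> in Whisper's vocabulary.
NOSPEECH_ID = 50362

# Made-up frame-level predictions, for illustration only.
frames = [0, 0, NOSPEECH_ID, 0, 15, 15, NOSPEECH_ID]

# With blank_id=50362, id 0 (an exclamation mark in Whisper) survives as a token.
whisper_style = greedy_ctc_collapse(frames, blank_id=NOSPEECH_ID)

# With a hard-coded blank_id=0, id 0 is silently dropped and 50362 leaks through.
legacy = greedy_ctc_collapse(frames, blank_id=0)
```

With a hard-coded blank of 0, a real token would be treated as blank, which is the problem point 1 describes.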

Mddct (Collaborator) commented Nov 23, 2023

Agree with the first point.

On the second point, could the blank id be set to vocab_size+1? Blank and nonspeech are still somewhat different.

xingchensong (Member, Author) commented Nov 23, 2023

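> Agree with the first point.
>
> On the second point, could the blank id be set to vocab_size+1? Blank and nonspeech are still somewhat different.

In that case the CTC linear layer would have to change, and the CTC vocabulary would no longer match the decoder vocabulary. I agree that using separate vocabularies is probably the better approach; I will change it in a follow-up PR.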
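
The two options discussed in this thread can be compared with a small sketch. All names here (`CtcVocabLayout`, `reuse_existing_token`, `append_blank`) are hypothetical, and the toy vocabulary size is made up. The sketch also assumes 0-based ids, under which an appended blank gets index `vocab_size` and the CTC output layer grows to `vocab_size + 1` entries; the follow-up PR may choose a different convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtcVocabLayout:
    ctc_output_dim: int         # output size of the CTC linear layer
    blank_id: int               # index treated as the CTC blank
    shares_decoder_vocab: bool  # True if CTC and decoder use the same ids


def reuse_existing_token(vocab_size: int, blank_id: int) -> CtcVocabLayout:
    """Option in this PR: pick an existing token id (e.g. <nospeech>) as blank."""
    if not 0 <= blank_id < vocab_size:
        raise ValueError("blank_id must be an existing vocabulary index")
    return CtcVocabLayout(vocab_size, blank_id, shares_decoder_vocab=True)


def append_blank(vocab_size: int) -> CtcVocabLayout:
    """Option suggested in review: add a dedicated blank after the vocabulary."""
    # Assumption: ids are 0-based, so the new blank takes index vocab_size.
    return CtcVocabLayout(vocab_size + 1, vocab_size, shares_decoder_vocab=False)


toy_vocab_size = 8  # made-up size, for illustration only
reused = reuse_existing_token(toy_vocab_size, blank_id=5)
appended = append_blank(toy_vocab_size)
```

The first layout keeps the CTC linear layer the same size as the decoder vocabulary; the second gives blank its own meaning at the cost of a CTC vocabulary that differs from the decoder's.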

@Mddct Mddct merged commit 8f7a8f3 into main Nov 24, 2023
12 checks passed
@Mddct Mddct deleted the xcsong-fix branch November 24, 2023 03:54