
Fix eot #2330

Merged · 1 commit merged into wenet-e2e:main on Feb 1, 2024
Conversation

Qiaochu-Song (Contributor)

There is a bug when generating train.yaml from a whisper checkpoint: eot was using the token index of sot. This PR fixes that.
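For context, a minimal sketch of the intended behaviour (not the actual wenet conversion script): the special-token ids written into train.yaml should come from Whisper's own tokenizer, where sot and eot are distinct indices.

```python
# Minimal sketch, not the wenet conversion script itself: read the special-token
# ids from openai-whisper's tokenizer so eot cannot silently reuse sot's index.
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=True)

special_tokens = {
    "sot": tokenizer.sot,  # <|startoftranscript|>, 50258 for the multilingual vocab
    "eot": tokenizer.eot,  # <|endoftext|>, 50257 for the multilingual vocab
}
assert special_tokens["eot"] != special_tokens["sot"]
print(special_tokens)
```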

Mddct requested a review from xingchensong · Feb 1, 2024, 01:22
robin1001 removed the request for review from xingchensong · Feb 1, 2024, 01:31
xingchensong (Member)

Good catch! How did you find it? Does it affect training results?

xingchensong merged commit 22642e4 into wenet-e2e:main on Feb 1, 2024 · 6 checks passed
Qiaochu-Song (Contributor, Author) commented Feb 1, 2024

> Good catch! How did you find it? Does it affect training results?

I was trying to convert whisper and load it into wenet, then ran some samples without any fine-tuning, expecting reasonable outputs. However, decoding did not stop even after eot was produced; other than that, the transcription looked correct. After changing eot to the correct index, wenet's decoding functions terminate decoding correctly.
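To illustrate why decoding never stopped: a typical greedy/beam loop stops when the emitted token equals the configured eot id, so if eot is mis-set to sot's index the model's real end-of-text token never matches. The sketch below is illustrative only, not wenet's decoder, and `step_fn` is a hypothetical callback.

```python
# Illustrative only (not wenet's decoder): decoding stops when the emitted token
# equals the configured eot id. If eot is configured as 50258 (sot's index) while
# the model actually emits 50257 to end the hypothesis, the check never fires and
# the loop runs until max_len.
def greedy_decode(step_fn, sot_id=50258, eot_id=50257, max_len=448):
    tokens = [sot_id]
    for _ in range(max_len):
        next_token = step_fn(tokens)  # hypothetical callback returning the argmax token id
        tokens.append(next_token)
        if next_token == eot_id:      # with a wrong eot_id this condition is never true
            break
    return tokens
```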

Regarding effects on fine-tuning, I have not tested it yet. I assume it might affect training results slightly, but not by much, since tokens after eot are not penalized in training.
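A sketch of why positions after eot carry no training signal, assuming the common practice of padding targets past the end token with an ignore index that cross entropy skips. The IGNORE_ID value, vocab size, and token ids below are illustrative assumptions, not taken from wenet's code.

```python
# Illustrative sketch, assuming targets after the end token are padded with an
# ignore index so that cross entropy produces no loss (and no gradient) for them.
import torch
import torch.nn.functional as F

IGNORE_ID = -1                         # assumed padding value for label sequences
vocab_size = 51866                     # illustrative Whisper-sized vocabulary
logits = torch.randn(1, 5, vocab_size)                                 # (batch, time, vocab)
targets = torch.tensor([[1029, 2034, 50257, IGNORE_ID, IGNORE_ID]])    # text, text, eot, padding

loss = F.cross_entropy(
    logits.view(-1, vocab_size), targets.view(-1), ignore_index=IGNORE_ID
)
print(loss)  # padded positions after eot contribute nothing to the loss
```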

xingchensong (Member)

I see, THX !!!

xingchensong (Member)

I can confirm that this fix does not affect training results.

  • green: eot is 50258
  • red: eot is 50257

[image: training curve comparison for eot = 50258 vs eot = 50257]
