
Fix eot #2330

Merged · 1 commit merged into wenet-e2e:main on Feb 1, 2024
Conversation

Qiaochu-Song (Contributor)

There is a bug when generating train.yaml from a whisper checkpoint: eot was using the token index of sot. This PR fixes that.
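For context, a minimal sketch of the intended behaviour (not the actual wenet conversion script): the special-token ids written into train.yaml should come from Whisper's own tokenizer, where sot and eot are distinct indices.

```python
# Minimal sketch, not the wenet conversion script itself: read the special-token
# ids from openai-whisper's tokenizer so eot cannot silently reuse sot's index.
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=True)

special_tokens = {
    "sot": tokenizer.sot,  # <|startoftranscript|>, 50258 for the multilingual vocab
    "eot": tokenizer.eot,  # <|endoftext|>, 50257 for the multilingual vocab
}
assert special_tokens["eot"] != special_tokens["sot"]
print(special_tokens)
```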

Mddct requested a review from xingchensong · Feb 1, 2024, 01:22
robin1001 removed the request for review from xingchensong · Feb 1, 2024, 01:31
xingchensong (Member)

Good catch! How did you find it? Does it affect training results?

xingchensong merged commit 22642e4 into wenet-e2e:main on Feb 1, 2024 · 6 checks passed
Qiaochu-Song (Contributor, Author) commented Feb 1, 2024

> Good catch! How did you find it? Does it affect training results?

I was trying to convert whisper and load it into wenet, then ran some samples without any fine-tuning, expecting reasonable outputs. However, decoding did not stop even after eot was produced; other than that, the transcription looked correct. After changing eot to the correct index, wenet's decoding functions terminate decoding correctly.
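To illustrate why decoding never stopped: a typical greedy/beam loop stops when the emitted token equals the configured eot id, so if eot is mis-set to sot's index the model's real end-of-text token never matches. The sketch below is illustrative only, not wenet's decoder, and `step_fn` is a hypothetical callback.

```python
# Illustrative only (not wenet's decoder): decoding stops when the emitted token
# equals the configured eot id. If eot is configured as 50258 (sot's index) while
# the model actually emits 50257 to end the hypothesis, the check never fires and
# the loop runs until max_len.
def greedy_decode(step_fn, sot_id=50258, eot_id=50257, max_len=448):
    tokens = [sot_id]
    for _ in range(max_len):
        next_token = step_fn(tokens)  # hypothetical callback returning the argmax token id
        tokens.append(next_token)
        if next_token == eot_id:      # with a wrong eot_id this condition is never true
            break
    return tokens
```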

Regarding effects on fine-tuning, I have not tested it yet. I assume it might affect training results slightly, but not by much, since tokens after eot are not penalized in training.
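A sketch of why positions after eot carry no training signal, assuming the common practice of padding targets past the end token with an ignore index that cross entropy skips. The IGNORE_ID value, vocab size, and token ids below are illustrative assumptions, not taken from wenet's code.

```python
# Illustrative sketch, assuming targets after the end token are padded with an
# ignore index so that cross entropy produces no loss (and no gradient) for them.
import torch
import torch.nn.functional as F

IGNORE_ID = -1                         # assumed padding value for label sequences
vocab_size = 51866                     # illustrative Whisper-sized vocabulary
logits = torch.randn(1, 5, vocab_size)                                 # (batch, time, vocab)
targets = torch.tensor([[1029, 2034, 50257, IGNORE_ID, IGNORE_ID]])    # text, text, eot, padding

loss = F.cross_entropy(
    logits.view(-1, vocab_size), targets.view(-1), ignore_index=IGNORE_ID
)
print(loss)  # padded positions after eot contribute nothing to the loss
```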

xingchensong (Member)

I see, THX !!!

xingchensong (Member)

I can confirm that this fix does not affect training results.

  • green: eot is 50258
  • red: eot is 50257

[image: training curve comparison for eot = 50258 vs eot = 50257]
