[runtime] Configurable blank token idx #2366

zhr1201 · 2024-02-22T19:26:30Z

This is a follow-up to #2320, which made the feature extraction pipeline compatible with Whisper. We realized that the blank token index is hard-coded (#2329), which caused the weird decoding results seen in the previous PR.

This PR aims to fix that. There might be some places that I missed; could you folks help me do another pass?
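To illustrate why a hard-coded blank index matters, here is a minimal sketch of greedy CTC decoding with the blank index passed in as a parameter. The function name `CtcGreedyDecode` and its signature are hypothetical and are not taken from the WeNet runtime; the sketch only shows the general technique.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper, not WeNet's actual API: greedy CTC decoding where the
// blank index is an argument instead of a hard-coded constant.
std::vector<int> CtcGreedyDecode(
    const std::vector<std::vector<float>>& log_probs, int blank_id) {
  std::vector<int> tokens;
  int prev = -1;  // sentinel: no valid token id is negative
  for (const auto& frame : log_probs) {
    if (frame.empty()) continue;
    // Index of the highest-scoring token in this frame.
    int best = static_cast<int>(
        std::max_element(frame.begin(), frame.end()) - frame.begin());
    // Emit only non-blank tokens that differ from the previous frame's best.
    if (best != blank_id && best != prev) tokens.push_back(best);
    prev = best;
  }
  return tokens;
}
```

With the same frame scores, changing `blank_id` changes which frames are dropped, so a model whose blank is not at the assumed index would produce wrong output under a hard-coded value.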

The current RTF is 2.xx after warm-up, using one core with a beam size of 8 on my local MacBook. I think the model is probably capable of running in real time with AVX-512 on multiple cores, even with Whisper large, so let's get it to work!
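For reference, RTF (real-time factor) is the processing time divided by the audio duration, so a value above 1 means decoding is slower than real time. A minimal sketch, with a hypothetical helper name that is not part of the runtime:

```cpp
#include <stdexcept>

// Hypothetical helper: real-time factor = processing time / audio duration.
// RTF < 1 means faster than real time; RTF > 1 means slower.
double RealTimeFactor(double processing_seconds, double audio_seconds) {
  if (audio_seconds <= 0.0) {
    throw std::invalid_argument("audio duration must be positive");
  }
  return processing_seconds / audio_seconds;
}
```

For example, spending 20 seconds to decode a 10-second utterance gives an RTF of 2.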

However, there are still some loose ends. Maybe you already know the answers and can share some insights.

- The decoder main result still differs slightly from that of transcribe.py when it imitates streaming inference using attention masks. I don't know exactly why; maybe some default parameters are different?
- In order to create the decoding TLG graph, some tools might need to be updated to support a flexible blank token id. I haven't checked this one, but it is not a high priority for us right now.

zhr1201 · 2024-02-22T19:28:41Z

runtime/core/utils/string.cc

@@ -153,7 +153,7 @@ std::string ProcessBlank(const std::string& str, bool lowercase) {
      std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
      std::wstring wsresult = converter.from_bytes(result);
      for (auto& c : wsresult) {
-        c = lowercase ? tolower(c, loc) : toupper(c, loc);
+        c = lowercase ? tolower(c, loc) : c;


This changes the --nolowercase behavior: the text is now left unchanged instead of being converted to uppercase. We could also pass another flag if uppercasing is what we want.
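One possible shape for the "another flag" option is a three-way case mode instead of a single boolean. This is only a sketch: `CaseMode` and `ApplyCase` are hypothetical names, and it handles ASCII letters only, whereas the actual code converts through wide characters with a locale.

```cpp
#include <cctype>
#include <string>

// Hypothetical three-way case mode replacing a single `lowercase` boolean.
enum class CaseMode { kKeep, kLower, kUpper };

// ASCII-only sketch: in the default "C" locale, bytes outside the ASCII
// letter range (e.g. UTF-8 multi-byte sequences) are left untouched.
std::string ApplyCase(const std::string& str, CaseMode mode) {
  std::string out = str;
  for (char& c : out) {
    // Cast to unsigned char first; passing a negative char is undefined.
    unsigned char uc = static_cast<unsigned char>(c);
    if (mode == CaseMode::kLower) {
      c = static_cast<char>(std::tolower(uc));
    } else if (mode == CaseMode::kUpper) {
      c = static_cast<char>(std::toupper(uc));
    }
    // CaseMode::kKeep leaves the character as-is.
  }
  return out;
}
```

With this layout, the behavior in the diff corresponds to `kKeep` when lowercasing is disabled, and the previous behavior corresponds to `kUpper`.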

robin1001 · 2024-02-26T10:38:18Z

Great job!

hzhou245 added 2 commits February 21, 2024 23:07

Pass in blank id

58876a2

Fix

e711533

zhr1201 marked this pull request as ready for review February 22, 2024 19:26

zhr1201 changed the title ~~Configurable blank token idx~~ [runtime] Configurable blank token idx Feb 22, 2024

zhr1201 force-pushed the configurable_blank_token_idx branch from d42ce1c to 1757e3b Compare February 23, 2024 04:16

ctc endpointing blank id

808ad62

zhr1201 force-pushed the configurable_blank_token_idx branch from 1757e3b to 808ad62 Compare February 23, 2024 04:17

robin1001 approved these changes Feb 26, 2024

robin1001 merged commit 89ef2e7 into wenet-e2e:main Feb 26, 2024
6 checks passed