Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[runtime] Configurable blank token idx #2366

Merged
merged 3 commits into from
Feb 26, 2024

Conversation

zhr1201
Copy link
Contributor

@zhr1201 zhr1201 commented Feb 22, 2024

Following #2320 which makes feature extraction pipeline compatible with whisper. We realized blank token idx is hard coded #2329, resulting in weird decoding results in the previous PR.

This PR is aiming at fixing that. There might be some places that i missed, can you folks help me do another pass?

Screenshot 2024-02-22 at 2 19 12 PM

Current RTF is 2.xx after warming up, using one core with beam size of 8, on my local macbook. I think the model is probably capable of running at real time with avx512 on multiple cores, even with whisper large. so let's get it to work!

However, there is still some loose ends. Maybe you guys already know the answer and can share some insights.

  • decoder main result still slightly differs from transcribe.py imitating streaming inference using attention masks. I don't know exactly why, maybe some default params are different?
  • In order to create the decoding TLG graph, some tools might needs to be updated to support flexible blank token id. I haven't checked this one, but now this is not a high priority for us.

@zhr1201 zhr1201 marked this pull request as ready for review February 22, 2024 19:26
@@ -153,7 +153,7 @@ std::string ProcessBlank(const std::string& str, bool lowercase) {
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
std::wstring wsresult = converter.from_bytes(result);
for (auto& c : wsresult) {
c = lowercase ? tolower(c, loc) : toupper(c, loc);
c = lowercase ? tolower(c, loc) : c;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this changes the --nolowercase behavior, we could also pass another flag if that's what we want to do

@zhr1201 zhr1201 changed the title Configurable blank token idx [runtime] Configurable blank token idx Feb 22, 2024
@zhr1201 zhr1201 force-pushed the configurable_blank_token_idx branch from d42ce1c to 1757e3b Compare February 23, 2024 04:16
@zhr1201 zhr1201 force-pushed the configurable_blank_token_idx branch from 1757e3b to 808ad62 Compare February 23, 2024 04:17
@robin1001
Copy link
Collaborator

Great job!

@robin1001 robin1001 merged commit 89ef2e7 into wenet-e2e:main Feb 26, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants