[runtime/gpu] Add GPU Hotwords #1860

Merged · 20 commits · merged May 24, 2023


@zwglory (Contributor) commented on May 19, 2023:

Support hotword boosting for offline or streaming WeNet ASR models on GPUs.

For more information on the ctc_decoder changes, see Slyne/ctc_decoder#11.
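The boosting itself happens inside the CTC prefix beam search of the linked ctc_decoder PR. As a much simpler, hypothetical illustration of the idea, the sketch below rescores an n-best list by adding a fixed bonus for every hotword occurrence. The function name, the scores, and the bonus weight are assumptions made for illustration; this is not the ctc_decoder API.

```python
from typing import Dict, List, Tuple


def boost_nbest(
    nbest: List[Tuple[str, float]],
    hotwords: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Hypothetical n-best rescoring with hotword bonuses.

    nbest:    (text, log-probability score) pairs from a decoder.
    hotwords: hotword -> bonus added once per (non-overlapping) occurrence.
    Returns the hypotheses re-sorted best first (higher score is better).
    """
    rescored = []
    for text, score in nbest:
        bonus = sum(weight * text.count(word) for word, weight in hotwords.items())
        rescored.append((text, score + bonus))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores: without boosting, the misrecognized "陈鹭" wins.
    nbest = [
        ("以及拥有陈鹭的女单项目", -3.1),
        ("以及拥有陈露的女单项目", -3.4),
    ]
    print(boost_nbest(nbest, {"陈露": 1.0})[0][0])  # 以及拥有陈露的女单项目
```

Applying the bonus during the beam search rather than after it, as the decoder does, has the advantage that hypotheses containing a hotword prefix are less likely to be pruned before the word is complete; n-best rescoring can only reorder hypotheses that already survived the search.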

Here are our test results:

Test environment

  • CPU: 40 cores, Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
  • GPU: NVIDIA GeForce RTX 2080 Ti

Hotwords file: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/tree/main/models

AISHELL-1 Test dataset

  • The test set contains 7176 utterances (5 hours) from 20 speakers.

| Model (FP16)                | RTF     | CER (%) |
|-----------------------------|---------|---------|
| offline model w/o hotwords  | 0.00437 | 4.6805  |
| offline model w/ hotwords   | 0.00428 | 4.5841  |
| streaming model w/o hotwords| 0.01231 | 5.2777  |
| streaming model w/ hotwords | 0.01195 | 5.1850  |
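CER here is a corpus-level percentage: the total number of character substitutions, deletions, and insertions (Levenshtein edits) divided by the total number of reference characters. The PR does not say which scoring tool produced the numbers, so the following is only a generic sketch of the metric; the function names are illustrative, and real scoring pipelines usually normalize text (punctuation, spaces) first.

```python
from typing import List


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance using a single rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference character
                cur[j - 1] + 1,          # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution (free if equal)
            )
        prev = cur
    return prev[-1]


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus CER in percent: total edits / total reference characters."""
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references are empty")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * errors / total


if __name__ == "__main__":
    ref = "赵继宏老板电器做厨电已经三十多年了"  # 17 characters
    hyp = "赵继红老板电器做厨店已经三十多年了"  # 2 substitutions
    print(edit_distance(ref, hyp), round(cer([ref], [hyp]), 2))  # 2 11.76
```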

AISHELL-1 hotwords sub-test set

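  • The test set contains 235 utterances with 187 entity words.

| Model (FP16)               | Latency (s) | CER (%) | Recall | Precision | F1-score |
|----------------------------|-------------|---------|--------|-----------|----------|
| offline model w/o hotwords | 5.8673      | 13.85   | 0.27   | 0.99      | 0.43     |
| offline model w/ hotwords  | 5.6601      | 11.96   | 0.47   | 0.97      | 0.63     |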
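The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A quick check against the rounded table values:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.97, 0.47), 2))  # 0.63, matches the "w/ hotwords" row
print(round(f1_score(0.99, 0.27), 2))  # 0.42, the table reports 0.43
```

The 0.01 gap in the first row is presumably because the F1 was computed from unrounded recall and precision (for example, recall 0.2749 and precision 0.9949 give about 0.431), not from the two-decimal values shown.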

Decoding results

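| Label | Hotwords | Prediction w/o hotwords | Prediction w/ hotwords |
|-------|----------|-------------------------|------------------------|
| 以及拥有陈露的女单项目 | 陈露 | 以及拥有陈鹭的女单项目 | 以及拥有陈露的女单项目 |
| 庞清和佟健终于可以放心地考虑退役的事情了 | 庞清, 佟健 | 庞青董建终于可以放心地考虑退役的事情了 | 庞清佟健终于可以放心地考虑退役的事情了 |
| 赵继宏老板电器做厨电已经三十多年了 | 赵继宏 | 赵继红老板电器做厨店已经三十多年了 | 赵继宏老板电器做厨电已经三十多年了 |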

More results: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/tree/main/results

Mainly developed by @FieldsMedal

@yuekaizhang (Collaborator) left a comment:

Nice work. Would you mind merging the hotwords directory into the regular setup so that hotwords support becomes part of the default configuration?

  1. Merge model_repo_hotwords into the regular model repos.
  2. Merge hotwords/readme.md into gpu/readme.md as a new section, ### hotwords.
  3. hotwords.yaml could live under model_repo_x/scoring/; however, the default hotwords_path should be None, like lm_path. In readme.md, set it to ./model_repo_x/scoring/hotwords.yaml to enable hotwords, e.g. sed -i /hotwords_path/xxx/ config_template.pbtxt
  4. This way, we may not need convert_start_x.sh.

@yuekaizhang (Collaborator) left a comment:

Also, it would be nice if you could upload the script that computes the F1-score to scripts/.

@zwglory (Contributor, Author) commented on May 24, 2023:

> Also, it would be nice if you could upload the script that computes the F1-score to scripts/.

No problem. We will merge the hotwords directory and upload the F1 evaluation code.

…del_repo_x; add hotwords evaluation script; update readme.
@zwglory (Contributor, Author) commented on May 24, 2023:

In the latest commit, we

  1. removed the hotwords directory and merged it into the regular model_repo_x;
  2. added a hotwords evaluation script;
  3. updated the readme with hotwords usage and evaluation instructions.
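The PR text does not describe how the evaluation script defines hotword recall and precision, so the following is only one plausible, hypothetical definition: per utterance, a hotword occurrence counts as a hit up to the number of times it appears in both the label and the prediction; recall divides hits by hotword occurrences in the labels, precision divides hits by hotword occurrences in the predictions. The function name and the matching rule are assumptions and may not reproduce the numbers in the tables above. Substring counting with `str.count` also double-counts when one hotword is contained in another.

```python
from typing import List, Tuple


def hotword_prf(
    refs: List[str], hyps: List[str], hotwords: List[str]
) -> Tuple[float, float, float]:
    """Hypothetical hotword recall, precision and F1 over paired utterances."""
    hits = n_ref = n_hyp = 0
    for ref, hyp in zip(refs, hyps):
        for word in hotwords:
            in_ref, in_hyp = ref.count(word), hyp.count(word)
            hits += min(in_ref, in_hyp)  # occurrences present in both
            n_ref += in_ref
            n_hyp += in_hyp
    recall = hits / n_ref if n_ref else 0.0
    precision = hits / n_hyp if n_hyp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


if __name__ == "__main__":
    # The three decoding examples from the PR description.
    hotwords = ["陈露", "庞清", "佟健", "赵继宏"]
    refs = ["以及拥有陈露的女单项目",
            "庞清和佟健终于可以放心地考虑退役的事情了",
            "赵继宏老板电器做厨电已经三十多年了"]
    without = ["以及拥有陈鹭的女单项目",
               "庞青董建终于可以放心地考虑退役的事情了",
               "赵继红老板电器做厨店已经三十多年了"]
    with_hw = ["以及拥有陈露的女单项目",
               "庞清佟健终于可以放心地考虑退役的事情了",
               "赵继宏老板电器做厨电已经三十多年了"]
    print(hotword_prf(refs, without, hotwords))  # (0.0, 0.0, 0.0)
    print(hotword_prf(refs, with_hw, hotwords))  # (1.0, 1.0, 1.0)
```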

Please review this PR when you have time. Thanks ♪(・ω・)ノ

@yuekaizhang (Collaborator) commented:

Many thanks.

@yuekaizhang merged commit c68f920 into wenet-e2e:main on May 24, 2023.