add whisper
xingchensong committed Feb 28, 2024
1 parent 422f114 commit 68e4e8c
Showing 2 changed files with 14 additions and 0 deletions.
1 change: 1 addition & 0 deletions examples/wenetspeech/s0/README.md
@@ -43,6 +43,7 @@
* Feature info: using fbank feature, with dither 1.0, with cmvn
* Training info: lr 0.001, batch size dynamic36000, 8 gpus on 3090, acc_grad 4, 130k steps, 4.6 days
* Decoding info: ctc_weight 0.5, reverse_weight 0.0, average_num 5, blank penalty 0.0, length penalty 0.0
* PR link: https://github.com/wenet-e2e/wenet/pull/2371

| Decoding mode - Chunk size | Dev | Test\_Net | Test\_Meeting |
|:-----------------------------:|:----:|:---------:|:-------------:|
13 changes: 13 additions & 0 deletions examples/wenetspeech/whisper/README.md
@@ -55,6 +55,19 @@ python local/modify_ckpt.py \
| attention | 7.27 % N=328207 C=308016 S=11392 D=8799 I=3672 | 7.90 % N=414097 C=383382 S=18954 D=11761 I=2018 | 13.00 % N=220358 C=194417 S=11788 D=14153 I=2705 |
| attention_rescoring | 8.95 % N=328207 C=305892 S=16696 D=5619 I=7056 | 10.83 % N=414097 C=371515 S=30229 D=12353 I=2269 | 15.64 % N=220358 C=193717 S=18669 D=7972 I=7812 |

## Whisper-largev3 (conv1d2, full-parameter tuning) Result (text\_fixed, see https://github.com/wenet-e2e/WenetSpeech/discussions/54)

* Feature info: using log_mel_spectrogram feature, no cmvn
* Training info: bf16, deepspeed stage1, activation checkpointing, batch dynamic12000, acc_grad 8, 8 * 3090 gpu, 48k steps (about 6 days), conf/finetune_whisper_largev3.yaml
* Decoding info: ctc_weight 0.0, average_num 5
* PR link: https://github.com/wenet-e2e/wenet/pull/2371

| decoding_method | Dev | Test\_Net | Test\_Meeting |
|:-------------------:|:----:|:---------:|:-------------:|
| ctc_greedy_search | 7.09 % N=328207 C=308643 S=16976 D=2588 I=3709 | 10.98 % N=414092 C=373301 S=33375 D=7416 I=4697 | 12.84 % N=220358 C=194928 S=18398 D=7032 I=2862 |
| attention | 4.66 % N=328207 C=315591 S=10352 D=2264 I=2692 | 6.54 % N=414092 C=389523 S=19101 D=5468 I=2513 | 8.84 % N=220358 C=202722 S=11296 D=6340 I=1839 |
| attention_rescoring | 5.99 % N=328207 C=311106 S=14807 D=2294 I=2547 | 9.27 % N=414092 C=378406 S=28993 D=6693 I=2715 | 11.47 % N=220358 C=197013 S=16716 D=6629 I=1923 |
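
Each table cell reports an error rate followed by raw counts. The figures are consistent with the usual reading, where N is the number of reference tokens, C the correct ones, S substitutions, D deletions, and I insertions, so that the rate is (S + D + I) / N. The sketch below is not part of the recipe; `parse_counts` and `error_rate` are hypothetical helper names used only to check one cell (the `attention` result on Dev) against that assumed definition.

```python
import re


def parse_counts(cell: str) -> dict:
    """Extract the N/C/S/D/I counts from a result cell such as
    '4.66 % N=328207 C=315591 S=10352 D=2264 I=2692'."""
    return {key: int(value) for key, value in re.findall(r"\b([NCSDI])=(\d+)", cell)}


def error_rate(counts: dict) -> float:
    """Error rate in percent, assuming the usual (S + D + I) / N definition."""
    return 100.0 * (counts["S"] + counts["D"] + counts["I"]) / counts["N"]


# Cell copied from the 'attention' row, Dev column, of the table above.
cell = "4.66 % N=328207 C=315591 S=10352 D=2264 I=2692"
counts = parse_counts(cell)
rate = error_rate(counts)

# Under the assumed definition, every reference token is either correct,
# substituted, or deleted, so C + S + D should equal N.
assert counts["C"] + counts["S"] + counts["D"] == counts["N"]

print(f"{rate:.2f} %")  # prints "4.66 %"
```

Insertions are counted against the hypothesis rather than the reference, which is why they enter the numerator but not the C + S + D = N identity.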

# Frequently Asked Questions

- Q: Why are there so many insertion errors in the decoding results of CTC and attention_rescoring?