Focus: language model (LM) fusion for speech recognition.
When a language model is used, a wide beam search often yields incomplete transcripts; with a narrow beam the problem is less visible because of implicit hypothesis pruning.
Check whether the same issue appears in CTC + LM fusion; see the scoring sketch below.
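A minimal sketch of where the effect could show up in a CTC prefix beam search with shallow fusion; the class, function, and weight names below are illustrative assumptions, not this repo's actual API.

```python
# Illustrative shallow-fusion scoring for one hypothesis in a CTC prefix beam search.
# Names and default weights are hypothetical, not this project's actual API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    tokens: List[int] = field(default_factory=list)  # decoded token ids so far
    ctc_score: float = 0.0   # log P_ctc(tokens | audio), maintained by the beam search
    lm_score: float = 0.0    # log P_lm(tokens), accumulated as tokens are appended


def fused_score(hyp: Hypothesis, lm_weight: float = 0.3, length_bonus: float = 0.5) -> float:
    # The LM log-probability is always <= 0, so it strictly penalises longer prefixes;
    # with a wide beam and no length bonus, short (incomplete) transcripts can win.
    return hyp.ctc_score + lm_weight * hyp.lm_score + length_bonus * len(hyp.tokens)


# Pruning step of the beam search: keep the top-k hypotheses under the fused score.
def prune(beam: List[Hypothesis], beam_size: int) -> List[Hypothesis]:
    return sorted(beam, key=fused_score, reverse=True)[:beam_size]
```

A length bonus (or a coverage term) is the usual mitigation discussed in "Towards better decoding and language model integration in sequence to sequence models" from the reference list below.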
- adaptive softmax for a large vocabulary (the official PyTorch implementation does not work with TorchScript); see the sketch after this list
- ONNX and TorchScript support
- GRU
- RNN with tied input/output embeddings; see the GRU sketch after this list
- GRU fusion in the WeNet runtime CTC prefix beam search
- Transformer-XL with cache
- Transformer-XL with cache for fusion
- MWER training with LM fusion; see the loss sketch after this list
- etc
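For the adaptive-softmax item, a hedged sketch of how PyTorch's built-in module is used; the cutoffs, sizes, and tensor shapes are made up for the example. A TorchScript-compatible reimplementation would need to expose the same `log_prob`-style interface that fusion consumes.

```python
# Illustrative use of PyTorch's built-in adaptive softmax for a large vocabulary.
# Cutoffs and sizes are assumptions; the vocabulary must be ordered by decreasing
# frequency for the cluster cutoffs to make sense.
import torch
import torch.nn as nn

vocab_size, dim = 200_000, 512
adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=dim,
    n_classes=vocab_size,
    cutoffs=[2_000, 20_000, 100_000],
)

hidden = torch.randn(32, dim)                  # e.g. GRU outputs flattened to (N, dim)
targets = torch.randint(0, vocab_size, (32,))
result = adaptive(hidden, targets)             # result.output: per-target log-probs
loss = result.loss                             # mean negative log-likelihood
log_probs = adaptive.log_prob(hidden)          # full (N, vocab_size) log-probs for fusion
```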
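For the GRU and tied-embedding items, a minimal sketch of a GRU LM with weight tying; it assumes embedding and hidden dimensions match so the output projection can reuse the embedding matrix. Class and parameter names are illustrative.

```python
# Minimal GRU language model with tied input/output embeddings; names and sizes are illustrative.
from typing import Optional, Tuple

import torch
import torch.nn as nn


class GRULanguageModel(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 512, num_layers: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRU(dim, dim, num_layers=num_layers, batch_first=True)
        self.decoder = nn.Linear(dim, vocab_size, bias=False)
        self.decoder.weight = self.embedding.weight  # tie input and output embeddings

    def forward(
        self, tokens: torch.Tensor, state: Optional[torch.Tensor] = None
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        # tokens: (batch, time) token ids; state: (num_layers, batch, dim) recurrent state
        x = self.embedding(tokens)
        out, state = self.gru(x, state)
        return self.decoder(out), state  # logits: (batch, time, vocab_size)
```

Keeping the recurrent state explicit and typed as `Optional` also tends to help when exporting the model with TorchScript for the runtime fusion items above.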
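For the MWER item, a hedged sketch of the usual expected-word-error objective over an n-best list; whether the hypothesis scores already include the fused LM term is a design choice of this project, so treat the inputs and shapes as assumptions.

```python
# Illustrative MWER loss over an n-best list; inputs and shapes are assumptions.
import torch


def mwer_loss(hyp_log_probs: torch.Tensor, word_errors: torch.Tensor) -> torch.Tensor:
    # hyp_log_probs: (batch, nbest) total log-scores per hypothesis (optionally LM-fused)
    # word_errors:   (batch, nbest) word errors of each hypothesis against the reference
    probs = torch.softmax(hyp_log_probs, dim=-1)                     # renormalise over the n-best list
    relative = word_errors - word_errors.mean(dim=-1, keepdim=True)  # baseline for variance reduction
    return (probs * relative).sum(dim=-1).mean()                     # expected relative word error
```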
- Deep Speech: Scaling up end-to-end speech recognition
- End-to-End Attention-Based Large Vocabulary Speech Recognition
- On Using Monolingual Corpora in Neural Machine Translation
- First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs
- Towards better decoding and language model integration in sequence to sequence models
- Efficient softmax approximation for GPUs
- Using the Output Embedding to Improve Language Models
- Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition
- etc