NVIDIA NeMo ASR

NVIDIA NeMo is a toolkit for conversational AI that includes several models for automatic speech recognition (ASR).

Links:

Tested with:

  • Arm64 - Debian 11 - Python 3.9
  • x86_64 - Debian 11 - Python 3.9

Train and use language models

The beam search decoders in NeMo support N-gram language models (LMs) trained, for example, with the KenLM toolkit.
Run bash install_lm_tools.sh to get pre-built KenLM binaries and the required ctc-decoders. The wheels are not available for all Python versions, so you might need to build them yourself (see: asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh).
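
As a rough illustration of what the LM contributes during beam search: decoders of this kind commonly rank each hypothesis by the acoustic log-probability plus a weighted LM log-probability (alpha) and a word-count bonus (beta). The sketch below assumes that scoring form. The function name, the hypotheses and all numbers are made up for illustration and are not taken from NeMo or the ctc-decoders.

```python
def fused_score(acoustic_logprob: float, lm_logprob: float, word_count: int,
                alpha: float = 1.0, beta: float = 2.0) -> float:
    """Score one beam hypothesis (assumed form, not NeMo's actual code).

    alpha weights the LM log-probability, beta rewards each emitted word
    to counteract the LM's preference for short outputs.
    """
    return acoustic_logprob + alpha * lm_logprob + beta * word_count


# Hypothetical hypotheses: (text, acoustic log-prob, LM log-prob, word count).
hypotheses = [
    ("recognize speech", -4.0, -3.0, 2),
    ("wreck a nice beach", -3.5, -9.0, 4),
]


def best_hypothesis(alpha: float, beta: float) -> str:
    """Return the text of the highest-scoring hypothesis."""
    return max(
        hypotheses,
        key=lambda h: fused_score(h[1], h[2], h[3], alpha, beta),
    )[0]


# Without the LM (alpha=0, beta=0) the acoustically better but unlikely
# sentence wins; with the LM enabled the more plausible sentence wins.
print(best_hypothesis(alpha=0.0, beta=0.0))
print(best_hypothesis(alpha=1.0, beta=2.0))
```

Raising alpha makes the decoder trust the LM more, while raising beta favours longer outputs, so the two values are usually tuned together on held-out data.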

  • Check out the 'train' folder to learn how to build a simple n-gram LM in ~5 minutes
  • Use python3 test_with_lm.py -h to see available options
  • Read the NVIDIA 'language modeling' docs (see above) to learn about the beam width, alpha and beta parameters
  • To run inference with an LM, use something like python3 test_with_lm.py -m "models/stt_en_conformer_ctc_small.nemo" -l "models/kenlm_custom.4gram" -w 4 -a 1.0 -b 2.0
  • To get a "naive" character error rate (CER) score for the transcriptions, add a text file with the target transcription for each WAV file (same name, but with a '.txt' ending) and pass the --transcriptions [folder] argument
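
For orientation, a naive CER is usually the character-level edit distance between the target and the recognized text, divided by the length of the target. The following is a minimal sketch under that assumption. It is not the code used by test_with_lm.py, and the helper names (edit_distance, naive_cer, reference_path) are hypothetical.

```python
from pathlib import Path


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def naive_cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def reference_path(wav_path: str, transcriptions_dir: str) -> Path:
    """Hypothetical helper: map 'x.wav' to '<transcriptions_dir>/x.txt'."""
    return Path(transcriptions_dir) / (Path(wav_path).stem + ".txt")


print(naive_cer("hello world", "hallo world"))
```

"Naive" here means that no text normalization (casing, punctuation, number formatting) is applied before the comparison, so such differences count as errors.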