NVIDIA · titu1994 · Jan 23, 2024
diff --git a/docs/source/asr/datasets.rst b/docs/source/asr/datasets.rst
@@ -265,8 +265,11 @@ You can easily convert your existing NeMo-compatible ASR datasets using the
     --num_shards=<number of tarfiles that will contain the audio>
     --max_duration=<float representing maximum duration of audio samples> \
     --min_duration=<float representing minimum duration of audio samples> \
+    --force_codec=flac \
     --shuffle --shuffle_seed=0
 
+.. note:: To reduce storage further at the cost of lossy (but high-quality) compression, use ``--force_codec=opus`` instead.
+
 This script shuffles the entries in the given manifest (if ``--shuffle`` is set, which we recommend), filter
 audio files according to ``min_duration`` and ``max_duration``, and tar the remaining audio files to the directory
 ``--target_dir`` in ``n`` shards, along with separate manifest and metadata files.
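
The behaviour described in the hunk above (shuffle the manifest, filter by duration, split into ``n`` shards) can be sketched in plain Python. This is only an illustration, not the script's actual implementation: ``plan_shards``, the order of the shuffle and filter steps, and the round-robin shard assignment are all assumptions, and no audio is read, re-encoded, or tarred.

```python
import random


def plan_shards(entries, num_shards, min_duration=None, max_duration=None,
                shuffle=True, shuffle_seed=0):
    """Hypothetical helper: group manifest entries into `num_shards` lists.

    Each entry is assumed to be a dict with a numeric "duration" key,
    as in a NeMo-style JSON-lines manifest.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")

    entries = list(entries)  # copy so the caller's list is not reordered
    if shuffle:
        # A dedicated, seeded RNG keeps the order reproducible across runs.
        random.Random(shuffle_seed).shuffle(entries)

    def in_range(entry):
        duration = entry["duration"]
        if min_duration is not None and duration < min_duration:
            return False
        if max_duration is not None and duration > max_duration:
            return False
        return True

    kept = [e for e in entries if in_range(e)]

    # Round-robin assignment (an assumption; the real script may chunk differently).
    return [kept[i::num_shards] for i in range(num_shards)]


manifest = [{"audio_filepath": f"a{i}.wav", "duration": float(i)} for i in range(10)]
shards = plan_shards(manifest, num_shards=3, min_duration=1.0, max_duration=8.0)
print([len(s) for s in shards])
```

With a fixed ``shuffle_seed`` the same manifest always yields the same shard contents, which is the practical reason for passing ``--shuffle_seed`` alongside ``--shuffle``.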

diff --git a/scripts/speech_recognition/convert_to_tarred_audio_dataset.py b/scripts/speech_recognition/convert_to_tarred_audio_dataset.py
@@ -42,6 +42,7 @@
     --min_duration=<float representing minimum duration of audio samples> \
     --shuffle --shuffle_seed=1 \
     --sort_in_shards \
+    --force_codec=flac \
     --workers=-1
 
 
@@ -56,7 +57,7 @@
     --shuffle --shuffle_seed=1 \
     --sort_in_shards \
     --workers=-1 \
-    --concat_manifest_paths \
+    --concat_manifest_paths
     <space separated paths to 1 or more manifest files to concatenate into the original tarred dataset>
 
 3) Writing an empty metadata file