AudioBackend specific save_audio and info, managing missing SoX in torchaudio, Python 3.12 / PyTorch 2.2 support, using `libsndfile` as preferred audio backend (#1288)

* AudioBackend supports save_audio() [code cleanup]
* Fix for Path handling; skip some tests on older torchaudio
* Move info() implementation to each AudioBackend
* Update CI configurations, fix more tests
* Conditionally build kaldifeat, fix some more tests
* Fix more tests, remove dead code, bump version
* Remaining fixes, legacy OPUS reading mode env var
* Fixes for torchaudio==2.0.0
* Fixes for save_audio/save_audios
* Skip backends for audio saving that are not applicable (e.g. torchaudio backends when it's not installed)
* Prefer LibsndfileBackend as the default lhotse backend (except for some special cases) + fix CutSet.copy_data()

lhotse-speech · Feb 13, 2024 · 769c273
1 parent 2473491

Showing 21 changed files with 642 additions and 412 deletions.
diff --git a/.github/workflows/unit_tests.yml b/.github/workflows/unit_tests.yml
@@ -17,13 +17,20 @@ jobs:
       matrix:
         include:
           - python-version: "3.8"
-            torch-install-cmd: "pip install torch==1.8.2+cpu torchaudio==0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html"
+            torch-install-cmd: "pip install torch==1.12.1 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cpu"
+            extra_deps: kaldifeat
           - python-version: "3.9"
-            torch-install-cmd: "pip install torch==1.8.2+cpu torchaudio==0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html"
+            torch-install-cmd: "pip install torch==2.0 torchaudio==2.0 --extra-index-url https://download.pytorch.org/whl/cpu"
+            extra_deps: kaldifeat
           - python-version: "3.10"
-            torch-install-cmd: "pip install torch==1.12.1 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cpu"
+            torch-install-cmd: "pip install torch==2.1 torchaudio==2.1 --extra-index-url https://download.pytorch.org/whl/cpu"
+            extra_deps: ""
           - python-version: "3.11"
-            torch-install-cmd: "pip install torch==2.0 torchaudio==2.0 --extra-index-url https://download.pytorch.org/whl/cpu"
+            torch-install-cmd: "pip install torch==2.2 torchaudio==2.2 --extra-index-url https://download.pytorch.org/whl/cpu"
+            extra_deps: ""
+          - python-version: "3.12"
+            torch-install-cmd: "pip install torch==2.2 torchaudio==2.2 --extra-index-url https://download.pytorch.org/whl/cpu"
+            extra_deps: ""
 
       fail-fast: false
 
@@ -50,7 +57,7 @@ jobs:
         # the torchaudio env var does nothing when torchaudio is installed, but doesn't require it's presence when it's not
         pip install '.[tests]'
         # Enable some optional tests
-        pip install h5py dill smart_open[http] kaldifeat kaldi_native_io webdataset==0.2.5 s3prl scipy nara_wpe pyloudnorm
+        pip install h5py dill smart_open[http] kaldi_native_io webdataset==0.2.5 s3prl scipy nara_wpe pyloudnorm ${{ matrix.extra_deps }}
     - name: Install sph2pipe
       run: |
         lhotse install-sph2pipe  # Handle sphere files.
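
The `extra_deps` matrix entry introduced in this hunk is spliced into the optional-test install line, so `kaldifeat` is installed only for the Python 3.8 and 3.9 jobs. A minimal Python sketch of that expansion follows. The names `BASE_DEPS`, `MATRIX_EXTRA_DEPS`, and `optional_install_cmd` are hypothetical and exist only for illustration; they are not part of the workflow or of Lhotse.

```python
# Illustrative only: mimics how `${{ matrix.extra_deps }}` expands in the workflow.
BASE_DEPS = [
    "h5py", "dill", "smart_open[http]", "kaldi_native_io",
    "webdataset==0.2.5", "s3prl", "scipy", "nara_wpe", "pyloudnorm",
]

# Python version -> value of `extra_deps` in the CI matrix of this diff.
MATRIX_EXTRA_DEPS = {
    "3.8": "kaldifeat",
    "3.9": "kaldifeat",
    "3.10": "",
    "3.11": "",
    "3.12": "",
}


def optional_install_cmd(extra_deps: str) -> str:
    """Build the pip command; an empty `extra_deps` contributes no packages."""
    return " ".join(["pip", "install", *BASE_DEPS, *extra_deps.split()])


if __name__ == "__main__":
    for version, extra in MATRIX_EXTRA_DEPS.items():
        print(version, "->", optional_install_cmd(extra))
```

Keeping the optional package in a matrix field rather than in the shared install line lets each job opt in without duplicating the whole command.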

diff --git a/README.md b/README.md
@@ -107,6 +107,7 @@ Lhotse uses several environment variables to customize it's behavior. They are a
 - `LHOTSE_AUDIO_BACKEND` - may be set to any of the values returned from CLI `lhotse list-audio-backends` to override the default behavior of trial-and-error and always use a specific audio backend.
 - `LHOTSE_AUDIO_LOADING_EXCEPTION_VERBOSE` - when set to `1` we'll emit full exception stack traces when every available audio backend fails to load a given file (they might be very large).
 - `LHOTSE_DILL_ENABLED` - when it's set to `1|True|true|yes`, we will enable `dill`-based serialization of `CutSet` and `Sampler` across processes (it's disabled by default even when `dill` is installed).
+- `LHOTSE_LEGACY_OPUS_LOADING` - (`=1`) reverts to a legacy OPUS loading mechanism that triggered a new ffmpeg subprocess for each OPUS file.
 - `LHOTSE_PREPARING_RELEASE` - used internally by developers when releasing a new version of Lhotse.
 - `TORCHAUDIO_USE_BACKEND_DISPATCHER` - when set to `1` and torchaudio version is below 2.1, we'll enable the experimental ffmpeg backend of torchaudio.
 - `RANK`, `WORLD_SIZE`, `WORKER`, and `NUM_WORKERS` are internally used to inform Lhotse Shar dataloading subprocesses.

diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-1.20.0
+1.21.0
diff --git a/docs/getting-started.rst b/docs/getting-started.rst
@@ -127,6 +127,8 @@ Lhotse uses several environment variables to customize it's behavior. They are a
 
 * ``LHOTSE_DILL_ENABLED`` - when it's set to ``1|True|true|yes``, we will enable ``dill``-based serialization of ``CutSet`` and ``Sampler`` across processes (it's disabled by default even when ``dill`` is installed).
 
+* ``LHOTSE_LEGACY_OPUS_LOADING`` - (``=1``) reverts to a legacy OPUS loading mechanism that triggered a new ffmpeg subprocess for each OPUS file.
+
 * ``LHOTSE_PREPARING_RELEASE`` - used internally by developers when releasing a new version of Lhotse.
 
 * ``TORCHAUDIO_USE_BACKEND_DISPATCHER`` - when set to 1 and torchaudio version is below 2.1, we'll enable the experimental ffmpeg backend of torchaudio.
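
The `=1` convention documented for `LHOTSE_LEGACY_OPUS_LOADING` can be sketched as a simple flag check. This is an assumption about how such a flag could be read, not Lhotse's actual implementation; the helper `legacy_opus_loading_enabled` is hypothetical.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "LHOTSE_LEGACY_OPUS_LOADING"


def legacy_opus_loading_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True only when the variable is exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR) == "1"


if __name__ == "__main__":
    # Set the variable before any audio is loaded so the process sees it.
    os.environ[ENV_VAR] = "1"
    print(legacy_opus_loading_enabled())
```

In a shell, the equivalent is to export the variable before starting the Python process.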