[Announcement] Improving I/O for correct and consistent experience #903

Closed
mthrok opened this issue Sep 10, 2020 · 41 comments

@mthrok
Collaborator

mthrok commented Sep 10, 2020

tl;dr: how to migrate to new backend/interface in 0.7

  • If you are using torchaudio in Linux/macOS environments, please use torchaudio.set_audio_backend("sox_io") to adapt to the upcoming changes.

  • If you are in a Windows environment, please set torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False and reload the backend to use the new interface.

  • Note that this ships with some bug fixes for formats other than 16-bit signed integer WAV, so you might experience some BC-breaking changes as described in the section below.
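
The opt-in steps above can be sketched as a small helper. This is a minimal sketch, not part of torchaudio: the name `opt_in_new_io` is hypothetical, and the torchaudio module is passed in as an argument so the logic is easy to exercise. It assumes that "reload backend" on Windows means calling set_audio_backend("soundfile") after the flag has been set.

```python
import sys


def opt_in_new_io(ta, platform=sys.platform):
    """Apply the 0.7 opt-in settings to a torchaudio-like module `ta`.

    Hypothetical helper: returns the name of the backend that was selected.
    """
    if platform.startswith("win"):
        # The flag has to be set before the backend is (re)selected.
        ta.USE_SOUNDFILE_LEGACY_INTERFACE = False
        ta.set_audio_backend("soundfile")
        return "soundfile"
    # Linux/macOS: move off the deprecated "sox" backend.
    ta.set_audio_backend("sox_io")
    return "sox_io"


# Typical usage (torchaudio 0.7):
#   import torchaudio
#   opt_in_new_io(torchaudio)
```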

News
[UPDATE] 2021/03/06

  • All the migration work has been completed on the master branch.

[UPDATE] 2021/02/12

  • Added bits_per_sample and encoding arguments (replacing dtype) to the save function.

[UPDATE] 2021/01/29

  • Added encoding to AudioMetaData

[UPDATE] 2021/01/22

  • Added format argument to load/info/save functions.
  • Added bits_per_sample to AudioMetaData

[UPDATE] 2020/10/21

  • Added Description of "soundfile" backend legacy interface.

[UPDATE] 2020/09/18

  • Added migration guide for "soundfile" backend.
  • Moved the phase when "soundfile" backend signatures change from 0.9.0 to 0.8.0 so that they match with "sox_io" backend, which becomes default in 0.8.0.

[UPDATE] 2020/09/17

  • Added information on deprecation of native libsox structures such as signalinfo_t and encoding_t.

Improving I/O for correct and consistent experience

This is an announcement for users that we are making backward-incompatible changes to the I/O functions of torchaudio backends, from the 0.7.0 release through the 0.9.0 release.

What is affected?

  • Public APIs

    • torchaudio.load
      • [Linux/macOS] By switching the default backend from "sox" backend to "sox_io" backend in 0.8.0, loading audio formats other than 16bit signed integer WAV returns the correct tensor.
      • [Linux/macOS/Windows] The signature of "soundfile" backend will be changed in 0.8.0 to match that of "sox_io" backend.
    • torchaudio.save
      • [Linux/macOS] By switching to "sox_io" backend, saving audio files will no longer degrade the data. The supported formats will be restricted to the tested formats only (please refer to the doc for the supported formats).
      • [Linux/macOS/Windows] The signature of "soundfile" backend will be changed in 0.8.0 to match that of "sox_io" backend.
    • torchaudio.info
      • [Linux/macOS/Windows] The signature of "soundfile" backend will be changed in 0.8.0 to match that of "sox_io" backend.
    • torchaudio.load_wav
      • will be removed in 0.9.0. (load function with normalize=False will provide the same functionality)
  • Internal APIs
    The following functions/classes of "sox" backend were accidentally exposed and will be removed in 0.9.0. There is no replacement for them. Please use save/load/info functions.

    • torchaudio.save_encinfo
      • will be removed in 0.9.0
    • torchaudio.get_sox_signalinfo_t
      • will be removed in 0.9.0
    • torchaudio.get_sox_encodinginfo_t
      • will be removed in 0.9.0
    • torchaudio.get_sox_option_t
      • will be removed in 0.9.0
    • torchaudio.get_sox_bool
      • will be removed in 0.9.0

The signatures of the other backends are not planned to be changed within this overhaul plan.

  • Classes
    • torchaudio.SignalInfo and torchaudio.EncodingInfo
      • will be replaced with AudioMetaData in 0.8.0 for "soundfile" backend
      • will be removed in 0.9.0
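
For code that still calls torchaudio.load_wav, the replacement named in the list above (load with normalize=False) could be wrapped as follows. This is only a sketch: `load_wav_compat` is a hypothetical name, and the torchaudio module is passed in explicitly. It assumes the new-style load signature with the normalize keyword, so it applies to the "sox_io" backend or the new "soundfile" interface.

```python
def load_wav_compat(ta, filepath, **kwargs):
    """Hypothetical stand-in for the removed `torchaudio.load_wav`.

    `ta` is the torchaudio module (or any object exposing a compatible `load`).
    """
    # normalize=False asks the new-style `load` to skip float normalization.
    return ta.load(filepath, normalize=False, **kwargs)


# Typical usage:
#   import torchaudio
#   waveform, sample_rate = load_wav_compat(torchaudio, "speech.wav")
```

The dtype of the returned tensor is decided by the backend, so check it if downstream code depends on a particular type.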

Why

There are currently three backends in torchaudio. (Please refer to the documentation for the details.)

"sox" backend is the original backend, which binds libsox with pybind11. The functionalities (load / save / info) of this backend are not well-tested and have a number of issues. (See #726).

Fixing these issues in a backward-compatible manner is not straightforward. Therefore, while we were adding TorchScript-compatible I/O functions, we decided to deprecate the original "sox" backend and replace it with the new "sox_io" backend, which is confirmed not to have those issues.

As we switch the default backend for Linux/macOS from "sox" to "sox_io", we would also like to align the interface of the "soundfile" backend with it. Therefore, we introduced a new interface to the "soundfile" backend (rather than a new backend, to keep the number of public APIs down).

When / What Changes

The following is the timeline for the planned changes:

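| Phase | Expected Release | Expected Changes |
|---|---|---|
| 1 | 0.7.0 (Oct 2020) | "sox" backend is deprecated with a warning; "sox_io" backend and the new "soundfile" interface are available as opt-in. |
| 2 | 0.8.0 (March 2021) | Default backend on Linux/macOS switches from "sox" to "sox_io"; "soundfile" backend uses the new interface by default. |
| 3 | 0.9.0 | "sox" backend, the legacy "soundfile" interface, torchaudio.load_wav and the accidentally exposed internal APIs are removed. |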

Planned signature changes of "soundfile" backend in 0.8.0

The following is the planned signature change of "soundfile" backend functions in 0.8.0 release.

info function

AudioMetaData implementation can be found here. The placement of the AudioMetaData might be changed.

~0.7.0:

def info(
  filepath: str,
) -> Tuple[SignalInfo, EncodingInfo]

0.8.0:

def info(
  filepath: str,
  format: Optional[str],
) -> AudioMetaData

Migration

The values returned from info function will be changed. Please use the corresponding new attributes.

~0.7.0:

si, ei = torchaudio.info(filepath)
sample_rate = si.rate
num_frames = si.length
num_channels = si.channels
precision = si.precision
bits_per_sample = ei.bits_per_sample
encoding = ei.encoding

0.8.0:

metadata = torchaudio.info(filepath)
sample_rate = metadata.sample_rate
num_frames = metadata.num_frames
num_channels = metadata.num_channels
bits_per_sample = metadata.bits_per_sample
encoding = metadata.encoding

Note: If the attribute you are using is missing, please file a Feature Request issue.
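
While a code base has to run against both interfaces, the attribute mapping above can be hidden behind one function. The following is a sketch under assumptions: `describe_audio` is a hypothetical name, and it relies only on the attribute names listed in the migration table (legacy `precision` has no counterpart there and is dropped).

```python
def describe_audio(info_result):
    """Return a plain dict from either the legacy or the new `info` result."""
    if isinstance(info_result, tuple):
        # Legacy interface (~0.7.0): a (SignalInfo, EncodingInfo) pair.
        si, ei = info_result
        return {
            "sample_rate": si.rate,
            "num_frames": si.length,
            "num_channels": si.channels,
            "bits_per_sample": ei.bits_per_sample,
            "encoding": ei.encoding,
        }
    # New interface (0.8.0): a single AudioMetaData object.
    return {
        "sample_rate": info_result.sample_rate,
        "num_frames": info_result.num_frames,
        "num_channels": info_result.num_channels,
        "bits_per_sample": info_result.bits_per_sample,
        "encoding": info_result.encoding,
    }


# Typical usage:
#   import torchaudio
#   meta = describe_audio(torchaudio.info("speech.wav"))
```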

load function

~0.7.0:

def load(
  filepath: str,
  # out: Optional[Tensor] = None,
      # To be removed.
      # Currently not used
      # Raise AssertionError if given
  normalization: Optional[bool] = True,
      # To be renamed to normalize.
      # Currently only accept True
      # Raise AssertionError if given
  channels_first: Optional[bool] = True,
  num_frames: int = 0,
  offset: int = 0,
      # To be renamed to frame_offset
  # signalinfo: SignalInfo = None,
      # To be removed
      # Currently not used
      # Raise AssertionError if given
  # encodinginfo: EncodingInfo = None,
      # To be removed
      # Currently not used
      # Raise AssertionError if given
  filetype: Optional[str] = None
      # To be removed
      # Currently not used
) -> Tuple[Tensor, int]

0.8.0:

def load(
  filepath: str,
  frame_offset: int = 0,
  num_frames: int = -1,
  normalize: bool = True,
  channels_first: bool = True,
  format: Optional[str] = None,  # only required for file-like object input
) -> Tuple[Tensor, int]

Migration

Please change the argument names:

  • normalization -> normalize
  • offset -> frame_offset

~0.7.0:

waveform, sample_rate = torchaudio.load(
    filepath,
    normalization=normalization,
    channels_first=channels_first,
    num_frames=num_frames,
    offset=offset,
)

0.8.0:

waveform, sample_rate = torchaudio.load(
    filepath,
    frame_offset=frame_offset,
    num_frames=num_frames,
    normalize=normalization,
    channels_first=channels_first,
)
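
If many call sites pass keyword arguments, the renames can be applied mechanically. The sketch below is an illustration, not torchaudio code: `translate_load_kwargs` is a hypothetical name. It also assumes that the legacy `num_frames=0` meant "read to the end", which the new signature expresses as `-1` (the two defaults shown in the signatures above); verify this against your own usage.

```python
_RENAMED = {"normalization": "normalize", "offset": "frame_offset"}
_REMOVED = ("out", "signalinfo", "encodinginfo", "filetype")


def translate_load_kwargs(old_kwargs):
    """Map ~0.7.0 keyword arguments of `load` to their 0.8.0 names."""
    new_kwargs = {}
    for name, value in old_kwargs.items():
        if name in _REMOVED:
            if value is not None:
                raise ValueError(f"'{name}' has no equivalent in the new interface")
            continue  # removed argument left at its default: just drop it
        new_kwargs[_RENAMED.get(name, name)] = value
    # Assumption: legacy 0 and new -1 both mean "all remaining frames".
    if new_kwargs.get("num_frames") == 0:
        new_kwargs["num_frames"] = -1
    return new_kwargs


# Typical usage:
#   waveform, sample_rate = torchaudio.load(path, **translate_load_kwargs(old_kwargs))
```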

save function

~0.7.0:

def save(
  filepath: str,
  src: Tensor,
  sample_rate: int,
  precision: int = 16,
    # moved to `bits_per_sample` argument
  channels_first: bool = True
)

0.8.0:

def save(
  filepath: str,
  src: Tensor,
  sample_rate: int,
  channels_first: bool = True,
  compression: Optional[float] = None,
    # Added only for compatibility.
    # soundfile does not support compression option
    # Raises Warning if not None
  format: Optional[str] = None,
  encoding: Optional[str] = None,
  bits_per_sample: Optional[int] = None,
)

Migration

~0.7.0:

torchaudio.save(
    filepath,
    waveform,
    sample_rate,
    channels_first
)

0.8.0:

torchaudio.save(
    filepath,
    waveform,
    sample_rate,
    channels_first,
    bits_per_sample=16,
)
# You can also designate the audio format with `format` and configure the encoding with `compression` and `encoding`. See https://pytorch.org/audio/master/backend.html#save for details.
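
To keep the old call shape during the transition, the `precision` argument can be forwarded as `bits_per_sample`. This is a minimal sketch: `save_compat` is a hypothetical wrapper, the torchaudio module is passed in explicitly, and only the parameters shown in the 0.8.0 signature above are used.

```python
def save_compat(ta, filepath, src, sample_rate, precision=16, channels_first=True):
    """Hypothetical wrapper: ~0.7.0 call shape on top of the 0.8.0 `save`."""
    return ta.save(
        filepath,
        src,
        sample_rate,
        channels_first=channels_first,
        bits_per_sample=precision,  # `precision` moved to `bits_per_sample`
    )


# Typical usage:
#   import torchaudio
#   save_compat(torchaudio, "out.wav", waveform, 16000, precision=16)
```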

BC-breaking changes

Read and write operations on the formats other than WAV 16-bit signed integer were affected by small bugs.

@mthrok mthrok pinned this issue Sep 10, 2020
mthrok added a commit that referenced this issue Sep 15, 2020
* Add deprecation warning to sox backend

Refer to #903
mthrok added a commit that referenced this issue Sep 28, 2020
As a part of the "sox" backend sunset plan (#903), we add a "soundfile" backend that is compatible with the "sox_io" backend. No new public backend name is added. We provide a switch to change the interface/behavior of "soundfile" backend.

This commit contains;
 - The implementation of the new "soundfile" backend.
 - The flag to switch the behavior of "soundfile" backend. (`torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE`)
 - Test for the new backend and switching mechanism.

The default behavior of "soundfile" backend is not changed. Users who want to opt in to the new "soundfile" interface can do so by setting `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False` before changing the backend to "soundfile".

In the 0.8.0 release, the "soundfile" backend will use this interface by default, and users can still use the legacy one with `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = True`. In 0.9.0, the legacy interface will be removed, and the `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE` flag will eventually be removed as well.
@mthrok mthrok changed the title [Announcement] Replacing "sox" backend with "sox_io" backend [Announcement] Overhauling I/O for correct and consistent experience Oct 21, 2020
@mthrok mthrok changed the title [Announcement] Overhauling I/O for correct and consistent experience [Announcement] Improving I/O for correct and consistent experience Oct 21, 2020
@snakers4

Fixing these issues in backward-compatible manner is not straightforward. Therefore while we were adding TorchScript-compatible I/O functions, we decided to deprecate this original "sox" backend and replace it with the new backend ("sox_io" backend), which is confirmed not to have those issues.

When we are switching the default backend for Linux/macOS from "sox" to "sox_io" backend, we would like to align the interface of "soundfile" backend, therefore, we introduced the new interface (not a new backend to reduce the number of public API) to "soundfile" backend.

Just a quick question, does it mean that since 0.7 or 0.8 we can include torchaudio.load inside of our jit-traced modules? Are you planning to support only Linux, or will you also have a list of binaries for some other platforms (i.e. mobile, raspberry pi)? With soundfile backend?

@mthrok
Collaborator Author

mthrok commented Oct 22, 2020

Hi @snakers4

does it mean that since 0.7 or 0.8 we can include torchaudio.load inside of our jit-traced modules?

Yes. Technically, you can already do it with 0.6. However, the corresponding library is not available in any form yet, so you cannot run it outside a Python application.
I have a prototype C++ app in my branch which depends on refactored torchaudio. The model I used can be found here

I plan to propose this to the team after the release work, but there is no fixed time frame for landing it yet, and I am not even sure whether I can land it.
This was an exercise to learn how much we can do with TorchScript, and I found that the I/O capability is very limited. It can only load audio data from files. I intend to look into other ways to get tensor data (like passing memory objects to TorchScript), but it's not a top priority on my list.

Are you planning to support only Linux, or will you also have a list of binaries for some other platforms (i.e. mobile, raspberry pi)?

We are considering adding an I/O module (not another backend but something like torchaudio.io) that works not just on Linux/macOS but also on Windows. We are thinking of binding a collection of cross-platform codec libraries. Mobile is not necessarily in our scope, because we do not have the infrastructure to test it and we have not seen demand for it yet. Hypothetically, if the refactored torchaudio lands, the build process will be CMake-based, so it will be easier for those familiar with CMake, but again, these plans are not finalized. We are trying to figure out a good "research to production" use case.

With soundfile backend?

The Python "soundfile" package is not TorchScript-compatible, so one of the things we are considering as part of the I/O module described above is binding libsndfile directly.

@snakers4

Nice! This is probably months from becoming actually useful by end users like us, but this increases the value of pytorch ecosystem quite a bit

Btw, currently a vad in torch audio seems to be a port of some energy based algorithm

We are planning to make a public general torch-scriptable noise / voice / music VAD pre-trained on large voice / noise / music corpora

Guess we could collaborate on that

@mthrok
Collaborator Author

mthrok commented Oct 26, 2020

@snakers4

Nice! This is probably months from becoming actually useful by end users like us,

Ah, that's a very optimistic view, although that's what I am aiming for. I am working on an RFC with example usage, so that the community can respond. Then we will finalize the interface and start working on the implementation.

but this increases the value of pytorch ecosystem quite a bit

Thanks, that's a nice reaction to have. One of the things we struggle with is getting a signal from the community, so feedback like that is really helpful. (and motivating for me ;) )

Btw, currently a vad in torch audio seems to be a port of some energy based algorithm

The current VAD is basically a port of the sox implementation.

We are planning to make a public general torch-scriptable noise / voice / music VAD pre-trained on large voice / noise / music corpora

Guess we could collaborate on that

That's very interesting. Please keep us updated!

@snakers4

snakers4 commented Oct 27, 2020

One of the things we struggle with is getting a signal from the community, so feedback like that is really helpful. (and motivating for me ;) )

the current state of audio is that there are no go-to tools / components, that would work on all platforms
there is record.js for browsers, but porting models to js is a pain now (looks like the only decent option is re-implementing from scratch in tf.js, onnx.js has very poor layer support)
ofc, you can go low-level and compile everything for each platform, but usually you care about your algorithms working properly in real life first

in real projects you basically need a VAD + STT + some post-processing
VAD ideally should be served on edge to improve user experience, whereas STT can be better served via an API (if you use OPUS e.g. traffic is negligible)
there is nothing stopping us from making our own VAD in PyTorch, but the actual audio reading part will be outside as well

for edge deployments we still need 2-4x size reduction in model size (which is already achievable) but as I mentioned there still is no easy way to run a pytorch model in a browser

That's very interesting. Please keep us updated!

I will post an update here

mthrok added a commit that referenced this issue Oct 27, 2020
Refer to #903 for the overview of planned I/O changes.

* Change the default backend from `"sox"(deprecated)` to `"sox_io"`
* Change the default interface of `"soundfile"` backend to the one identical to `"sox_io"` backend.
* Deprecate torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE
* Update documentations
    * Re-order backends (default first)
    * Update overhaul timeline (removed 0.7.0)
    * Simplify `"soundfile"` backend description
@tbazin

tbazin commented Nov 5, 2020

This is great news, this will definitely improve trust and adoption of torchaudio 🙂 !

hoangtnm added a commit to hoangtnm/audio that referenced this issue Nov 28, 2020
In line [151-160](https://github.com/pytorch/audio/blob/master/examples/pipeline_wav2letter/main.py#L151) and Line [437](https://github.com/pytorch/audio/blob/fb3ef9ba427acd7db3084f988ab55169fab14854/examples/pipeline_wav2letter/main.py#L437) of main.py, the default value of `dataset-root` and `dataset-folder-in-archive` will be None, which prevents `main.py` from knowing where the dataset is actually in the computer and loading it.

Moreover, `n-hidden-channels 2000` has not been defined in `main.py`, so it needs to be removed.

Error log:

```bash
python main.py \
    --reduce-lr-valid \
    --dataset-train train-clean-100 train-clean-360 train-other-500 \
    --dataset-valid dev-clean \
    --batch-size 128 \
    --learning-rate .6 \
    --momentum .8 \
    --weight-decay .00001 \
    --clip-grad 0. \
    --gamma .99 \
    --hop-length 160 \
    --win-length 400 \
    --n-bins 13 \
    --normalize \
    --optimizer adadelta \
    --scheduler reduceonplateau \
    --epochs 30

/home/hoangtnm/anaconda3/envs/dl/lib/python3.7/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to pytorch#903 for the detail.
  '"sox" backend is being deprecated. '
INFO:root:Namespace(batch_size=128, checkpoint='', clip_grad=0.0, dataset_folder_in_archive=None, dataset_root=None, dataset_train=['train-clean-100', 'train-clean-360', 'train-other-500'], dataset_valid=['dev-clean'], decoder='greedy', distributed=False, epochs=30, eps=1e-08, freq_mask=0, gamma=0.99, hop_length=160, jit=False, learning_rate=0.6, momentum=0.8, n_bins=13, normalize=True, optimizer='adadelta', progress_bar=False, reduce_lr_valid=True, rho=0.95, scheduler='reduceonplateau', seed=0, start_epoch=0, time_mask=0, type='mfcc', weight_decay=1e-05, win_length=400, workers=0, world_size=8)
INFO:root:Start time: 2020-11-28 21:18:22.337478
/home/hoangtnm/anaconda3/envs/dl/lib/python3.7/site-packages/torchaudio/backend/utils.py:64: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False` before setting the backend to "soundfile". Please refer to pytorch#903 for the detail.
  'The interface of "soundfile" backend is planned to change in 0.8.0 to '
Traceback (most recent call last):
  File "main.py", line 670, in <module>
    spawn_main(main, args)
  File "main.py", line 663, in spawn_main
    main(0, args)
  File "main.py", line 454, in main
    root=args.dataset_root,
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 65, in split_process_vlsp2020asr
    return tuple(create(dataset) for dataset in datasets)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 65, in <genexpr>
    return tuple(create(dataset) for dataset in datasets)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 57, in create
    for tag, transform in zip(tags, transform_list)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 57, in <listcomp>
    for tag, transform in zip(tags, transform_list)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 15, in __init__
    self._path = os.path.join(root, url)
  File "/home/hoangtnm/anaconda3/envs/dl/lib/python3.7/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
```
@expectopatronum

This might be a stupid question, but should the warning UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail. disappear after setting the backend?

I import torchaudio in the following way:

import torchaudio
torchaudio.set_audio_backend("sox_io")

but still get the above warning.

@mthrok
Collaborator Author

mthrok commented Dec 14, 2020

Hi @expectopatronum

The warning is issued at the time import torchaudio is executed, where the default backend is set. I get that it's annoying and sorry for the confusion, but I really needed to raise a strong awareness as the sox backend was not handling data correctly.
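
Because the warning fires during import, a filter has to be installed before `import torchaudio` runs. The sketch below is one possible approach for code that has already migrated, using only the standard `warnings` module. The function name `silence_sox_deprecation` is hypothetical, and the message prefix is copied from the warning text quoted in this thread; if the wording differs in your version, the filter will not match.

```python
import warnings


def silence_sox_deprecation():
    """Ignore only the '"sox" backend is being deprecated' UserWarning."""
    warnings.filterwarnings(
        "ignore",
        # Treated as a regex and matched against the start of the message.
        message='"sox" backend is being deprecated',
        category=UserWarning,
    )


# Call it before the import, since the warning is raised at import time:
#   silence_sox_deprecation()
#   import torchaudio
#   torchaudio.set_audio_backend("sox_io")
```

Other warnings, including the one about the "soundfile" interface, are left untouched by this filter.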

@ketanhdoshi

With torchaudio.load() in v0.8, the sox_io backend does not support 24-bit signed PCM audio files. Right now the only workaround is to switch back to the sox backend using torchaudio.set_audio_backend("sox").
Is 24-bit signed going to be supported in 0.9 before removing sox? Thanks!
It is not possible to convert the dataset I'm using to 16-bit or 32-bit.

@mthrok

Hi @ketanhdoshi

Thanks for the report. If it's causing you trouble, we will definitely support it.
Since PyTorch does not have a 24-bit int type, I need to think of a behavior for when normalize=False.
In your use case, are you loading data in float32 type?
Also, if you can tell us a command to generate the same type of file you are dealing with (with tools like ffmpeg or sox), that will be helpful.
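
As a Python alternative to an ffmpeg or sox command, a synthetic 24-bit signed PCM WAV file can be written with the standard-library `wave` module. This is a sketch for producing a test file only: the function name `write_pcm24_wav` and its default parameters are made up for illustration, and the result is a plain sine tone, not a file from the dataset mentioned above.

```python
import math
import wave


def write_pcm24_wav(path, seconds=1.0, sample_rate=16000, freq=440.0):
    """Write a mono 24-bit signed PCM WAV file containing a sine tone."""
    num_frames = int(seconds * sample_rate)
    peak = 2 ** 23 - 1  # largest 24-bit signed value
    frames = bytearray()
    for n in range(num_frames):
        value = int(0.5 * peak * math.sin(2 * math.pi * freq * n / sample_rate))
        # WAV stores PCM samples little-endian; 3 bytes per 24-bit sample.
        frames += value.to_bytes(3, "little", signed=True)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(3)  # 3 bytes per sample = 24 bits
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))
    return num_frames


# Typical usage:
#   write_pcm24_wav("tone_24bit.wav")
```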

@ketanhdoshi

Thanks @mthrok. Yes, data is being loaded as float32. Here's an example of a dataset that has many sound files that I'm using that are in 24-bit signed format.

@aelimame

With torchaudio.load() in v0.8, the sox_io backend does not support 24-bit signed PCM audio files. Right now the only workaround is to switch back to the sox backend using torchaudio.set_audio_backend("sox").
Is 24-bit signed going to be supported in 0.9 before removing sox? Thanks!
It is not possible to convert the dataset I'm using to 16-bit or 32-bit.

Hi @ketanhdoshi
Thanks for the report. If it's causing you the trouble, we will definitely support it.
Since PyTorch does not have 24-bit int type. I need to think of a behavior when normalize=False.
In your use case, are you loading data in float32 type?
Also if you can tell us a command to generate the same type you are dealing with (with tools like ffmpeg or sox), that will be helpful.

Thanks @mthrok. Yes, data is being loaded as float32. Here's an example of a dataset that has many sound files that I'm using that are in 24-bit signed format.

I'm running into the same issue. I'm loading some 24bit audio files and sox_io fails to load them. I can use sox backend for now but would appreciate if 24bit format can be supported too in sox_io.

A good way to handle the normalize=False is to make it unsupported for this specific format given most of the time people would use normalize=True (at least that's what I do almost always). Another idea would be to convert the 24bit format automatically/internally to 32bit even if normalize=False.
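
The second idea (widening 24-bit samples to 32-bit) can be illustrated in plain Python. This is a sketch of the concept only; it does not describe how "sox_io" handles 24-bit data, and the function names are hypothetical. Each 3-byte little-endian sample is decoded as a signed integer and shifted left by 8 bits so that 24-bit full scale maps onto the int32 range.

```python
from typing import List


def pcm24_to_int32(raw: bytes) -> List[int]:
    """Decode little-endian 24-bit signed PCM and widen it to the int32 range."""
    if len(raw) % 3:
        raise ValueError("24-bit PCM data must be a multiple of 3 bytes")
    samples = []
    for i in range(0, len(raw), 3):
        value = int.from_bytes(raw[i:i + 3], "little", signed=True)
        samples.append(value << 8)  # multiply by 256 to fill the int32 range
    return samples


def int32_to_float(samples: List[int]) -> List[float]:
    """Normalize int32-range samples to [-1.0, 1.0)."""
    return [s / 2 ** 31 for s in samples]


# Typical usage:
#   floats = int32_to_float(pcm24_to_int32(raw_bytes))
```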

Thanks

@aelimame

aelimame commented Mar 17, 2021

@ketanhdoshi 24-bit support seems to have been added a couple days ago to the master branch #1389
I tested it (Nightly build) and seems to work for me!

@mthrok
Collaborator Author

mthrok commented Mar 17, 2021

@aelimame @ketanhdoshi Sorry I forgot to let you know but we added 24-bit support.

It's nice to learn that it is working for you @aelimame.
@ketanhdoshi , please try the nightly build and see if it works. If not let us know.

@mthrok
Collaborator Author

mthrok commented Apr 7, 2021

FYI: @ketanhdoshi @aelimame 24-bit support has been ported to release 0.8.1.

@mthrok
Collaborator Author

mthrok commented Jun 15, 2021

Closing the issue as 0.9 is released which concludes the migration.
Thank you to everyone who gave feedback.

@mthrok mthrok closed this as completed Jun 15, 2021
@mthrok mthrok unpinned this issue Jun 15, 2021
alexmehta added a commit to alexmehta/ABAW2020TNT-Modified that referenced this issue Jun 17, 2022