Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create wav file source and sink #576

Closed
gavv opened this issue Sep 29, 2023 · 14 comments
Closed

Create wav file source and sink #576

gavv opened this issue Sep 29, 2023 · 14 comments
Assignees
Labels
category: sound io Task related to audio I/O easy hacks Solution is expected to be straightforward even if you are new to the project help wanted An important and awaited task but we have no human resources for it yet
Milestone

Comments

@gavv
Copy link
Member

gavv commented Sep 29, 2023

Add new source and sink implementations: WavSource and WavSink. Also add WavBackend corresponding to them and register it in BackendDispatcher.

WavSource should read wav file using dr_wav library. The library is single header, which we can just copy to 3rdparty/dr_wav and add to include path in 3rdparty/SConscript the same way as we do it for hedley.

WavSink should write wav file. But instead of using dr_wav, we should implement writing manually. The reason is that we need to update wav file header with the new size each time when we append more samples to file. This feature makes output file usable at any point of time. If we used dr_wav instead, the file would be usable only after closing the file.

The new backend should be employed when user specifies file://<path>.wav as --input parameter of roc-recv or --output parameter of roc-send. We should ensure that roc-recv and roc-send will employ this backend instead of sox for wav files, and will work correctly.

Wav source or sink should report that it doesn't have clock and latency. Wav source should detect sample rate and channel count from the file.

Source, sink, and backend interfaces are here: src/internal_modules/roc_sndio (see ISource, ISink, and IBackend)
Example of backend implementation can be found here: src/internal_modules/roc_sndio/target_sox/roc_sndio
Documentation for CLI tools is here: https://roc-streaming.org/toolkit/docs/tools/command_line_tools.html

See also #246 for additional hints.

@gavv gavv added enhancement help wanted An important and awaited task but we have no human resources for it yet easy hacks Solution is expected to be straightforward even if you are new to the project category: sound io Task related to audio I/O labels Sep 29, 2023
@gavv
Copy link
Member Author

gavv commented Sep 29, 2023

This is pseudo-code that updates WAV header. It can be invoked each time after we write more samples to file, and at the very beginning when opening file.

#define WAVE_FORMAT_PCM 1
#define WAVE_FORMAT_IEEE_FLOAT 3

struct WavHeader
{
    uint32_t chunk_id;
    uint32_t chunk_size;
    uint32_t format;
    uint32_t subchunk1_id;
    uint32_t subchunk1_size;
    uint16_t audio_format;
    uint16_t num_channels;
    uint32_t sample_rate;
    uint32_t byte_rate;
    uint16_t block_align;
    uint16_t bits_per_sample;
    uint32_t subchunk2_id;
    uint32_t subchunk2_size;
};

void update_header(FILE* file, size_t num_samples, size_t num_channels, size_t sample_rate)
{
    const uint32_t data_len = uint32_t(num_samples * num_channels) * sizeof(sample_t);

    WavHeader hdr = {};

    memcpy(&hdr.chunk_id, "RIFF", 4);
    hdr.chunk_size = to_little_endian_32(data_len + 36);
    memcpy(&hdr.format, "WAVE", 4);
    memcpy(&hdr.subchunk1_id, "fmt ", 4);
    hdr.subchunk1_size = to_little_endian_32(16);
    hdr.audio_format = to_little_endian_16(WAVE_FORMAT_PCM);
    hdr.num_channels = to_little_endian_16((uint16_t)num_channels);
    hdr.sample_rate = to_little_endian_32((uint32_t)sample_rate));
    hdr.byte_rate =
        to_little_endian_32((uint32_t)(sample_rate * num_channels) * sizeof(sample_t));
    hdr.block_align = to_little_endian_16((uint16_t)num_channels * sizeof(sample_t));
    hdr.bits_per_sample = to_little_endian_16(8 * sizeof(sample_t));
    memcpy(&hdr.subchunk2_id, "data", 4);
    hdr.subchunk2_size = to_little_endian_32(data_len);

    if (fseek(file, 0, SEEK_SET) != 0) {
        /* error */
    }

    if (fwrite(&hdr, sizeof(hdr), 1, file) != 1) {
        /* error */
    }

    if (fseek(file, 0, SEEK_END) != 0) {
        /* error */
    }
}

@Pekureda
Copy link
Contributor

Hello, I'd like to try my strengths by resolving this issue, of course if it's still available for assignment.

@gavv
Copy link
Member Author

gavv commented Nov 12, 2023

You're welcome, thank you!

@Pekureda
Copy link
Contributor

I have a question regarding a couple of things I've had issues deciding on what should be done:

  1. Since sample rate and channel count is to be ignored, should a log be generated if these fields are set to some values, which could be considered as not 'default'?

  2. I wonder about this kind of log

roc_log(LogInfo,
            "wav source:"
            " in_bits=%lu out_bits=%lu in_rate=%lu out_rate=%lu"
            " in_ch=%lu out_ch=%lu",
            (unsigned long)wav_.bitsPerSample, (unsigned long)wav_.bitsPerSample,
            (unsigned long)wav_.sampleRate, (unsigned long)wav_.sampleRate,
            (unsigned long)wav_.channels, (unsigned long)wav_.channels);

vs

roc_log(LogInfo,
            "sox source:"
            " in_bits=%lu out_bits=%lu in_rate=%lu out_rate=%lu"
            " in_ch=%lu out_ch=%lu is_file=%d",
            (unsigned long)input_->encoding.bits_per_sample,
            (unsigned long)in_signal_.precision, (unsigned long)input_->signal.rate,
            (unsigned long)in_signal_.rate, (unsigned long)input_->signal.channels,
            (unsigned long)in_signal_.channels, (int)is_file_);

I assume that previously the config defined also output parameters for conversion. I'm not sure about the conversion with regard to bits per sample. It was not mentioned but I'd want to make sure that this conversion should be a possibility.

  1. I'm not sure how to handle channel changes. Because channel data seems to be based on ChannelSet class. dr_wav only contains information about number of channels, so appropriate modification of ChannelSet seems to be rather problematic. This may be rather important to access method ns_2_samples_overall to calculate buffer size from frame length, without cloning the underlying code.

I'm not sure if there's an issue made for 3rd party libs sources, but I've had to change slightly the script to be able to perform a complete build, namely for alsa, ltdl and gengetopt I've had to change their source addresses from ftp to https.

If you'd like to follow the development, hopefully this week it will be major, please follow the fork fork. It may make some of my points clearer if looked into WIP commits. Cosmetic issues will be resolved after 'WIP stage'.

@gavv
Copy link
Member Author

gavv commented Nov 23, 2023

Hi! Good catches.


I think for wav source, there are only two valid uses of --rate option (which fills config.sample_spec.sample_rate) - it should be either omitted or equal to file's rate.

If user provided --rate and it differs from file's rate, I think we should print error and fail to open, to let the user know that request is incorrect. Something like: "requested input rate X, but input file has rate Y".

Note that for wav sink, the behavior is different: if --rate is specified, we should use it: write to wav header, and also return via sample_spec() method.


AFAIK, WAV format does not specify semantics & order of channels, it just has channel count, and intepretation of channels is up to the user. However, I think common convention is to use SMPTE order (example).

Roc supports two channel layouts (ChannelSet.layout):

  • surround (when each channel has its meaning, e.g. front-left, front-right, etc)
  • multitrack (when channels don't have meaning)

In surround layout, different orders are possible (ChannelSet.order), and by default roc uses SMPTE order.

By default, we can assume that WAV file has surround layout with SMPTE order. We should set corresponding values to sample spec. And if WAV file has N channels, we can just enable first N bits of channel set (ChannelSet.set_channel_range).

We also need to allow the user to explicitly specify what channels are stored in the WAV file to override default behavior. I think we can do it later. In next release (I hope) we'll replace --rate with more generic --io-encoding that defines rate, channels, and format. After adding it, we can use it for wav backend too.


I'm not sure if there's an issue made for 3rd party libs sources, but I've had to change slightly the script to be able to perform a complete build, namely for alsa, ltdl and gengetopt I've had to change their source addresses from ftp to https.

Not sure why it happens, what OS are you using?

Anyway, the change makes sense, feel free to create a PR with that.


If you'd like to follow the development, hopefully this week it will be major, please follow the fork fork. It may make some of my points clearer if looked into WIP commits. Cosmetic issues will be resolved after 'WIP stage'.

A few notes:

  • since wav backend doesn't require any external dependencies, there is no need to make it optional (--disable-wav, target_wav); let's just make it always available

  • for wav backend, pause/resume should be no-op (except updating state); these operations are called when we temporary stop using backend, however it needs special handling only for devices, but not file

  • thought, restart still needs special handling: we should either seek to beginning (if dr_wav supports it) or reopen file

  • nitpick regarding naming conventions: we use snake_case for variables & members; classes are named like "WavBackend" instead of "WAVBackend"

@Pekureda
Copy link
Contributor

Hi, thanks for your insights! Regarding the points:

I'm not sure if there's an issue made for 3rd party libs sources, but I've had to change slightly the script to be able to perform a complete build, namely for alsa, ltdl and gengetopt I've had to change their source addresses from ftp to https.

Not sure why it happens, what OS are you using?

Anyway, the change makes sense, feel free to create a PR with that.

I'm using Ubuntu 22.04.3 LTS running on WSL2, though in this case it was a problem with my Internet connection as I'm mostly operating on a network with blocked FTP ports. I should've noticed that, so I'm sorry for the confusion. I've verified the links on my mobile connection and they work perfectly fine, so I'll reverse the change as it's not a part of this issue, unless requested otherwise.


thought, restart still needs special handling: we should either seek to beginning (if dr_wav supports it) or reopen file

It should be achievable with drwav_seek_to_pcm_frame function.

@gavv
Copy link
Member Author

gavv commented Nov 24, 2023

I'm using Ubuntu 22.04.3 LTS running on WSL2, though in this case it was a problem with my Internet connection as I'm mostly operating on a network with blocked FTP ports. I should've noticed that, so I'm sorry for the confusion. I've verified the links on my mobile connection and they work perfectly fine, so I'll reverse the change as it's not a part of this issue, unless requested otherwise.

I see. I don't know about you case, but quick googling suggests that ISP providers sometimes limit FTP ports, so apparently your changes can be helpful. (and also ftp is insecure)

@Pekureda
Copy link
Contributor

Pekureda commented Nov 28, 2023

Hi, I have another couple of questions:

I'm struggling with endian conversion for the purposes of writing the WAV data. For now I'm using endian.h header, but it's platform dependent and includes for appropriate functions for supported platforms most likely will need to be wrapped in preprocessor macros. For future implementations, I believe, considering the usage of std::endian from bit header might be much easier and will take the responsibility from us to detect the endianess but I'm aware that there's a strong inclination to not use standard library. Said feature is available since C++20 standard.


Should we implement any conversion between sample sizes, ie. bits per sample from 8b to 16/24/32b etc. or down from 16b to 8b. If a parameter of bits per sample is not divisible by 8 starting from a value of 8, should it be validated and rejected? Panic or error?


The type of audio::Frame is a float. This unfortunately implies that a sample size is 32b. How does it translate if actual bits per sample is 8/16/64 etc.? Is the space just not used or the sample spans over the capacity of the float, (for example 2 floats). How samples of smaller size than 32b are stored, they just don't use all of the space of a float? Also I don't see an option from Config to get how many bits per sample there are. Am I missing something there?


Are there any plans for handling WAV files larger than 2^32-1 bytes?


I think there are typos for sample_spec() methods for SoX classes, because the panic message mention sample_rate() instead of sample_spec().


Relevant code is available in the fork.

@gavv
Copy link
Member Author

gavv commented Nov 29, 2023

I'm struggling with endian conversion for the purposes of writing the WAV data. For now I'm using endian.h header, but it's platform dependent and includes for appropriate functions for supported platforms most likely will need to be wrapped in preprocessor macros. For future implementations, I believe, considering the usage of std::endian from bit header might be much easier and will take the responsibility from us to detect the endianess but I'm aware that there's a strong inclination to not use standard library. Said feature is available since C++20 standard.

We could use std::endian if it was available, because we only avoid "STL" part of stdlib (template containers, iterators, etc). However we use C++98 so it's not available. Also we support a lot of (embedded) targets that don't have C++20 compiler yet.

For endian conversions, we already have two headers:

Should we implement any conversion between sample sizes, ie. bits per sample from 8b to 16/24/32b etc. or down from 16b to 8b. If a parameter of bits per sample is not divisible by 8 starting from a value of 8, should it be validated and rejected? Panic or error?

For reading, there is drwav_read_pcm_frames_f32(). IIRC it should automatically convert samples from whatever format is used in file to 32-bit floats, which we need for audio::Frame.

For writing, we can just always produce WAV files with 32-bit floats (format = WAVE_FORMAT_IEEE_FLOAT = 3), and don't support other formats, for now.

The type of audio::Frame is a float. This unfortunately implies that a sample size is 32b. How does it translate if actual bits per sample is 8/16/64 etc.? Is the space just not used or the sample spans over the capacity of the float, (for example 2 floats). How samples of smaller size than 32b are stored, they just don't use all of the space of a float? Also I don't see an option from Config to get how many bits per sample there are. Am I missing something there?

Conversion is done by scaling each sample. E.g. to convert from float32 (in range -1 to +1) to int16 (in range -32768 to 32767), you multiply each float by 32767 and round it to integer.

In case of WAV backend, as I wrote above, conversion should be needed only when reading, and dr_wav should handle it for us. Though, if we would need it, we have audio::PcmMapper class that can convert between different types and endians.

Are there any plans for handling WAV files larger than 2^32-1 bytes?

For 32-bit stereo with 48Khz sample rate, 4 GB is 3 hours of sound, which is not that huge as one could think.

If support for large files does not need extra efforts, it would be nice to have it. But I don't think it's essential.

I think there are typos for sample_spec() methods for SoX classes, because the panic message mention sample_rate() instead of sample_spec().

Yep, indeed, thanks.

@gavv
Copy link
Member Author

gavv commented Nov 29, 2023

FYI: #246 (comment)

@gavv
Copy link
Member Author

gavv commented Nov 29, 2023

Relevant code is available in the fork.

Small notice: since we removed --disable-wav, then target_wav subdirectory is not needed too. Target directories are used only for conditionally enabled code.

@Pekureda
Copy link
Contributor

Pekureda commented Dec 8, 2023

What kind of driver should be pushed if even it should be pushed in discover_drivers method?

void discover_drivers(core::Array<DriverInfo, MaxDrivers>& driver_list)

If target_wav directory is to be removed, where the source files should then reside? Is it up to me to decide or should they be inside the roc_sndio?


Aside from that the only things left for me is to test changes, fix problems if there are any, optimize and clean-up the code. Thanks again for your insights!

@gavv
Copy link
Member Author

gavv commented Dec 8, 2023

  1. Named "wav", with flags indicating that it's for files only and supports sources and sinks.

  2. Yep, it belongs to roc_sndio.

  3. Great!

@gavv gavv added this to the next milestone Jan 30, 2024
gavv added a commit to gavv/roc-toolkit that referenced this issue Jan 30, 2024
@gavv
Copy link
Member Author

gavv commented Jan 30, 2024

Landed, thanks!

@gavv gavv closed this as completed Jan 30, 2024
gavv added a commit to gavv/roc-toolkit that referenced this issue Apr 9, 2024
- make wav backend lower priority than sndio backend, as
  sndio can handle more variation of WAV/WAVEX files

- don't try wav backend if either driver or file extension
  is not "wav"

- use SampleSpec::use_defaults()
gavv added a commit to gavv/roc-toolkit that referenced this issue Apr 9, 2024
- make wav backend lower priority than sndio backend, as
  sndio can handle more variation of WAV/WAVEX files

- don't try wav backend if either driver or file extension
  is not "wav"

- use SampleSpec::use_defaults()
@gavv gavv added this to Roc Toolkit Jul 6, 2024
@gavv gavv moved this to Done in Roc Toolkit Jul 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: sound io Task related to audio I/O easy hacks Solution is expected to be straightforward even if you are new to the project help wanted An important and awaited task but we have no human resources for it yet
Projects
Status: Done
Development

No branches or pull requests

2 participants