Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent clicks when looping in AudioStreamWAV #83341

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bs-mwoerner
Copy link
Contributor

Partially fixes #64775 (namely, the WAV part. OGG is addressed by #80452).

AudioStreamWAV supports resampling and variable playback speeds by using sub-sample accuracy in the playback position and linear interpolation between two neighboring samples.

When forward or backward looping is enabled between samples s and e (inclusive) and the playback pointer is between samples e and e + 1, interpolation is done between samples e and e + 1. The next/previous sample to be played after/before e, however, is s, so interpolation should instead be done between e and s to prevent discontinuities/clicks in the signal. For pingpong looping, interpolation in that same case should be done between e and e - 1, because the stream is essentially mirrored at the end.

Ensuring this border condition requires one additional check per output sample. I don't know how performance sensitive we are in the audio thread so just to be sure, I added one more template parameter to the do_resample method so that there's a separate code path for mixing blocks near the end of the loop region. That produces quite the cascade of function calls, but allows skipping the border checks for the vast majority of blocks.

@fire fire requested a review from a team October 14, 2023 16:33
@ellenhp
Copy link
Contributor

ellenhp commented Oct 14, 2023

I don't know how performance sensitive we are in the audio thread

The audio thread isn't really performance constrained in the traditional sense. Outside of godot I've been doing a lot of embedded programming and it's nothing like that. The main thing is we avoid the use of any mutexes and allocation (because most allocators require mutexes). Audio mixing is the most cache-friendly operation I can think of, and branch predictors on the kinds of processors that Godot runs on tend to be pretty good at their jobs, so I wouldn't sweat an additional branch or two if the condition you're branching on is cheap to evaluate.

Do we have any test files for this bug so I could listen before and after?

@bs-mwoerner
Copy link
Contributor Author

bs-mwoerner commented Oct 14, 2023

I wouldn't sweat an additional branch or two if the condition you're branching on is cheap to evaluate.

Then there's the option to remove the check whether the special case handling is neccessary. Might be slightly less performant but would require less code (one less template parameter and thus half the function calls to do_resample).

Do we have any test files for this bug so I could listen before and after?

I used the cooking_S00 example from the initial post in the issue thread and converted it to WAV. That did click for me when looping:
cooking_S00.zip

Edit: "Loop Begin" was set to 396900 (9 Seconds).
Edit 2: I think it may not click if the file sampling rate matches the mixing sampling rate. For me, the mxing sampling rate was 48000 Hz (file is 44100 Hz) and I think for WASAPI that is determined by what is set in the Windows Control Panel, not by the "Mix Rate" in the Godot project settings.

@ellenhp
Copy link
Contributor

ellenhp commented Oct 14, 2023

For this change I'd say that as you're only adding O(1) cache misses/branch mispredicts (per mix, per WAV playback) you should do whatever is least complex and easiest to read and review.

edit: More concretely, if you have a branch in the inner loop that's either almost always taken or almost never taken, we should ignore it from a performance standpoint for this change. If that sounds agreeable let me know when I should take a look at the code. I remember this part of the codebase requiring some amount of focus to review.

@bs-mwoerner
Copy link
Contributor Author

Ah, yes, that makes a lot of sense. I've removed the template parameter, so all that's new now is a new if in the inner loop and two parameters to do_resample that control the border behavior.

I haven't touched the ADPCM path, by the way. As far as I can tell, that doesn't do interpolation, so there shouldn't be anything to fix there.

Would be great if you could take a look at it now.

Copy link
Contributor

@ellenhp ellenhp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this better. Definitely more readable. :) Just have some questions, but I might need to go re-familiarize myself with how wav looping is actually implemented.

break;
case AudioStreamWAV::LOOP_PINGPONG:
// Interpolate the last sample with the second-to-last sample (mirror).
sample_border = MAX(0, sample_limit - 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks right for handling the playback bouncing off the last sample of the loop (and thank you for thinking about extremely short sound files) but will this do the right thing when the ping pong loop bounces off the start position? I'm also a little bit unclear as to why the forward and backwards cases are handled identically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it took me some time, to get my head around which possible cases we're dealing with here. In the end, we only ever need to worry about the right edge because do_resample will always interpolate between the "previous" sample and the "next" regardless of the direction we're moving in. offset is mapped to samples by discarding the lower bits, so we can be up to almost a whole sample after the (start of) the last sample, but we cannot be before the first sample. I found it somewhat less obvious that we can end up after the (start of) the last sample when walking backwards and wrapping around to the loop end (because the sub-sample part is retained), but that then maps to the same case of having to deal with the loop end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I'm doing a very good job at explaining what I mean 😅, so here's an example: If we have a loop that starts at 0 and has a length of 10, then the loop end is 10 and the last sample that is part of the loop is 9. offset is a fixed-point number but if it were a float, its valid values would be between 0.0 and 9.99... I believe the original code correctly ensures that offset will never leave the [0, 10) range. The problem was that for positions > 9, the interpolation was done between samples 9 and 10, although 10 isn't part of the loop region. On the lower end, however, even the extreme value 0.0 would be correctly interpolated between samples 0 and 1, so there's no issue there. When starting at 3.5 and playing 10 samples backwards, the code would play 4 samples starting from 3.5 (positions 3.5, 2.5, 1.5, 0.5) and then 6 samples starting at 9.5, so the critical case is still in the 9-10 range.

@bs-mwoerner
Copy link
Contributor Author

I might need to go re-familiarize myself with how wav looping is actually implemented.

I had to do quite a bit of familiarization today, too. 😅 There's a lot going on here.

@ellenhp
Copy link
Contributor

ellenhp commented Oct 14, 2023

There's a lot going on here.

"It's just copying one PCM buffer into another, how hard could it be?" -- me, to myself, circa 2020 😆

I'm still getting clicks in the wav file and loop points you provided but in audacity it doesn't seem to be looping at zero crossings, so I'm going to find my own loop points and then re-test.

@ellenhp
Copy link
Contributor

ellenhp commented Oct 14, 2023

I tried 396954 to 3571752 and I'm still getting clicks both before and after these changes. :/

@bs-mwoerner
Copy link
Contributor Author

Being able to do backwards and ping-pong looping is really neat, but it does open a whole new can of worms, that's true. 😄

I tried 396954 to 3571752 and I'm still getting clicks both before and after these changes. :/

Hm, let me check that again. 🤔 In any case, it's correct that there's no zero crossing at 9 seconds, but the waveform there should line up with the end of the stream.

@ellenhp
Copy link
Contributor

ellenhp commented Oct 14, 2023

Hm, let me check that again. 🤔 In any case, it's correct that there's no zero crossing at 9 seconds, but the waveform there should line up with the end of the stream.

That makes sense. And even if you loop at zero crossings I think you can still get sidebands because of abrupt frequency or amplitude changes so that could be what I'm hearing. I'm definitely not a competent sound engineer. Just trying to find a good test case.

@bs-mwoerner
Copy link
Contributor Author

So I don't get a click with the original test case at 396900. I do get a click with your 396954, but I suppose that's expected, because the stream ends on a positive value, so the jump to zero will click (should be the same in other applications). 3571752 is 8 ms before the end of the stream. That loop sounds... interesting. 😄

Can you set a breakpoint around line 285 and check what values are actually used for mixing? Mine are:
base_rate = 48000
srate = 44100
playback_speed_scale = 1.0
loop_begin_fp = 3251404800
length_fp = 29262643200

@ellenhp
Copy link
Contributor

ellenhp commented Oct 14, 2023

Yeah I just picked zero crossings without listening to them because audacity takes exclusive control of my audio device and I was halfway through an album I like that I wanted to go back and finish.

I don't have an environment set up to actually debug Godot, which I should probably fix because I spend a lot of time working on the engine. I fell out of the habit of working with debuggers when I spent a few years working on an app notorious for crashing debuggers. I'm going to shelve this and work on some other stuff because I do work on Godot as a day job and this is my weekend so it feels like I should be doing something else. I'll try and pick it back up soon though.

@bs-mwoerner
Copy link
Contributor Author

Ah that's okay. I was just wondering if maybe we were using different mixing rates or something and if that could explain the difference, so it would have been interesting if the values that actually end up in the method match up. Enjoy your weekend! 🙂

@MJacred
Copy link
Contributor

MJacred commented Oct 17, 2023

Ok, I tested only with 3 WAV files (I'm pretty tired right now) which were clearly popping. They all have loop offset > 0. But I cut off the intro, so I had 6 files to test (looping back to 0 also popped).

With this PR: They're all still popping.
I only tested forward looping.

I was only skimming over your comments, but you seem to get some popping as well?

@bs-mwoerner
Copy link
Contributor Author

Thanks for testing!

I was only skimming over your comments, but you seem to get some popping as well?

I didn't before, but I just tried a few things and I do get a click when I set my system sampling rate to match the file sampling rate! In my mind that should've been unproblematic because this case does not require interpolation and thus should be unaffected by any issues with the interpolation, but clearly that's either not true or this case has a problem of its own. 😅 Thanks for pointing me in the right direction!

@bs-mwoerner bs-mwoerner marked this pull request as draft October 17, 2023 22:03
@bs-mwoerner
Copy link
Contributor Author

Funnily enough, there were indeed two separate issues. The one I fixed in my original commit affects only the resampling case while the other affects almost exclusively the non-resampling case. Specificly, this line:
aux = (limit - offset) / increment + 1
will return one sample too many if the remaining length is a whole multiple of the increment. Quite unlikely with resampling, but guaranteed to always be the case when the playback rate is 1 (file sampling rate matches mixing sampling rate).

I've changed this now to aux = (limit - offset + increment - 1) / increment (for positive increment only), which should give correct results in both cases:

limit offset increment samples to be read aux before aux now
1000 0 700 2 (0 and 700) 2 2
1000 0 1000 1 (0) 2 ⚡ 1
1000 0 500 2 (0 and 500) 3 ⚡ 2

@bs-mwoerner bs-mwoerner marked this pull request as ready for review October 18, 2023 08:05
@YuriSizov YuriSizov modified the milestones: 4.2, 4.3 Nov 13, 2023
@MJacred
Copy link
Contributor

MJacred commented Dec 28, 2023

@bs-mwoerner: sorry for the late testing. But I can say that all 6 test WAV files are now looping smoothly (tested only forward looping; with looping at second 0 and > 0). They still pop in Godot 4.1.1 and 4.2.1 (tested there just for reference). Could you please squash the commits into 1 and rebase on latest on master?

@ellenhp: Ping, could you please review (if you have the time)

…scontinuities/clicks in the waveform when looping.

- Fixed calculation of remaining samples in the loop returning one too many samples when the input sample rate matches the output sample rate (causing the remaining length to be evenly divisible by the increment), removing another possible source of clicks.
@bs-mwoerner
Copy link
Contributor Author

That's great! Thanks for testing. I've squashed all commits into one and rebased on the current master. 👍

@MJacred
Copy link
Contributor

MJacred commented Apr 5, 2024

@bs-mwoerner: I poked @reduz for a review, and he mentioned doing this once on import.
I only took a quick glance at the code, but so far it sounds like the most sensible approach.
What do you think?

@bs-mwoerner
Copy link
Contributor Author

@MJacred I had a look at the importer and, yes, as long as there's no way to play a file with a loop mode different from what was specified during import, I think we can hardcode the correct overflow sample into the imported copy of the stream (with a bit of special-case handling for when the loop ends at the end of the stream).

This would solve the issue with having to handle loop modes when picking the next sample (resampling case) and replace that part of this fix. The other part (non-resampling case), though, would still need to be fixed in the playback, I think. If I remember correctly, that was just a miscalculation when advancing the pointer.

@ellenhp
Copy link
Contributor

ellenhp commented Apr 8, 2024

I think we do support changing loop mode at runtime. At the very least it is possible and I've suggested people do it before to reduce export size in cases where they might otherwise need two copies of the same resource

@MJacred
Copy link
Contributor

MJacred commented Apr 8, 2024

@ellenhp: Thanks!

I can confirm. Just tested in Godot v4.2.1: while changing the loop mode in the editor requires a re-import, and the editor makes the UI widgets insensitive for clicks, you can still change the mode at runtime. Even disabling.

@MJacred
Copy link
Contributor

MJacred commented Apr 8, 2024

@bs-mwoerner: Hm… can we go hybrid with this? No idea if this can work (can also be used as inspiration for a different / more refined solution):

  1. Do it once on the imported copy of the AudioStreamWAV
  2. IF the user changes the loop mode at runtime, we flag the copied AudioStreamWAV as "loop_mode_is_dirty" and then process it the way your PR currently handles
  3. when the AudioStreamWAV is stopped, check if "loop_mode_is_dirty", then do step 1. and set "loop_mode_is_dirty" to false again

@bs-mwoerner
Copy link
Contributor Author

@MJacred Hm, but this would then only work in the editor, since we can't re-import during runtime, can we? With exported builds, we don't even have access to the source media, so we definitely can't reimport anything there. I feel adding considerable code complexitiy for an optimization that doesn't affect release builds isn't really worth it.

However, it looks like the actual WAV data is mutable during runtime, so I suppose we could have a solution that modifies the stream in-memory in accordance with the loop mode. Then we wouldn't need the special case handling during mixing. We would, however, have to make sure that we keep track of these changes, so that we can undo them before a loop setting or the actual data is changed and then modify the stream anew for the new settings afterwards.

I'd like to hear ellenhp's input on this. I remember her saying that with branch prediction, the impact of a single "if" that's almost never taken in the mixing loop is negligible, so I'm not yet convinced that the additional management effort for replacing this with transient stream modifications is a good trade-off (we would add lots of small things that need to be done in a number of places). One example for an edge case, we'd have to consider: It's possible to save the stream to a file while it is being played, so we'd have to make sure that we undo the stream modification on the fly while saving without affecting what the playback sees.

@ellenhp
Copy link
Contributor

ellenhp commented Apr 11, 2024

Yeah, I'm not an expert on any particular microarchitecture, but processors are designed to be able to handle branches like the one in the inner loop here without stalling the pipeline. You could optimize out the arithmetic add/subtract by comparing pos against limit_minus_1 which only needs to be initialized on the stack once per call. At that point I'd imagine the compiler is going to leave everything in a register and you're looking at a single comparison and a conditional jump that will be speculatively executed. It's the kind of thing where I bet it took me longer to write this message than it will take to execute those two instructions across all end-user devices from now until the heat death of the universe.

edit: looked into this and I'm still convinced it's not an issue for any modern CPU, including old android phones and stuff, but I think the heat death comment is pretty far into the realm of hyperbole. 😆

@MJacred
Copy link
Contributor

MJacred commented Apr 14, 2024

@bs-mwoerner

Hm, but this would then only work in the editor, since we can't re-import during runtime, can we? […] However, it looks like the actual WAV data is mutable during runtime

Yeah. The re-import is more like a "save this default config".

We would, however, have to make sure that we keep track of these changes, so that we can undo them before a loop setting or the actual data is changed and then modify the stream anew for the new settings afterwards.

Yes, tracking the changes would be necessary.

so I'm not yet convinced that the additional management effort for replacing this with transient stream modifications is a good trade-off

If I understood everything correctly, then the trade-off should be agreeable, right?

One example for an edge case, we'd have to consider: It's possible to save the stream to a file while it is being played, so we'd have to make sure that we undo the stream modification on the fly while saving without affecting what the playback sees.

Hm… I don't get that. Shouldn't the user desire exactly this behaviour (edge case)?

@bs-mwoerner
Copy link
Contributor Author

If I understood everything correctly, then the trade-off should be agreeable, right?

Personally, I'd go the "handle this during playback" route, because patching the stream in-memory creates more possiblities to forget to do or undo the modifications at the various places that they need to be done and thus introduces more ways for things to go wrong. If it's true that there's no noticeable difference in performance (I didn't do any benchmarking, so that's an assumption for now), then I'd prefer the simpler solution for maintainability reasons.

Just for context: reduz's original suggestion of preprocessing the stream during import doesn't work with loop mode changes during runtime, because this preprocessing is destructive, so something needs to be done during runtime either way. The remaining options are handling looping during playback and modifying the stream in memory. The first one is simpler and has no side effects (the actual data is not touched), the second one saves one if per sample during mixing but requires some maintenance in the public interface functions.

Hm… I don't get that. Shouldn't the user desire exactly this behaviour (edge case)?

Hm, that's actually a good question. The docs for save_to_wav say "Saves the AudioStreamWAV as a WAV file to path" and doesn't state whether the saved file changes depending on the selected loop mode. Currently it always saves the original waveform, though, so that would be a breaking change, strictly speaking. And since the exported file would have a click where there was none in the original, I can imagine some might consider this more of a bug than a feature.

data is a similar case. Its docs just say "Contains the audio data in bytes." Currently, this returns the unaltered data. Changing this might break some very specific, existing use cases.

Thanks for moving this forward, by the way! I had almost forgotten about this issue.

@MJacred
Copy link
Contributor

MJacred commented Apr 16, 2024

Happy to be of help! And a big "thank you" for everyone who pitched in!

It's good that we went through our options. Now we have sth. to report back. I'll poke again in the contributor chat to get an official review on this.

@MJacred
Copy link
Contributor

MJacred commented May 2, 2024

@reduz: Can we get your review on this, please? It would be great to get one more audio fix into 4.3

@KoBeWi KoBeWi modified the milestones: 4.3, 4.4 Aug 3, 2024
@MJacred
Copy link
Contributor

MJacred commented Aug 23, 2024

@bs-mwoerner: I got a response from Juan on this issue (I'm late in relaying this)

not convinced to be honest given its so, so so corner case
Internpolation does just that, it should never go past the end sample, so even if there is a next sample it should not be audible
So to me it sounds like sample not being well done rather than a problem with interpolation

What is your take on this response?
If the PR is not acceptable in its current form, maybe there's an option to perform cleanups on import (just throwing this into the room).

@ellenhp
Copy link
Contributor

ellenhp commented Aug 23, 2024

Loop offsets >0 aren't an edge case, they're a core feature of the engine. The issue is that there's a bug in the interpolation code that shows up at the loop boundary. I understand having some objections to this PR. It adds branches in inner loops. I understand wanting to see profiling data to back it up, or asking for it to be commented better. But if Juan doesn't want the underlying bug to ever be fixed I don't really know what to say.

On a meta level: It's very frustrating that after putting in dozens or sometimes hundreds of developer-hours across all contributors into a fix, ~5 minutes of Juan's time can completely sink it because he either doesn't understand the problem fully or doesn't think it's important enough to review. This is the main reason I completely deprioritized contributing to and maintaining the project. I recognize that I'm rocking the boat by saying this, but it's because I still care about the project, and I want the best for it. Providing this feedback is important.

@bs-mwoerner
Copy link
Contributor Author

What is your take on this response?

I was a bit confused initially, because when Juan says "Interpolation does just that, it should never go past the end sample, so even if there is a next sample it should not be audible", that sounds like an exact description of what this PR does: It stops the interpolation from going past the end sample, which it previously did. After giving this some thought, I now think when Juan says "interpolation", he may refer to overlapping and fading between the end and the start of the loop (whereas we were talking about interpolating between individual samples in the stream to account for sample rate differences or non-standard playback rates).

I ran a new test with the official 4.3 build and as of now, the clicking problem is still there, but if there are plans to implement overlapping loops for WAV streams at some point in the future, then Juan may mean that we shouldn't bother fixing these issues now, because they will likely not be heard through the fade anyway?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Godot 4] Audio pop on looping with loop offset > 0
6 participants