DirectSound hangs/glitches when using small input host buffers #775

Open
dechamps opened this issue Feb 26, 2023 · 8 comments · May be fixed by #782
Labels
P3 Priority: Normal src-dsound MS DirectSound Host API /src/hostapi/dsound

Comments

@dechamps
Contributor

dechamps commented Feb 26, 2023

When running paloopback -r48000 -s512 with a DirectSound input device:

  • If using --inputLatency 25, the DirectSound host API hangs (as in, the stream callback is never called);
  • If using --inputLatency 30, the stream callbacks follow an irregular cadence and glitches are reported;
  • If using --inputLatency 35, the stream appears to be fine.

This happens regardless of half duplex or full duplex mode. Output seems unaffected.

I've known about this issue since 2018; see dechamps/FlexASIO#29. (I apologize for waiting this long to report it upstream.) Given it was indirectly reported to me by a variety of users over time, I suspect it affects most PortAudio versions and most Windows versions (at least the modern ones) on most hardware.

The discussion on dechamps/FlexASIO#29 basically sums up the root cause, but just to confirm, here are some results of printf debugging directly on the DirectSound host API code from paloopback with the above parameters:

--inputLatency 100:

StartStream: DSW_StartInput returned = 0x0.
[1191663.76512] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00000s read 0.00000s size 0.11067s
[1191663.82027] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00000s read 0.00000s size 0.11067s
[1191663.87735] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.01000s read 0.00000s size 0.11067s
[1191663.93182] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.01000s read 0.00000s size 0.11067s
[1191663.98797] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.02000s read 0.00000s size 0.11067s
[1191664.05155] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.03000s read 0.00000s size 0.11067s
[1191664.11109] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.04000s read 0.00000s size 0.11067s
[1191664.16542] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.04000s read 0.03125s size 0.11067s
INPUT STREAM CALLBACK
INPUT STREAM CALLBACK
[1191664.22242] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.05000s read 0.03125s size 0.11067s
[1191664.27969] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.05000s read 0.03125s size 0.11067s
[1191664.33578] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.06000s read 0.03125s size 0.11067s
[1191664.39007] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.07000s read 0.03125s size 0.11067s
[1191664.44996] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.07000s read 0.06250s size 0.11067s
INPUT STREAM CALLBACK
INPUT STREAM CALLBACK
INPUT STREAM CALLBACK
[1191664.50676] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.08000s read 0.06250s size 0.11067s
[1191664.56127] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.08000s read 0.06250s size 0.11067s
[1191664.61813] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.09000s read 0.06250s size 0.11067s
[1191664.67896] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.10000s read 0.09375s size 0.11067s
INPUT STREAM CALLBACK
INPUT STREAM CALLBACK
INPUT STREAM CALLBACK
[1191664.73565] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.10000s read 0.09375s size 0.11067s
[1191664.79060] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.11000s read 0.09375s size 0.11067s
[1191664.84504] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00933s read 0.09375s size 0.11067s
[1191664.90011] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00933s read 0.00000s size 0.11067s
INPUT STREAM CALLBACK
INPUT STREAM CALLBACK
[1191664.96165] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.01933s read 0.00000s size 0.11067s
[1191665.01595] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.01933s read 0.00000s size 0.11067s
[1191665.07057] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.02933s read 0.00000s size 0.11067s
[1191665.12475] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.03933s read 0.03125s size 0.11067s
INPUT STREAM CALLBACK
INPUT STREAM CALLBACK
INPUT STREAM CALLBACK
…

--inputLatency 25:

StartStream: DSW_StartInput returned = 0x0.
[3700088.75965] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00000s read 0.00000s size 0.03567s
[3700088.80787] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00000s read 0.00000s size 0.03567s
[3700088.87961] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00000s read 0.00000s size 0.03567s
[3700088.94204] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00000s read 0.00000s size 0.03567s
[3700089.00590] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.01000s read 0.00000s size 0.03567s
[3700089.07005] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.01000s read 0.00000s size 0.03567s
[3700089.14027] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.01000s read 0.00000s size 0.03567s
[3700089.20353] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.01000s read 0.00000s size 0.03567s
[3700089.26684] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.02000s read 0.00000s size 0.03567s
[3700089.33014] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.02000s read 0.00000s size 0.03567s
[3700089.39321] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.02000s read 0.00000s size 0.03567s
[3700089.46385] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.02000s read 0.00000s size 0.03567s
[3700089.52766] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.02000s read 0.00000s size 0.03567s
[3700089.58995] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.02000s read 0.00000s size 0.03567s
[3700089.65343] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.03000s read 0.00000s size 0.03567s
[3700089.71732] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.03000s read 0.00000s size 0.03567s
[3700089.78055] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.03000s read 0.00000s size 0.03567s
[3700089.84376] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.03000s read 0.00000s size 0.03567s
[3700089.90704] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00000s read 0.00000s size 0.03567s
[3700089.97158] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00000s read 0.00000s size 0.03567s
[3700090.03457] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00000s read 0.00000s size 0.03567s
[3700090.09786] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.00000s read 0.00000s size 0.03567s
[3700090.16080] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.01000s read 0.00000s size 0.03567s
[3700090.23129] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.01000s read 0.00000s size 0.03567s
[3700090.30177] IDirectSoundCaptureBuffer_GetCurrentPosition() = 0 capture 0.01000s read 0.00000s size 0.03567s
…

This confirms the root cause from dechamps/FlexASIO#29 - DirectSound seems to be incapable of providing better than 31.25 ms input read cursor granularity (note this number stays constant regardless of sample rate). I found another DirectSound user complaining about the same problem in 2011. (Note that this only affects capture - playback cursors appear to be significantly more granular in my testing.)
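For anyone wanting to read the granularity off a log like the ones above, a minimal sketch: take the smallest forward step between consecutive read cursor samples. The helper name `min_forward_step` and the array `kReadCursorLog` are hypothetical (this is not PortAudio code); the sample values are the distinct "read" values from the --inputLatency 100 run, in order.

```c
#include <stddef.h>

/* Hypothetical helper: smallest positive step between consecutive cursor
 * samples, in seconds. Backward steps (buffer wraparound) and repeated
 * values are ignored. Returns 0.0 if the cursor never moved forward. */
double min_forward_step(const double *samples, size_t count)
{
    double best = 0.0;
    for (size_t i = 1; i < count; ++i) {
        double step = samples[i] - samples[i - 1];
        if (step > 0.0 && (best == 0.0 || step < best))
            best = step;
    }
    return best;
}

/* Distinct "read" cursor values from the --inputLatency 100 log, in order. */
const double kReadCursorLog[] = {
    0.00000, 0.03125, 0.06250, 0.09375, 0.00000, 0.03125
};
```

Calling `min_forward_step(kReadCursorLog, 6)` gives 0.03125 s. On the --inputLatency 25 log, where every read sample is 0, the same helper returns 0, i.e. no forward progress at all.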

As a result, if the host input buffer is less than 31.25 ms in size, the read cursor stays stuck at 0, the PortAudio DS host API code never makes forward progress, and the stream callback is never called.

Everything points at this being a limitation/bug of DirectSound itself. However, it seems like PortAudio could at least work around it by provisioning large enough buffers, so that PortAudio users don't have to "tiptoe around" buffer size minefields.

Somewhat related to this, I am not quite sure I understand why suggestedLatency has any effect on the DirectSound input host buffer size in the first place. Given DirectSound uses circular buffers with cursors, my understanding is that the buffer size has no effect on input latency - PortAudio simply "chases the cursor" and will process input data as soon as the cursor moves, without waiting for the input host buffer to fill up (which it shouldn't anyway, as it would overflow). It seems like suggestedLatency should not be a factor in the input host buffer size calculation; instead, the DS input host buffer size should be a tradeoff between memory usage and likelihood of input buffer overflow. Corollary: PortAudio can set the host input buffer size to be as large as necessary to avoid the problem described above without affecting input latency at all.

What really drives the input latency here is read cursor granularity. Unfortunately, given DirectSound seems stuck with an implicit, undocumented 31.25 ms granularity with no way to adjust it (that I could find), it looks like that would squash any hope of achieving anything resembling low latency for DirectSound input devices.

@dechamps
Contributor Author

Somewhat related to this, I am not quite sure I understand why suggestedLatency has any effect on the DirectSound input host buffer size in the first place. Given DirectSound uses circular buffers with cursors, my understanding is that the buffer size has no effect on input latency - PortAudio simply "chases the cursor" and will process input data as soon as the cursor moves, without waiting for the input host buffer to fill up (which it shouldn't anyway, as it would overflow). It seems like suggestedLatency should not be a factor in the input host buffer size calculation; instead, the DS input host buffer size should be a tradeoff between memory usage and likelihood of input buffer overflow.

Looking a bit closer at the code, I think I understand now why the host input buffer size is affected by suggestedLatency: it's because suggestedLatency determines the polling rate (which makes sense), and the buffer needs to be appropriately sized to ensure that it does not overflow between polls.

*hostBufferSizeFrames = userFramesPerBuffer
    + max( userFramesPerBuffer + pollingJitterFrames, targetBufferingLatencyFrames );
*pollingPeriodFrames = max( max(1, userFramesPerBuffer / 4), targetBufferingLatencyFrames / 16 );

This does still suggest that allocating a host input buffer that is larger than necessary should be benign (the only downside is increased memory usage), so we might want to impose a clamp on the input buffer size to account for the 31.25ms read cursor granularity.
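As a sanity check, the two expressions quoted above can be evaluated for the paloopback runs from the original report. This is only a sketch: `BufferSettings`, `max_ul` and `compute_buffer_settings` are hypothetical names, I am assuming targetBufferingLatencyFrames is simply the suggested latency times the sample rate (4800 frames for 100 ms and 1200 frames for 25 ms at 48 kHz), and the 48-frame polling jitter is a placeholder (any value up to 688 frames gives the same host buffer sizes here).

```c
/* Result of the sizing computation. Hypothetical type, for illustration. */
typedef struct {
    unsigned long hostBufferSizeFrames;
    unsigned long pollingPeriodFrames;
} BufferSettings;

static unsigned long max_ul(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}

/* Mirrors the two expressions quoted from the DirectSound host API code.
 * All inputs are in frames. */
BufferSettings compute_buffer_settings(unsigned long userFramesPerBuffer,
                                       unsigned long pollingJitterFrames,
                                       unsigned long targetBufferingLatencyFrames)
{
    BufferSettings s;
    s.hostBufferSizeFrames = userFramesPerBuffer
        + max_ul(userFramesPerBuffer + pollingJitterFrames,
                 targetBufferingLatencyFrames);
    s.pollingPeriodFrames = max_ul(max_ul(1, userFramesPerBuffer / 4),
                                   targetBufferingLatencyFrames / 16);
    return s;
}
```

With -s512 at 48 kHz, `compute_buffer_settings(512, 48, 4800)` yields a 5312-frame host buffer (0.11067 s) and `compute_buffer_settings(512, 48, 1200)` yields 1712 frames (0.03567 s), which matches the "size" column in the two logs above.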

@RossBencina
Collaborator

Hi Etienne,

I really appreciate your engagement with improving PortAudio. However, I'm having a hard time digesting this long bug report; it could benefit from more focus.

Does your last comment basically contradict the original report? Maybe you could update the original report to reflect your current view.

Would it be possible for you to simplify and clarify the report, perhaps by breaking it up under the following headings:

  • brief summary description of problem
  • test results demonstrating the problem
  • actual identified problem cause (if identified)
  • other speculations about the cause of the problem
  • [optional] background, history, opinions, etc.

@RossBencina RossBencina added src-dsound MS DirectSound Host API /src/hostapi/dsound P3 Priority: Normal labels Feb 27, 2023
@dechamps
Contributor Author

Sure.

Does your last comment basically contradict the original report?

It does not. It only contradicts the quoted section. That was just me thinking out loud about potential fixes. I apologize for the rambling.

brief summary description of problem

I think this is covered by the first part of my report so I'll just repeat it here:

When running paloopback -r48000 -s512 with a DirectSound input device:

  • If using --inputLatency 25, the DirectSound host API hangs (as in, the stream callback is never called);
  • If using --inputLatency 30, the stream callbacks follow an irregular cadence and glitches are reported;
  • If using --inputLatency 35, the stream appears to be fine.

This happens regardless of half duplex or full duplex mode. Output seems unaffected.

test results demonstrating the problem

One way to reproduce is to run paloopback -r48000 -s512 --inputLatency 25 with any DirectSound input device (output doesn't matter). That will produce something like this:

   Default suggested input latency (msec): low = 120.00, high = 240.00
   Default suggested output latency (msec): low = 90.00, high = 180.00
   Running with suggested latency (msec): input = 25.00, out = 240.00
ERROR - stream completion timed out!PortAudio error = Wait timed out
C:\Users\edechamps\Documents\portaudio\qa\loopback\src\audio_analyzer.c:310 - ERROR - startFrame out of bounds
C:\Users\edechamps\Documents\portaudio\qa\loopback\src\audio_analyzer.c:310 - ERROR - startFrame out of bounds
   Amplitudes: left = -1.000000, right = -1.000000
C:\Users\edechamps\Documents\portaudio\qa\loopback\src\audio_analyzer.c:310 - ERROR - startFrame out of bounds
C:\Users\edechamps\Documents\portaudio\qa\loopback\src\audio_analyzer.c:310 - ERROR - startFrame out of bounds
C:\Users\edechamps\Documents\portaudio\qa\loopback\src\paqa.c:1348 - ERROR - No good loopback cable found.
PortAudio QA FAILED! 8908 tests passed, 5 tests failed

C:\Users\edechamps\Documents\portaudio\out\build\x64-Debug\qa\loopback\paloopback.exe (process 30872) exited with code 1.

The "timed out" errors are the cues here - the stream callback never fires. The rest is just paloopback getting confused as a result.

actual identified problem cause (if identified)

DirectSound only provides 31.25 ms read cursor granularity on capture buffers, regardless of the size of the capture buffer, regardless of sample rate, and (seemingly) regardless of hardware or software configuration. This appears to be a bug/limitation in DirectSound itself.

Because of this limitation, if PortAudio allocates a DS capture buffer whose length is less than 31.25 ms, the DS read cursor is stuck at 0 and never makes progress. This in turn means the stream callback never fires.

[optional] background, history, opinions, etc.

The fix I have in mind is to make PortAudio always allocate a capture buffer that is at least 62.50 ms to work around this DirectSound limitation. This should not affect input latency because the size of the capture buffer does not affect latency (PortAudio will consume input data as soon as the read cursor moves regardless). The only downside is slightly increased memory usage for the capture buffer, but that's obviously a better outcome than the stream not working at all.
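A minimal sketch of what such a clamp could look like, not the actual patch: both function names and the two macros are hypothetical, and an integer sample rate in Hz is assumed. It relies on 31.25 ms being exactly 1/32 s, so the 62.50 ms minimum is 1/16 s worth of frames, rounded up.

```c
/* 31.25 ms is exactly 1/32 s; the proposed minimum is two such granules. */
#define CAPTURE_GRANULES_PER_SECOND 32UL
#define MIN_CAPTURE_GRANULES 2UL

/* Smallest capture buffer, in frames, that spans at least 62.50 ms. */
unsigned long min_capture_buffer_frames(unsigned long sampleRate)
{
    unsigned long divisor = CAPTURE_GRANULES_PER_SECOND / MIN_CAPTURE_GRANULES; /* 16 */
    return (sampleRate + divisor - 1) / divisor; /* integer division, rounded up */
}

/* Enlarges the requested capture buffer if it is below the minimum;
 * larger requests are passed through unchanged. */
unsigned long clamp_capture_buffer_frames(unsigned long requestedFrames,
                                          unsigned long sampleRate)
{
    unsigned long minimum = min_capture_buffer_frames(sampleRate);
    return requestedFrames < minimum ? minimum : requestedFrames;
}
```

At 48 kHz the minimum comes out at 3000 frames, so a 1712-frame capture buffer (about 35.67 ms at 48 kHz) would be enlarged to 3000 frames while a 5312-frame one would be left alone. At 44.1 kHz the minimum rounds up to 2757 frames.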

@RossBencina
Collaborator

Thanks.

DirectSound only provides 31.25 ms read cursor granularity on capture buffers, regardless of the size of the capture buffer, regardless of sample rate, and (seemingly) regardless of hardware or software configuration. This appears to be a bug/limitation in DirectSound itself.

Which Windows versions have you tested this on, and/or what is the source of this information, and/or to which Windows versions do you think this applies?

@dechamps
Contributor Author

dechamps commented Feb 28, 2023

Which Windows versions have you tested this on, and/or what is the source of this information, and/or to which Windows versions do you think this applies?

My original investigation in dechamps/FlexASIO#29 dates back to 2018; I must have been running on Windows 10 back then. I can still reproduce it today on Windows 11 22H2. I was not the one who noticed it first - it was a bug report from a user of my app. Since then, more users reported this issue to me over time, see e.g. dechamps/FlexASIO#50, dechamps/FlexASIO#110, dechamps/FlexASIO#140. Presumably I would have received even more reports if I hadn't worked around it in my app to avoid small buffer sizes.

Also keep in mind that a DirectSound user faced the exact same problem all the way back in 2011 on Windows 7, citing the exact same read cursor granularity of 31.25 ms.

Given the above, I would basically assume this issue affects all Windows versions starting from Windows 7 at least.

@dechamps
Contributor Author

dechamps commented Mar 5, 2023

Out of curiosity I ran a few tests to see if there was a similar "minimum granularity" for output DS buffers as well. I was able to observe write cursor granularity of 1.25 ms which is basically good enough for all intents and purposes. So it looks like we only need some kind of fix for the input host buffer, and we can leave the existing output host buffer logic alone.

dechamps added a commit to dechamps/portaudio that referenced this issue Mar 6, 2023
This works around a DirectSound limitation where input host buffer sizes
smaller than 31.25 ms are basically unworkable and make PortAudio hang.
The workaround is to impose a minimal buffer size of 2*31.25 ms on
input-only and full-duplex streams. This is enough for the read cursor
to advance twice around the buffer, basically resulting in de facto
double buffering.

This change was tested with `paloopback` under a wide variety of
half/full-duplex, framesPerBuffer, and suggested latency parameters.
(Note the testing was done on top of PortAudio#772 as otherwise paloopback is not
usable.)

Fixes PortAudio#775
@dechamps dechamps linked a pull request Mar 6, 2023 that will close this issue
dechamps added a commit to dechamps/portaudio that referenced this issue Mar 7, 2023
@RossBencina
Collaborator

RossBencina commented Mar 21, 2023

I have just conducted a preliminary test on my system using AudioMulch (slightly dated version of PortAudio) on Windows 10 with Realtek HD sound on my ASUS motherboard. 48 kHz sample rate. PA callback buffer size = 64 frames. I do observe some kind of stalling; however, it appears to be different from what's described above.

The test I did was input+output, monitoring input on headphones.

My settings are expressed as follows: "buffer sizes" in frames: 256, 512, 1024, 2048, 4096, 8192, 16384, 32768. These are converted to suggested latency parameters for PortAudio Pa_OpenStream along the lines of:

inputParameters.suggestedLatency = inputBufferSizeFrames / 48000.0;
outputParameters.suggestedLatency = outputBufferSizeFrames / 48000.0;

(of course I support other sample rates, but it's best to use the native rate for this test).

some results are as follows:

input = 256 (i.e. suggestedLatency = 256/48000.0; approx 5.3ms)
output = 256, 512, 1024 => STALL
output = 2048, 4096, 8192, 16384, 32768 => WORKS FINE

output = 2048
input = 256, 512, 1024, 2048, 4096, 8192, 16384, 32768
=> WORKS FINE

output = 256, 512, 1024
input = 256, 512, 1024 => STALL
input= 2048, 4096, 8192, 16384, 32768 => WORKS FINE

So it seems that if the output latency is configured large enough, a small input latency (~5ms) works fine on my system. (Note here that "input latency" is the suggestedLatency passed to Pa_OpenStream, not necessarily the DirectSound buffer size.)

From this I conclude that the "31.25 ms input read cursor granularity" conjectured above is only present on some systems. The fact that the output latency setting is changing the input latency behavior suggests to me that something else might be going on.

@dechamps do you observe any similar change of behavior if you try different output latencies while keeping the input latency constant?

@dechamps
Contributor Author

So it seems that if the output latency is configured large enough, a small input latency (~5ms) works fine on my system.

This is perfectly unsurprising and does not contradict my report. This is because, in full duplex mode, the PortAudio DS code only uses a single buffer size for both input and output, and that buffer size is calculated based on the largest of input and output latency:

/* maximum of input and adjusted output suggested latency */
if( adjustedSuggestedOutputLatencyFrames > targetBufferingLatencyFrames )
    targetBufferingLatencyFrames = adjustedSuggestedOutputLatencyFrames;

Therefore, if you use full duplex mode and a high output latency, what you're doing is hiding the problem.

From this I conclude that the "31.25 ms input read cursor granularity" conjectured above is only present on some systems.

To the contrary - with your experiment you literally reproduced precisely the problem that I originally described!

input = 256, 512, 1024 => STALL
input= 2048, 4096, 8192, 16384, 32768 => WORKS FINE

If you look at the CalculateBufferSettings() code: in full duplex mode, with a low enough suggested output latency and a 64-frame callback buffer size, your 1024 "STALL" result (a suggested input latency of 1024 frames) happens when the calculated host buffer size is ~23 ms, and your 2048 "WORKS FINE" result happens when the calculated buffer size is ~44 ms. This is perfectly consistent with the 31.25 ms read cursor granularity problem from the original report.
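To make the arithmetic concrete, here is a rough sketch of the full duplex sizing as I read it, under stated assumptions: the function names are hypothetical, the "adjusted" output latency is taken to be the suggested output latency in frames unchanged, the 48-frame polling jitter is a placeholder, and the sizing expression is the one quoted earlier in this thread.

```c
/* Hypothetical sketch: in full duplex mode one target latency is used for
 * both directions, the larger of input and (adjusted) output, and the
 * shared host buffer size is derived from it. All inputs are in frames. */
unsigned long full_duplex_host_buffer_frames(
    unsigned long userFramesPerBuffer,
    unsigned long pollingJitterFrames,
    unsigned long suggestedInputLatencyFrames,
    unsigned long adjustedSuggestedOutputLatencyFrames)
{
    unsigned long target = suggestedInputLatencyFrames;
    if (adjustedSuggestedOutputLatencyFrames > target)
        target = adjustedSuggestedOutputLatencyFrames;

    /* Same shape as the quoted hostBufferSizeFrames expression. */
    unsigned long minimumTarget = userFramesPerBuffer + pollingJitterFrames;
    if (minimumTarget > target)
        target = minimumTarget;
    return userFramesPerBuffer + target;
}

/* 31.25 ms is 1/32 s, so a buffer is shorter than one read cursor step
 * exactly when frames * 32 < sampleRate. Returns 1 if shorter, else 0. */
int shorter_than_read_granule(unsigned long hostBufferFrames,
                              unsigned long sampleRate)
{
    return hostBufferFrames * 32UL < sampleRate;
}
```

With a 64-frame callback at 48 kHz, input 1024 and output 256 give 1088 frames (about 22.7 ms), which is shorter than one read cursor step. Input 2048 with output 256, or input 256 with output 2048, both give 2112 frames (44 ms), which is not. That lines up with the STALL and WORKS FINE rows in the results above.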

I hope that by reproducing the problem yourself you are now convinced that this likely affects most, if not all, systems.

dechamps added a commit to dechamps/portaudio that referenced this issue Mar 22, 2023
dechamps added a commit to dechamps/portaudio that referenced this issue Mar 22, 2023
dechamps added a commit to dechamps/portaudio that referenced this issue May 25, 2024