-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pulsed audio at end of over #778
Comments
I'm able to duplicate it here, so I created #780 as a place to put code changes to resolve this (and other EOO related bugs). |
@drowe67, actually, this is what I'm noticing now: (EOO is louder than the rest of the TX to make it easier to see.) I produced this by doing the following in freedv-gui:
I thought it was a freedv-gui issue initially but I'm also seeing zeros when executing the following:
Are those zeros intentional (i.e. they won't be zeros once the EOO data's implemented) or some sort of RADE bug? BTW it looks like EOO is supposed to be 144ms long if I'm understanding the log output I have correctly:
(960 samples = 20ms at 48000 Hz) |
@tmiw - what you have above looks correct, it is a sequence of PE000E symbols (P=pilots,E=EOE symbol). When we insert the EOO data symbols, it will become PEDDDP, ie non-zero data symbols. Yes, the EOO frames are 24*6=144ms, a regular frame 120ms (PDDDD). The Rx looks for the two E symbols to trigger an "end of over" event, and hopefully a clean detection of the end of over. The sample from #36 has a sequence of repeating pulsed audio that lasts much longer - try listening to it. Here's a zoom in: Maybe it was a one off, and can't easily be repeated? |
OK, so:
|
It's being sent once now. There was an issue with the code before where it was being sent multiple times which is now fixed in the mentioned PR.
The number of samples here is after being upsampled to the radio audio device. On my system, that device is using a 48000 Hz sample rate, so there should be 1152 * (48000/8000) = EDIT: I can't math. We are in fact injecting the correct number of EOO samples (added additional logging to confirm). |
I suspect this is the repeating EOO bug I mentioned earlier as it looks like there's ~140-150ms of audio from gap to gap as shown below: If so, we should be good. There are a few more issues I've discovered (like EOO seemingly not being transmitted when using the voice keyer) I'd like to resolve before merge. |
For comparison, here's a recording from Test steps are the same as above. Is it fair to say that the issue was freedv-gui transmitting EOO way more times than it was supposed to rather than any sort of audio stuttering? And if so, does the latest in the PR solve the issue well enough to merge? |
Well done on progressing this @tmiw.
Hard for me to say.
The code looks pretty complicated, timers and aysnc events, threads. It's really important the EOO Tx samples are directly appended to the previous Tx samples (no gaps). How do you make sure Tx shuts down immediately after the EOO event? For example - no point having a EOO frame and then a large delay before Tx stops, it's meant to facilitate fast turn around. I'm seeing 500ms of extra Tx audio in the plot above? Couple of tests to try:
|
I have another suggestion for the documentation: if possible, use separate virtual audio devices for RX and TX (if you're using an SDR). The reason I say this is because my copy of freedv-gui (and SmartSDR for my Flex 6300) was set up to use a single "VB-Cable" device, which definitely causes problems with EOO: versus using separate virtual devices: This may depend on the virtual "audio cable" being used, though (I know there are several available on macOS and Windows, for instance).
I just played the audio file in that last screenshot through freedv-gui and I didn't hear any robot noises at the end of it, FWIW.
I can tell if EOO is working if the output goes from sync immediately to candidate/search (without increasing
|
Re 3. yes that looks about right. You can also see |
Unsure exactly where to report this, and it may not matter. Running the v2.0-dev branch with git ID 7d88772 which has #780 merged and I am seeing occasional bursts of this message: 19:59:42 WARN /home/bdm/git/freedv-gui-v2.0-dev/src/main.cpp:3158: RX FIFO full Anything from 6 times in a row to well over 40 times. Doesn't seem to be much of problem, but perhaps points to something not draining the FIFO. |
Was that message not occurring prior to merge? |
I don't remember seeing it, the difficulty is that it would have been interspersed with all the RADE stats output so not as easy to spot it. I think I would have seen blocks of identical lines of text, maybe not with 6 in a row but I would have seen 40+ of them. I have tried using the previous version with git ID ce76e7f and I don't see any of these FIFO warnings, so it looks like something changed in the audio capture code. I have now restarted the latest git ID version and I am not now seeing the same FIFO warnings, I don't have an explanation for this yet, but will leave it running and see if the console has any more such messages in the morning. Just seen 9 FIFO warnings in a row, all within the same second. Leaving it to run overnight now. |
I'm still seeing these warnings, but so far I can't say that they are causing a problem in decoding RADEV1. It's not easy to correlate what I can hear with what I see in the console window. I am currently using the v2.0-dev branch with the ms-rade-text code merged, so it should be the most up to date, it also behaves the same way as v2.0-dev without ms-rade-text. |
The fact that it's outputting that message at all means it's dropping audio. It might not cause problems with RADE if it happens infrequently enough but we should probably still try to avoid it when possible. Anyway, you can try playing with the following line in main.cpp and see if that helps: int soundCard1InFifoSizeSamples = wxGetApp().appConfiguration.audioConfiguration.soundCard1In.sampleRate; (around line 3078 or so) Maybe multiplying that by some value will eliminate the warnings altogether? i.e. int soundCard1InFifoSizeSamples = wxGetApp().appConfiguration.audioConfiguration.soundCard1In.sampleRate * 2; |
Test underway, I will report back once I get a decent amount of test time. Doubling the size of the FIFO has not stopped the warnings. I will try with x4. Just seen a burst of 5 warnings with x4, will keep testing and then if it looks like it happens regularly I will try x8. |
I increased the buffer to x8, it's now been running for at least 15 hours and I have seen no Rx FIFO warnings. It does seem a really big increase in buffer size, so maybe there is something else going on behind the scenes. |
What sort of latency does this large buffering imply? We need that to be as low as possible. I'm not sure whats possible of a generic based PC sound system without buffer slips. |
|
I have not tried any other modes, but I certainly didn't see this in RADE in this version with the git ID shown FreeDV GUI 2.0.0-devel-ce76e7fb. It's new for me, and seems to have come along when I started using the version with the EOO Tx stuttering fix applied, PR #780 The FIFO in the Options->Debugging tab is showing 440ms. The sample rate in .freedv is 48000 for all the interfaces, so there is a 1 second long buffer normally according to the code shown above. Are we sure that the FIFO warning means what we think? Even at 1 second it's way beyond what you would want in a full duplex system. |
I don't see anything immediately obvious diffing between the latest in the PR and ce76e7f, unfortunately. Try changing line 3187 in main.cpp (where that RX FIFO full message gets printed) to the following: log_warn("RX FIFO full (used = %d, samples attempted to be written = %d)", codec2_fifo_used(cbData->infifo1), size); |
Building a new test version with the FIFO returned to the original size and the new debug output added. I will report back once I have some new FIFO warnings. Well that didn't take long. Several sets of multiple lines like this are output: RX FIFO full (used = 47040, samples attempted to be written = 960) spread over about 13 seconds, with many repetitions. Buffer size is 48,000 in .freedv I notice that the sum of the used and attempted values is exactly 48,000, I would have thought that an overrun would need the used + attempted write to exceed the buffer size. I wonder if this is a single instance of an overrun that for some reason is not cleared and so repeats over and over. I have no idea how to determine that. I am seeing a few more bursts of these warnings, including one of 3 repetitions which is the shortest I can ever recall seeing. |
Maybe you can see how quickly the RX FIFO is being drained? Add this to line 750 of TxRxThread.cpp: log_info("RX FIFO remaining: %d", codec2_fifo_used(cbData->infifo1)); |
Here is 30 seconds or so worth of the output from this change, naturally there's a lot of it: 09:46:58 INFO /home/bdm/git/freedv-gui-v2.0-ms-rade-text-test/src/pipeline/TxRxThread.cpp:751: RX FIFO remaining: 0 Is this instructive? I'm not sure on a first look. There are a lot of zeros though, is that a clue? |
I left this running, and the result was,eventually, this: Traceback (most recent call last): Not sure what caused it, but it's the first time I have seen this fail since I've been using the RADE-capable versions of freedv-gui. |
Based on the timestamps and the remaining sample counts above, I think your system is in fact keeping up with the incoming audio. I wonder where the other 40,000 samples are coming from, then. (Or maybe adding the logging changed the timing just enough to make it work properly?) Anyway, maybe I can do some more testing locally. For one thing, I'm not sure the FIFO size needs to be that big. |
Not sure that I am clear on the meaning of the debug, surely a value of 0 means that the FIFO has 0 free memory entries? In my book that means it's full. Maybe the timestamps are too coarse to get a feeling for what is really happening. If we are dealing with 20ms frames then a second is a lot of frames. |
Oh, that wording probably should have been different. Since we're calling |
OK, so I think a check that the debug and FIFO warning output is firing on the correct trigger value is needed. The difference between the FIFO data waiting (maximum of 3840, rarely 1920 and more frequently 960) and 48,000 seems unimportant to me, We have a nominal 440ms FIFO in the options, which holds 21,120 samples at 48kHz sampling frequency. So far it seems barely a quarter of that gets used so the size variable being 48,000 makes no sense. Does that make sense? I feel that I'm out of my depth but I don't quite understand why there are 2 different sizes of FIFO in use. |
In theory it shouldn't get anywhere near using all of the FIFO but apparently in your case, it does. Maybe change the debug output for TxRxThread.cpp to the following? if (codec2_fifo_used(cbData->infifo1) > 35000) log_info("RX FIFO used: %d", codec2_fifo_used(cbData->infifo1)); (This should reduce the amount of output substantially in case the previous attempt inadvertently fixed the problem.) You can also try checking and unchecking txRxThreadPriority in Tools->Options->Debugging; that is supposed to govern what priority the TX/RX threads are using (normal vs. highest). |
I changed the debug as suggested, almost immediately I see this: 15:28:32 INFO /home/bdm/git/freedv-gui-v2.0-ms-rade-text-test/src/pipeline/TxRxThread.cpp:752: RX FIFO used: 38400 Maybe these are edge cases where the PC is doing something else, but an 8 core 3.6GHz CPU and matched chipsets should be capable of this load without breaking a sweat I would think. I will try the debug thread priority setting, but this is what the [Debug] section shows in the .freedv file: [Debug] the txRxThreadPriority setting is always checked in the Options->Debugging tab when starting the gui, although I can turn it off but it doesn't seem to be written to .freedv as far as I can tell. |
Since you're getting those messages basically immediately on start with the new debugging, I wonder if doing the following in main.cpp will help:
This should cause the TX/RX threads to start running before pipewire is told to feed audio into FreeDV, so hopefully not a lot gets queued up during the startup process. Let me know how that goes. |
I tried the suggestion above, and it seems to make no difference. All the main.cpp debug outputs show this: 19:55:44 WARN /home/bdm/git/freedv-gui-v2.0-ms-rade-text-test/src/main.cpp:3187: RX FIFO full (used = 47040, samples attempted to be written = 960) whereas the txrxThread.cpp debug shows this sort of thing: 19:55:44 INFO /home/bdm/git/freedv-gui-v2.0-ms-rade-text-test/src/pipeline/TxRxThread.cpp:752: RX FIFO used: 46080 which at least shows the FIFO is draining, apparently in chunks of 960 samples at a time which does equate to 20ms worth. I've just realised the significance of the 47040 with 960 samples to be written, which is increasing the buffer contents and crashing into the buffer end with 48,000 samples available. This happens when freedv is not draining the larger buffer doesn't it? I would have thought that the draining code should be keeping up with the filling code as long as freedv-gui's internal 440ms buffer is neither full nor empty on Rx. I suppose what I'm trying to identify is if there is actually any sample that gets lost because the larger buffer is full at some point. Would that cause ERROR rather than WARN? |
Unfortunately the only way we'd really know if the dropped audio was causing problems is if decode started to suffer. Maybe use
(BTW the code requests a latency of 20ms, which is likely why it gets passed 960 samples.) Anyway, I added logging for every attempted FIFO write and read and from what I can tell, pipewire seems to send a whole bunch of samples at once followed by being quiet for a while (when the RX thread actually processes them). For example:
Granted, this is with loopback devices, but I wouldn't be surprised if it were doing something similar with at least some actual devices too. Unfortunately I don't know enough about pipewire to know whether this is normal behavior or if it can at least be somewhat influenced with API options; removing the current stream flags and replacing with Anyway, one easy option to fix this would be to accept that we need to set the RX FIFO a lot higher than the TX one (for example, 8x sample rate like what you found). We'd need to add comments to the code explaining why, and it may also be a good idea to limit the change to Linux only if possible. Feedback definitely appreciated here. |
I have built a local version with the rade_text.c changes you added overnight, I have also changed the input buffer to * 6 the original size as * 4 reduced the warnings to only an occasional small burst. I will have a look using the pipewire tools as suggested and report anything odd that I see. So far no buffer full warnings with 6 * input buffer. This is the relevant source for the TS-890 PCM2903C audio codec device: Source #79 |
@Tyrbiter, can you revert the FIFO buffer changes and try the changes in #789? This does seem to resolve the RX FIFO full messages but I think it might cause problems with EOO, so I wanted to get confirmation of the former before I investigate further to try to get the latter working consistently again. |
Updated and have it running, so far nothing much to report. The pactl list sources relevant output is this: Source #79 seems to be the same as before. Except that I just this moment saw this error message: Assertion 'bq->blocks == q' failed at ../src/pulsecore/memblockq.c:188, function drop_block(). Aborting. Happened a few minutes after a QSY within the 40m band. And again, no QSY this time, just left freedv-gui running with the modem started: Assertion 'bq->blocks == q' failed at ../src/pulsecore/memblockq.c:188, function drop_block(). Aborting. No FIFO full warnings at all. |
Any way you can run
|
Actually, I was able to duplicate locally and I think I have a fix checked in now. Try again? |
Right, well that seemed to work but after a period of time this happened: Traceback (most recent call last): I will kick off a gdb session and see if I can capture more information. Once the debug info (many GBs worth) has downloaded. And now I have a running freedv-gui under gdb with all the debug info too. |
What git commit ID are you running inside |
Is there a simple way to check that? If it's this: then the code in my local build directory in the build_linux/rade_src directory doesn't have this commit as far as I can tell. I am using a local branch of ms-rade-text, I don't know when it was branched but I copied it from v2.0-dev a while ago. How do I force a new download of the right rade code? I thought this happened automatically, but maybe I'm wrong. |
Yeah, it looks like main hasn't been brought down to |
OK, I could cherry pick that one line patch, and then hope that nothing else falls over. Or just wait for @drowe67 to fix it all up as I don't really need perfection, especially on Christmas Eve. A quick check reveals the rade_src directory git log shows: commit d294bcba4d19cf6c562944f394c556579814579d (HEAD, origin/dr-eoo-data, dr-eoo-data) I don't stand a chance of keeping this straight in my head. |
I'm now running a build with the latest rade code with the dr-eoo-data merged into master I also have a local merge of https://github.com/drowe67/freedv-gui/tree/ms-pulseaudio-divide-samples into this and will keep running it until any update to that branch is needed as mentioned in #778 (comment) So far so good, it isn't outputting anything unexpected. Now updated to the official 9b85492 commit |
Doing more testing, I'm not sure if #778 actually breaks EOO TX. It very well might not given that the ctests still seem to be passing on GH and I can't seem to run the relevant EOO test reliably with my Linux setup on v2.0-dev either. We'll watch it for now, anyway. |
At some point RX stopped working with RADEV1, on stopping the modem I got this, but wasn't running under gdb so no backtrace: Assertion 'pa_atomic_load(&(s)->_ref) >= 1' failed at ../src/pulse/stream.c:1644, function pa_stream_peek(). Aborting. Another locking problem? Also the 2 sec Rx colour when no callsign is received is a bit weird, you get a 1Hz pulsing in the reporter tab and different stations pulse out of phase when receiving the same station. Also seeing the modem sync indicator turning from green to red after receiving for some time, the audio stops even though the signal is still present in the spectrum and waterfall, changing frequency then takes some time until the buffer empties and the spectrum display once again matches what is on the radio's own spectrum display. |
Maybe. I went ahead and reimplemented it another way that should avoid the need for locking the PulseAudio main loop.
Would changing this so that there's no fading (i.e. stay blue for 2 seconds then immediately turn off) work better? EDIT: not sure if I can do this on the web UI. I'll just revert back to 20s there for now.
This might be related to the above. Hopefully the latest commit in the PR helps. |
New version now running, I will report any strange behaviour. For the Rx colour change, I don't know what would be better. This needs a bit more thought, what is happening now is because the continuous reception lasts longer than the short timeout the colour is immediately re-triggered as soon as the fade finishes, then rinse and repeat. Quite psychedelic. |
Going to close this since the remaining PR against this issue has been merged. If something comes up during the test campaign we can always reopen. |
Gaaah! This has just happened today, but might just be down to some updates to F41 and a delayed reboot while I was fixing up a few other things. I will keep an eye on it and add more information if it returns: 13:06:52 WARN /home/bdm/git/freedv-gui-v2.0-dev/src/main.cpp:3196: RX FIFO full |
I've seen it happen a couple of times on my Mac Mini, but I've attributed that to the system being under load right when it happened (which we really can't reasonably protect against). Hopefully that's the case for you as well. |
I suspect it was, some system libraries got updated by dnf and that probably caused pipewire to choke for a couple of seconds. pipewire runs as a user process which makes it more susceptible to being starved of CPU. |
drowe67/radae#36 (comment)
This should be explored and if repeatable/a consistent issue fixed.
The text was updated successfully, but these errors were encountered: