hackrf_transfer -c produces possible buffer underruns #1503

Open
Sasszem opened this issue Nov 10, 2024 · 4 comments

Comments

@Sasszem
Contributor

Sasszem commented Nov 10, 2024

What type of issue is this?

permanent - occurring repeatedly

What issue are you facing?

While measuring the output of a clone HackRF One with a proper spectrum analyser, I noticed the output power fluctuating, and with slower sweep times the spectrum looked exactly like an OOK spectrum.
I confirmed the result with time-domain measurements on a low-frequency carrier: the output dropped to 0 for extended periods (milliseconds) at seemingly random times.
I suspected a software error and wrote a small piece of C++ code of my own to transmit. My code did NOT produce the same effect, confirming a software bug in, or triggered by, hackrf_transfer.
I am planning to investigate the real cause, but in the meantime I'm opening this issue in case anyone else has encountered the same effect.

What are the steps to reproduce this?

  • run hackrf_transfer -f 10000000 -a 0 -c 127 -x 47 (10 MHz output at maximum gain settings, with the amplifier disabled)
  • measure with a spectrum analyser: 10 MHz center, 2 MHz span and <1 kHz RBW. You should (or rather should not) see a 1/x-type OOK spectrum. WARNING: set your reference level high enough or use an external attenuator so you do not damage your instrument!
  • measure with an oscilloscope, using a time base slow enough to show amplitude variations on the millisecond scale, e.g. 1 ms/div

Can you provide any logs? (output, errors, etc.)

No response

@martinling
Member

The host is probably failing to keep up with supplying samples to the HackRF in time.

Check the statistics from hackrf_debug -S after running hackrf_transfer. If there is a non-zero number of shortfalls reported, this is the issue.

With the -c mode, hackrf_transfer fills its buffers with constant samples generated on the fly, but the process of supplying samples to the HackRF is otherwise the same as when reading sample data from a file.

@Sasszem
Contributor Author

Sasszem commented Nov 13, 2024

Software versions

The firmware version previously reported by hackrf_info was Firmware Version: 2022.09.1 (API:1.06).
I updated both the library and the firmware to the latest version; hackrf_info now reports:

libhackrf version: git-17f39433 (0.9)
Firmware Version: git-17f39433 (API:1.08)

The problem persists even after the update.

I am running Pop!_OS 22.04 LTS with 6.9.3-76060903-generic kernel.

Testing script

I made a short script for testing:

#!/bin/bash
printf "Resetting device...\n"
hackrf_spiflash -R
sleep 1
hackrf_transfer -f 10000000 -a 0 -c 0 -x 0 -n 10000000
hackrf_debug -S

Typical output with this script:

M0 state:
Requested mode: 0 (IDLE) [complete]
Active mode: 0 (IDLE)
M0 count: 19759104 bytes
M4 count: 19759104 bytes
Number of shortfalls: 47
Longest shortfall: 15168 bytes
Shortfall limit: 0 bytes
Mode change threshold: 0 bytes
Next mode: 0 (IDLE)
Error: 0 (NONE)

15168 bytes = 7584 samples (2 bytes per sample).
At the default sample rate of 10 Msps that is 0.7584 ms, in the ballpark of what I observed on the scope.
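
The conversion above can be reproduced for other shortfall lengths and sample rates. This is a minimal sketch, assuming 2 bytes per sample as implied by "15168 bytes = 7584 samples"; the function name shortfall_ms is hypothetical and not part of the HackRF tools.

```bash
# Convert a shortfall length (bytes) into milliseconds of missing signal.
# Assumption: 2 bytes per sample, as implied by the figures in this comment.
# shortfall_ms is a hypothetical helper, not a HackRF tool.
shortfall_ms() {
    local bytes="$1" rate="$2"   # rate in samples per second
    awk -v b="$bytes" -v r="$rate" \
        'BEGIN { printf "%.4f\n", (b / 2) / r * 1000 }'
}

shortfall_ms 15168 10000000   # prints 0.7584
```

The same 15168-byte gap at 2 Msps would last five times longer, since each byte covers more signal time at a lower rate.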

Root cause debug

There are quite a few differences between hackrf_transfer and my own code.
My code was built using g++ with

CPPFLAGS = -O2 -lhackrf -lm -g -Wall -std=c++20

I checked if -O2 made any difference, but even with -O0 my code did not produce any underruns.

The main difference between the two programs is the sample rate: 2 Msps in mine and the default 10 Msps in hackrf_transfer.

Cross-checking revealed this to be the main problem (not surprisingly): hackrf_transfer works well at 2 Msps, and my code fails at 10 Msps in the same way.
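
The cross-check can be repeated with hackrf_transfer alone by sweeping the sample rate. The following is a sketch under stated assumptions: it reuses the commands from the testing script above and adds the -s option of hackrf_transfer to set the sample rate in Hz. The helper names run and sweep_rates and the DRY_RUN variable are hypothetical, introduced only for illustration.

```bash
# Repeat the constant-carrier test at several sample rates.
# run, sweep_rates and DRY_RUN are hypothetical names for this sketch.
# DRY_RUN=1 prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

sweep_rates() {
    local rate
    for rate in "$@"; do
        echo "== ${rate} samples/s =="
        run hackrf_spiflash -R      # reset the device, as in the testing script
        run sleep 1
        # -s sets the sample rate; -n "$rate" sends one second's worth of samples
        run hackrf_transfer -f 10000000 -s "$rate" -a 0 -c 0 -x 0 -n "$rate"
        run hackrf_debug -S         # print the shortfall statistics
    done
}
```

Calling `sweep_rates 2000000 10000000` would run the test at both rates discussed here; with `DRY_RUN=1` it only lists the commands.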

Fixing - no idea how

I did not find any tips on how to help the host keep up. Both codebases are about as optimized as they can be, so I think some settings in the OS or in libusb need to be changed, but I have no idea which.

I recommend adding a warning to the output of hackrf_transfer when buffer underruns occur.

@martinling
Member

The problem is not with the speed of your code but with how steadily the host's USB stack pushes data to the HackRF. There is limited buffer space on the device, and it runs out quickly if the flow of data from the host is interrupted for a while.

The main thing you can optimize here is to have the HackRF on its own USB bus. If there are other devices on the bus then the data flow to the HackRF will inevitably be interrupted whilst the host is servicing them.

There isn't really anything to tweak in libusb here. We're already queuing up multiple asynchronous transfers to get the best throughput. And I'm not aware of anything to tweak on the OS side, but we have had someone reporting shortfall problems on Linux even with a dedicated bus, which never used to happen, so it might be that something has changed on the kernel side that's relevant here.

> I recommend adding a warning to the output of hackrf_transfer when buffer underruns occur.

Yeah, this is a good idea. IIRC, it wasn't done when I first added the shortfall stats because older firmware wouldn't support the request. But we should probably go ahead and do it now.
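
Until such a warning exists in hackrf_transfer itself, a wrapper can approximate it from the shell. This is only a sketch, not the maintainers' implementation: it assumes hackrf_debug -S prints a "Number of shortfalls:" line as in the output quoted earlier in this thread, and the function name transfer_and_warn is hypothetical.

```bash
# Run hackrf_transfer, then warn if the firmware reports any shortfalls.
# transfer_and_warn is a hypothetical wrapper, not part of the HackRF tools.
# Assumes the "Number of shortfalls: N" line shown earlier in this thread.
transfer_and_warn() {
    local stats count
    hackrf_transfer "$@" || return
    # Read the statistics immediately, before anything resets the device.
    stats="$(hackrf_debug -S)" || return
    count="$(printf '%s\n' "$stats" |
        awk -F': *' '/Number of shortfalls:/ { print $2 + 0 }')"
    if [ "${count:-0}" -gt 0 ]; then
        echo "WARNING: ${count} shortfalls during transfer; output may have gaps" >&2
        return 1
    fi
}
```

It takes the same arguments as hackrf_transfer, for example `transfer_and_warn -f 10000000 -a 0 -c 0 -x 0 -n 10000000`, and returns a non-zero status when shortfalls were counted.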

I also have #1484 open, which warns in hackrf_info if other devices are sharing the bus.

@Sasszem
Contributor Author

Sasszem commented Nov 30, 2024

> The problem is not with the speed of your code but with how steadily the host's USB stack pushes data to the HackRF. There is limited buffer space on the device, and it runs out quickly if the flow of data from the host is interrupted for a while.

I think this deserves an entry in the FAQ, as it can be a potential problem in other applications as well.

> The main thing you can optimize here is to have the HackRF on its own USB bus. If there are other devices on the bus then the data flow to the HackRF will inevitably be interrupted whilst the host is servicing them.

I have a hard time debugging this, but according to lsusb -tv, all the exposed USB ports on my laptop seem to be on the same bus, and even with a USB-C hub the HackRF ends up on that bus. Strangely enough, the external hub's internal Ethernet adapter ends up on a different bus, so either something is reported incorrectly or the USB-C connector carries multiple buses at the same time.
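
Bus sharing can also be checked from the flat lsusb listing. The sketch below filters lsusb output for devices that report the same bus number as the HackRF. It assumes the usual "Bus NNN Device NNN: ID vvvv:pppp ..." line format, assumes 1d50:6089 as the HackRF One's USB ID (pass a different ID as the first argument if yours differs), and skips root hubs (IDs starting with 1d6b). The function name is hypothetical. As a possible explanation for the Ethernet adapter, offered as an assumption rather than a diagnosis: Linux usually lists the USB 2.0 and SuperSpeed sides of one controller as separate buses, and the HackRF One is a USB 2.0 device.

```bash
# List devices that share a bus number with the HackRF, given lsusb output
# on stdin. devices_sharing_hackrf_bus is a hypothetical helper.
# Assumption: 1d50:6089 is the HackRF One's USB ID (override via $1).
devices_sharing_hackrf_bus() {
    awk -v id="${1:-1d50:6089}" '
        { bus[NR] = $2; dev_id[NR] = $6; line[NR] = $0 }
        $6 == id { hackrf_bus = $2 }
        END {
            if (hackrf_bus == "") exit 1      # no HackRF in the listing
            for (i = 1; i <= NR; i++)
                if (bus[i] == hackrf_bus && dev_id[i] != id &&
                    dev_id[i] !~ /^1d6b:/)    # skip root hubs
                    print line[i]
        }'
}
```

Usage would be `lsusb | devices_sharing_hackrf_bus`; empty output with a zero exit status means no other device reports the same bus number. The flat listing does not show hub topology, so lsusb -tv remains the more detailed view.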

> There isn't really anything to tweak in libusb here. We're already queuing up multiple asynchronous transfers to get the best throughput. And I'm not aware of anything to tweak on the OS side, but we have had someone reporting shortfall problems on Linux even with a dedicated bus, which never used to happen, so it might be that something has changed on the kernel side that's relevant here.

The correct solution would be to offload processing to the MCU, but that would require an entire redesign, so it is not feasible now.
Lowering the sample rate where possible is a good solution, but at the same time, even 2 Msps seems to be too low in some situations.

> Yeah, this is a good idea. IIRC, it wasn't done when I first added the shortfall stats because older firmware wouldn't support the request. But we should probably go ahead and do it now.

The only problem I could see with it is other software (GNURadio, SDR receiving software, etc.) possibly resetting the MCU thus clearing the stats.
