About porting to F746 disco board #11

Open
szjdb opened this issue Aug 10, 2021 · 17 comments

Comments


szjdb commented Aug 10, 2021

Hi, Karlis,
Sorry to bother you again.

I have sampled the radio output at an 8 kHz sample rate for real-time decoding, and I have two questions:

  1. When should I fill the 15 s waveform buffer, and how do I synchronize to the start of a frame? Is it like what W5BAA described, pressing the button during a lull in the FT8 traffic? The time from releasing the key to the start of the frame will not be the same each time it is triggered. Will the decoder handle that deviation by itself?

  2. You had told me that "you need to record and store it somehow, there's no iterative/online decoding, all the searching for candidates and decoding happens in bulk late in the slot", but I read that W5BAA said "I was able to convert your code to simple C and to break up the processing where I do an extract of the power spectrum every 160 msec rather than try to process 15 seconds of time series in one gulp". So is there any possibility of breaking find_sync() into a periodic task, so that the search for synchronization happens automatically?

Many thanks for your help!

Best Regards and 73!
James

Owner

kgoba commented Aug 10, 2021

Hi James,

to answer your questions,

  1. The decoding and candidate finding can adapt to a wide range of time shifts, even if the message is clipped at the start or at the end (WSJTX does the same). The button method just sets the 0 mark, and then the board can count 15, 30, 45, 60, etc. seconds from that and use those marks to start the decoding. In practice it works well and the clock does not drift significantly. You only have to set the 0 mark once (not too hard to do by ear or by watching the waterfall), or perhaps twice if you want to fine-tune it.

  2. Your previous question hints that you mean to store the waveform for the whole 15 s slot, but that's not what I intended. The find_sync() and decode() routines only care about spectral magnitude, so the reasonable approach would be to do STFT analysis during the 15 s slot itself: accumulate e.g. 160 ms frames of waveform, do the FFT, calculate the magnitude and store it. That's what I meant by record and store. So there's no gulp, at least in the DSP processing. The gulp happens only when you call find_sync() and decode() at the end of the 15 s slot.
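The "0 mark" idea from point 1 can be sketched in a few lines. This is only an illustration, not code from the library: `slot_position()` and `slot_pos_t` are hypothetical names, and it assumes the board has a free-running millisecond tick counter whose value was captured at the button press.

```c
#include <stdint.h>

#define SLOT_MS 15000u /* FT8 slot length in ms (FT4 would use 7500) */

/* Hypothetical helper type: where "now" falls relative to the button press. */
typedef struct {
    uint32_t slot_index;   /* whole slots elapsed since the zero mark */
    uint32_t ms_into_slot; /* position inside the current slot, 0..SLOT_MS-1 */
} slot_pos_t;

/* t0_ms: tick value captured at the button press (the "0 mark").
   now_ms: current value of the same millisecond tick counter. */
static slot_pos_t slot_position(uint32_t t0_ms, uint32_t now_ms)
{
    /* Unsigned subtraction stays correct across one wrap of the counter. */
    uint32_t elapsed = now_ms - t0_ms;
    slot_pos_t p = { elapsed / SLOT_MS, elapsed % SLOT_MS };
    return p;
}
```

The main loop could then start find_sync()/decode() whenever `ms_into_slot` passes the point where recording for the slot is considered complete.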

To see how to properly prepare the magnitude data, look at decode_ft8.c in this repository. While that is a desktop application which only accepts 12000 Hz sample rate files (which is also conveniently what WSJTX works with internally and records), you can adapt the DSP processing to other sample rates. There are some constraints that decode() and find_sync() expect regarding the magnitude data: the frequency bins have to correspond to 6.25 Hz (or better yet half of it, 3.125 Hz), and the time resolution also has to be 6.25 Hz (160 ms) or, better yet, half of it, 80 ms. This is perhaps not too obvious from the source code since it's not too well documented, but I am working on it. Just try to build the repository with 'make' and run 'decode tests/websdr_test1.wav' to see it in action, and perhaps you can use a debugger to step through it. When I have more time I will try to add comments on what the DSP portion is doing (there are some clever tricks that improve decoding accuracy), but frankly the DSP part was not intended as a part of this library, since it's very platform dependent. On ARM you might use CMSIS-DSP for filtering/resampling/FFT, while on desktop something else has to be used, so I just created the bare FT8 decoder here which accepts magnitude data. How you get it is up to you, but of course, I am willing to guide the users of this library.
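As a rough illustration of those constraints, the block sizes for a given sample rate can be derived as below. The names `stft_params()` and `stft_params_t` are made up for this sketch. It also assumes that the finer 3.125 Hz bins are obtained by making the FFT twice as long as one symbol, and the 80 ms steps by advancing half a symbol at a time.

```c
#include <stdint.h>

/* Hypothetical container for the analysis sizes. */
typedef struct {
    uint32_t block_size; /* samples in one 160 ms symbol */
    uint32_t hop_size;   /* new samples per analysis step */
    uint32_t fft_size;   /* FFT length in samples */
} stft_params_t;

/* Returns 0 on success, -1 if the sizes do not come out as whole samples.
   time_osr = 2 gives 80 ms steps, freq_osr = 2 gives 3.125 Hz bins. */
static int stft_params(uint32_t sample_rate, uint32_t time_osr,
                       uint32_t freq_osr, stft_params_t *out)
{
    if (time_osr == 0u || freq_osr == 0u) return -1;
    /* One symbol lasts 1/6.25 s, so samples per symbol = rate * 4 / 25. */
    if ((sample_rate * 4u) % 25u != 0u) return -1;
    uint32_t block = sample_rate * 4u / 25u;
    if (block % time_osr != 0u) return -1;
    out->block_size = block;
    out->hop_size = block / time_osr;
    out->fft_size = block * freq_osr; /* bin width = 6.25 Hz / freq_osr */
    return 0;
}
```

With these assumptions, 8000 Hz gives a 1280-sample symbol, a 640-sample hop and a 2560-point FFT, while 12000 Hz gives 1920, 960 and 3840.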

Also note that I just converted the source code to pure C. There were very few changes needed since it did not use any significant features of C++, but I had wanted to do this for a long time since it's more accessible to other developers that way.

Author

szjdb commented Aug 11, 2021

Hi, Karlis,
Many thanks for your detailed advice! Your code is very clear and easy to read.
I got it and tried it on the board. The first step is to fill the 15 s buffer, which is about 8k * 15 s floats long and allocated in SDRAM, and then run extract_power() into mag_power, which is also allocated in SDRAM. Then, calling find_sync(), it finds 16 to 36 candidates and successfully decodes just 1 message, which triggers the found flag. But looking into the message, the information is wrong, and then the code runs into a hard fault.

I am using the default bp_decode and set kMax_decoded_messages to 10. The decode function is in the main loop, and the real-time audio sample buffer is filled in the SAI DMA interrupt callback for the codec when triggered by a button. Everything looks normal, but I can't get the right result.
May I ask for your F746 disco board sample code for reference?
Thanks for your great work bringing this to the MCU platform! Hope to meet you on the air with your FT8 lib.

Best Regards & 73!
James

Owner

kgoba commented Aug 11, 2021

Ah, you're right, I didn't think about the fact that extract_power() was designed to work with the whole signal in bulk, so indeed if you use it as it is, you have to store 8k * 15 s of waveform data. This processing can definitely be made incremental: if you look at which parts of the signal extract_power() accesses and organize the audio input chain accordingly, it can be made to work with small buffers only. Then the only thing you need to store is mag_power (and even for that I am starting to have ideas, perhaps I can reorganize it so that you don't need the whole thing, but for now let's stick to this).

Take a look at how this incremental (block by block) processing is organized in the embedded code. This still uses a slightly earlier version of ft8_lib, but the difference is not essential.
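For readers without access to that embedded code, here is a minimal sketch of one way to organize block-by-block processing. It is not the author's implementation. All names (`frame_acc_t`, `frame_acc_push()`, `sum_energy()`) are hypothetical, the sizes assume 8 kHz input with 2x oversampling in time and frequency, and the callback only sums energy where a real port would apply a window, run the FFT and store magnitudes.

```c
#include <stddef.h>
#include <string.h>

#define HOP_SIZE 640u  /* 80 ms of audio at 8 kHz (assumed) */
#define FFT_SIZE 2560u /* 320 ms analysis frame at 8 kHz (assumed) */

/* Called once per completed hop with the latest FFT_SIZE samples. */
typedef void (*frame_fn)(const float *frame, size_t n, void *user);

typedef struct {
    float frame[FFT_SIZE]; /* sliding window, oldest sample first */
    size_t fill;           /* new samples received since the last frame */
} frame_acc_t;

static void frame_acc_init(frame_acc_t *acc) { memset(acc, 0, sizeof *acc); }

/* Feed any number of samples (e.g. one DMA half-buffer at a time).
   Returns how many frames were handed to on_frame. */
static size_t frame_acc_push(frame_acc_t *acc, const float *samples,
                             size_t count, frame_fn on_frame, void *user)
{
    size_t frames = 0;
    while (count > 0) {
        size_t room = HOP_SIZE - acc->fill;
        size_t n = count < room ? count : room;
        /* New audio goes into the tail of the window. */
        memcpy(&acc->frame[FFT_SIZE - HOP_SIZE + acc->fill], samples,
               n * sizeof(float));
        acc->fill += n;
        samples += n;
        count -= n;
        if (acc->fill == HOP_SIZE) {
            on_frame(acc->frame, FFT_SIZE, user);
            /* Slide the window left by one hop, dropping the oldest audio. */
            memmove(acc->frame, acc->frame + HOP_SIZE,
                    (FFT_SIZE - HOP_SIZE) * sizeof(float));
            acc->fill = 0;
            ++frames;
        }
    }
    return frames;
}

/* Stand-in for "window + FFT + magnitude": accumulates frame energy. */
static void sum_energy(const float *frame, size_t n, void *user)
{
    double *total = (double *)user;
    for (size_t i = 0; i < n; ++i) *total += (double)frame[i] * frame[i];
}
```

With this layout only one 2560-sample window (about 10 kB of floats) is held at a time instead of the full 15 s waveform.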

Regarding your hard fault, it's hard to say; perhaps there are some memory access issues.

Author

szjdb commented Aug 11, 2021

Hi, Karlis,
Many thanks for your kind help!

I will check my code step by step and report to you later.

Best Regards & 73!
James


wb0gaz commented Jun 5, 2023

Hello - new user/developer question (05 Jun 2023) -

I am interested in porting FT8 to an STM32 microcontroller which has only 64K RAM. The project notes specify under 200K RAM. Is this space for code+data or only data?

Thank you for the clarification!

Owner

kgoba commented Jun 5, 2023

Hi WB0GAZ,

Unfortunately, with the current approach it would be hard to get this running in 64K RAM for data alone. The reason is that both the original WSJT-X implementation and my approach have to record/store the signal for (almost) the entire FT8/FT4 time slot, that is 15/7.5 seconds. WSJT-X stores the raw waveform, which occupies about 12000 Hz * 15 s * 2 bytes/sample = 360 kBytes, and then at the end of the slot runs a lot of DSP processing to find and extract the messages. My approach distributes this load somewhat by doing the FFT part during the time slot, but I still have to store the results somewhere, and that takes about 400 bins * 2 OSR * 90 symbols * 2 OSR = 144 000 FFT magnitudes for FT8, which I already store as single-byte entries, so 144 kB is the minimum. You could still get it running by disabling time/frequency OSR, theoretically reducing the required RAM 4 times to about 36 kB, but the decoding accuracy suffers quite a lot.
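The arithmetic above is easy to re-run for other configurations. The two helpers below are hypothetical names for this sketch and simply restate the products quoted in the comment (one byte per stored magnitude is taken from the text).

```c
#include <stdint.h>

/* Bytes needed to keep the raw waveform of one slot. */
static uint32_t raw_waveform_bytes(uint32_t sample_rate, uint32_t slot_seconds,
                                   uint32_t bytes_per_sample)
{
    return sample_rate * slot_seconds * bytes_per_sample;
}

/* Bytes needed for the magnitude array, at one byte per magnitude. */
static uint32_t magnitude_bytes(uint32_t bins, uint32_t symbols,
                                uint32_t freq_osr, uint32_t time_osr)
{
    return bins * freq_osr * symbols * time_osr;
}
```

For example, `raw_waveform_bytes(12000, 15, 2)` gives 360000, `magnitude_bytes(400, 90, 2, 2)` gives 144000, and `magnitude_bytes(400, 90, 1, 1)` gives 36000. The 8 kHz float buffer mentioned earlier in this thread would come to `raw_waveform_bytes(8000, 15, 4)`, i.e. 480000 bytes.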

Perhaps your STM32 microcontroller could support external RAM somehow?


wb0gaz commented Jun 5, 2023

Thank you Karlis for the detailed and helpful reply!

External RAM is possible using a QSPI chip (but then the RAM is accessed through I/O routines, like a disk drive peripheral, which is not so efficient and makes it difficult to support an existing software design that expects direct address space access).

Some large 100-pin STM32 packages also have an on-chip peripheral called "fsmc" (flexible static memory controller), which can access QSPI RAM and map it into the 32-bit address space. Not as fast as native RAM, but a possible solution.

Your analysis will help me make the hardware device choice much sooner, because I will not have to analyze the software to understand this requirement.

THANK YOU!

Dave

Owner

kgoba commented Jun 6, 2023

Wish you the best with your project, certainly very exciting to see someone pick up the idea of rolling their own HW device for FTx. I would love to do that myself, but I'm too distracted with all other aspects of my life. I also have ideas how to extend this library, but you know, one day. The furthest I got was to get it running on a F746 Discovery board as you might know (https://www.youtube.com/watch?v=n5hWDzu-65g)


wb0gaz commented Jun 6, 2023

Thank you again Karlis.

I did some research on STM32 parts with large RAM (>=256K) in the LQFP48 package (my preferred configuration for a hardware device).

I found 6 types currently in production and in US stock, all modern versions (STM32L552, STM32L4P5, STM32U585) with >= 256K RAM. Therefore, an embedded project is possible with a modern MCU type (until now, I have been working only with old/cheap STM32 MCU types for lower cost, but <=64K RAM is typical in LQFP48.)

Owner

kgoba commented Jun 6, 2023

If you're ok going one notch higher, STM32F722 seems to be a good candidate, but it's in LQFP64. Keep the operating frequency in mind too, since the decoding of messages happens in a constrained window of about half a second between FTx signals. I had some ideas to offload part of the decoding to the rest of signal slot time, but it's quite some work to implement and would require even more memory to keep the partial results.


wb0gaz commented Jun 6, 2023

I understand --- FT8 was designed for giant high speed PC!

The STM32U5x family runs at 160 MHz, the F722 at 216 MHz. Burst decoding activity during the short FTx quiet time will be a concern. The STM32U5x device has 768K RAM, so temporary storage is not a severe constraint.

In the extreme case, maybe perform decoding during the next 15 second receive period, and accept a delay of received messages by a full FT8 cycle.
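One way to picture that "decode during the next receive period" idea is a pair of magnitude buffers that swap roles at each slot boundary. This is a sketch under assumptions, not something from the library: the names `mag_pingpong_t`, `pp_write_buf()` and `pp_swap()` are hypothetical, and the 144000-byte size is the per-slot figure quoted earlier in the thread.

```c
#include <stdint.h>
#include <string.h>

#define MAG_BYTES 144000u /* per-slot magnitude storage quoted above */

/* Two buffers: one being filled by the DSP chain, one being decoded. */
typedef struct {
    uint8_t buf[2][MAG_BYTES];
    unsigned write_idx; /* index of the buffer currently being filled */
} mag_pingpong_t;

/* Buffer that the audio/FFT path should write into right now. */
static uint8_t *pp_write_buf(mag_pingpong_t *pp)
{
    return pp->buf[pp->write_idx];
}

/* Call at the slot boundary. Returns the buffer that was just completed
   (for the decoder) and redirects new data to the other, cleared buffer.
   The decoder must be done with that other buffer before this is called. */
static const uint8_t *pp_swap(mag_pingpong_t *pp)
{
    const uint8_t *done = pp->buf[pp->write_idx];
    pp->write_idx ^= 1u;
    memset(pp->buf[pp->write_idx], 0, MAG_BYTES);
    return done;
}
```

Two such buffers take 288 kB, which fits in the 768K RAM mentioned above, at the cost of messages being reported one slot late.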

Owner

kgoba commented Jun 6, 2023

In the linked video you can see that the F746 (clocked at 216 MHz) decoded ~15-18 signals during medium traffic on the band in about 1.2-1.3 seconds. Each FT8 message is actually only 79*0.16=12.64 seconds on the air, but start times are dispersed. So it's a squeeze, but the decoder can also handle partial messages that miss the start or the end. So as far as I remember, I configured it to gather about 85 symbols, that is to run for 13.6 seconds, and use the rest of the time to decode (discarding whatever audio comes in during the decode). Worked quite nicely. I didn't use any clock source to sync the slot time, but rather used the user button to reset the time once, and after that I adjust the internal 15-second timer according to the average arrival time of the decoded FT8 messages (this you can also get from the decoder, or rather the position of the initial symbol with respect to the recorded data), so it's auto-clocking after the button press. That was just my guess at a record/decode strategy; you might find another one that works better for you.
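The timing budget and the auto-clocking step described above could look roughly like this. It is a guess at the shape of such code, not the author's implementation: `slot_correction_ms()` is a hypothetical name, the target start position is a tuning value chosen by the caller, and halving the error is just one simple damping choice.

```c
#include <stddef.h>
#include <stdint.h>

#define SYMBOL_MS        160
#define FT8_MESSAGE_MS   (79 * SYMBOL_MS)    /* time on the air per message */
#define RECORD_MS        (85 * SYMBOL_MS)    /* recording window used above */
#define DECODE_BUDGET_MS (15000 - RECORD_MS) /* what is left for decoding */

/* start_ms: start positions of the decoded messages within the recorded
   data, as reported by the decoder (in ms).
   target_ms: where a well-timed message is expected to start (assumption,
   to be tuned by the caller).
   Returns how far to shift the next slot start; a positive value means
   "start later". Only half of the mean error is applied so that a single
   odd slot does not swing the timer. */
static int32_t slot_correction_ms(const int32_t *start_ms, size_t count,
                                  int32_t target_ms)
{
    if (count == 0) return 0; /* nothing decoded: leave the timer alone */
    int64_t sum = 0;
    for (size_t i = 0; i < count; ++i) sum += start_ms[i];
    int32_t mean = (int32_t)(sum / (int64_t)count);
    return (mean - target_ms) / 2;
}
```

With the figures from the comment, `FT8_MESSAGE_MS` is 12640, `RECORD_MS` is 13600 and `DECODE_BUDGET_MS` is 1400, which is consistent with the 1.2-1.3 second decode time seen in the video.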


wb0gaz commented Jun 6, 2023

Thank you Karlis, it is good to understand. Your method of establishing timing from manual start is excellent! It will avoid complications in portable operation.

Can you tell me what is the size of the code segment of compiled application you tested and loaded to STM32 F7 Discovery module? That is, the bytes written to flash memory by programming/debug adapter integrated in STM32 F7 Discovery module?

I ask this because the large RAM in the STM32U5 (768K) makes it possible to store some executable code in RAM (transferred from flash at boot time), so that execution is always zero wait state (compared to the wait states required on a cache miss at 216 MHz). It may be possible to gain a slight improvement in run time by executing from RAM.

Owner

kgoba commented Jun 7, 2023

I checked the old code, and here's the output of objsize:

   text	   data	    bss	    dec	    hex	filename
 122444	   1708	  40328	 164480	  28280	.build/firmware.elf

So the executable part is about 120 kB. The app code was lightweight in terms of simple non-interactive UI just to get the waterfall drawn and text printed. So that gives roughly the minimum size.

BSS with 40k stands for all my service variables like audio buffers etc., but doesn't include the array for FT8 magnitudes, which I mapped to the external RAM. That actually might have slowed down the decoding, but I haven't checked by how much.
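To turn those three columns into flash and RAM figures, the usual convention for this kind of size report can be applied: text and the initial values of data live in flash, while data and bss occupy RAM at run time. The sketch below assumes a typical linker script laid out that way; the type and function names are hypothetical.

```c
#include <stdint.h>

/* The three section sizes reported for the firmware image. */
typedef struct {
    uint32_t text; /* code and constants */
    uint32_t data; /* initialized variables (stored in flash, copied to RAM) */
    uint32_t bss;  /* zero-initialized variables (RAM only) */
} elf_sizes_t;

/* Bytes programmed into flash, assuming .data initializers are kept there. */
static uint32_t flash_bytes(elf_sizes_t s) { return s.text + s.data; }

/* Statically allocated RAM, not counting stack, heap or external memory. */
static uint32_t static_ram_bytes(elf_sizes_t s) { return s.data + s.bss; }
```

For the numbers above (`text` 122444, `data` 1708, `bss` 40328) this gives 124152 bytes of flash and 42036 bytes of static internal RAM.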

I haven't made the app code public due to various reasons, including being cautious of various parties from a certain country which are known to steal code/design and sell them as their own. I am all for people experimenting and including the library into their development process with due crediting. If you check the issue tab here, there's been interest from hams in making their own applications, and I am happy to help and guide. Also I know the inner workings of the library, but the UI code was rather a proof of concept which I would not like to support. Happy to explain my approach there to get it working (which you also probably found already mentioned in the issues here), but that definitely is not the only possible one. If you get your hw platform running, I would be happy to share the code with you, just explaining the context.


wb0gaz commented Jun 7, 2023

Thank you very much for this detail! All the code will fit easily in RAM if necessary. I am also using pure C (no C++).

The first step will be to verify that the FT8 library code and the test application main() work correctly before attempting any change. That is not immediate, because it will take some time before I can get basic radio functions working on the STM32U5 family (my target hw platform). I did not have a requirement for large RAM until now.

Thank you again. I am now watching this repository.

73 Dave WB0GAZ

Owner

kgoba commented Jun 7, 2023

Note that there are two versions, the master branch contains a workable version, but with somewhat dated approach, yet mostly in sync with the original WSJTX implementation, but the branch update_to_0_2 contains a newer approach and different API that I like much better, albeit still slightly stuck in forever-work-in-progress. But that is life :)

Wish you luck with your work!


wb0gaz commented Jun 7, 2023

Thank you for the guidance on the newer API. I will return when I am closer to starting my experiments!

73 Dave WB0GAZ
