Debugging instability of Lilygo_rtl_433_ESP #2043
Ideally, a serial monitor with the exception decoder activated would help, along with the logs associated with the reboot. |
I ordered a spare Lilygo LORA, so I can hook that up to a serial monitor when it arrives. |
By adding this line into your environment:
|
Any news about this issue? My device arrived today and I may have the same problem. My Bresser 5-in-1 is detected as a 6-in-1, which is also wrong. I'm not sure whether that is the cause or whether something more general is wrong. |
OK - I got my new board and it too reboots -- even more often, every few hours actually!! I will recompile with |
This is correct |
Is yours fully rebooting? I'm finding that after the first restart it connects to WiFi but has no 433 radio or MQTT (just some sort of limited web panel); a second restart works. |
It reboots automatically and fully every few hours - the interval is highly variable - and it seems to reboot even more often on my new one... |
OK - I have been monitoring serial messages with the 'monitor_filters' command for the past day and have seen the following:
But those messages don't seem to be temporally related to the actual crashes. Is this helpful???? |
Some additional observations: |
The rtl_433_Decoder stack size is not externally visible; try increasing it by about 1000. |
OK increased Any idea what the root cause of this is? |
@puterboy I had tuned the stack size based on my real-world results in an attempt to balance free memory against not using too many resources. Within the OMG environment we have multiple modules sharing resources, so if one gets greedy, the others suffer. |
@puterboy PS If you have a value that works for you, pls submit a PR or issue against rtl_433_ESP, and I will try to include in the next release |
Yes, you can have the stack available at the task level by using this A PR will be welcome to add this indicator. |
Sure. I added the code to SYStoMQTT based on your original comment. That seemed to make sense since stack memory is similar to I see you edited your comment and are now suggesting |
Yes please, this way it would be with the other RF infos |
Stack overflow is definitely at least the primary issue here. I will continue to monitor to see if increasing by 1K is sufficient. Is there any way to dynamically change the stack size if high water mark dips too low? |
I assume that if it goes in this section then, for consistency, no sensor discovery logic needs to be added. |
Indeed, as with BT, we don't autodiscover the modules' task free stack. |
We could delete it and recreate it with a different size but I'm not sure if this approach is relevant. |
Comparing the two Lilygo rtl433's I have, I noticed that the one upstairs resets more often than the one downstairs -- and correspondingly, the one upstairs is receiving signals from more 433MHz devices. Presumably, the more devices transmitting, the more the queue grows, leading to more stack consumption if multiple signals are received in close succession... If true, then it would seem that one should implement a method to dynamically resize the queue if it dips too low. One would presumably need to copy over the stack so that nothing is lost before setting up the task again... Otherwise, the stack size seems quite arbitrary: one either sets it so large that nobody fails, in which case memory is wasted, or sets it to an "average" user's size, in which case some people will get crashes. Perhaps it would be helpful to get @NorthernMan54's input... |
Also, since arrival times of different devices are generally stochastic and statistically independent, after enough time you will have an instance where all your devices send a signal at approximately the same time... so the worst case eventually happens... |
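The "worst case eventually happens" argument can be made concrete with a back-of-the-envelope sketch. It assumes each device transmits once per fixed period at an independent, uniformly random phase, and that "approximately the same time" means within one short window. The function names and the model are illustrative assumptions, not anything taken from rtl_433_ESP.

```cpp
#include <cassert>
#include <vector>

// Probability that every device transmits inside one given window of
// `windowSec` seconds, assuming device i transmits once per periodsSec[i]
// seconds at an independent, uniformly random phase (illustrative model).
double allDevicesInWindowProbability(double windowSec,
                                     const std::vector<double>& periodsSec) {
  assert(windowSec > 0.0);
  double p = 1.0;
  for (double period : periodsSec) {
    assert(period > 0.0);
    double perDevice = windowSec / period;
    if (perDevice > 1.0) perDevice = 1.0;  // device always lands in the window
    p *= perDevice;
  }
  return p;
}

// Rough expected wait in seconds until such a coincidence, treating
// consecutive windows as independent trials (an approximation).
double expectedSecondsUntilCoincidence(double windowSec,
                                       const std::vector<double>& periodsSec) {
  return windowSec / allDevicesInWindowProbability(windowSec, periodsSec);
}
```

Under this model, two devices that each transmit every 60 s coincide within a 1 s window about once an hour, while five such devices would need roughly 9000 days, so how quickly the worst case arrives depends heavily on the number of devices and the window length.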
@puterboy Pls keep in mind that the majority of the rtl_433_ESP code base is a direct clone from rtl_433, which runs on machines with a large memory footprint. So the stack usage is high. If you look, a couple of features from rtl_433 require a huge stack ( almost all the memory on an ESP32 ). So the current stack size is a bit of a compromise. The code base takes each received signal, allocates a spot on the heap ( not stack ), and queues it for processing thru the rtl_433 code base ( the 100+ device decoders ). Am thinking that one of the decoders, when it sees a particular signal, is consuming the stack. The processing queue has a max size of 5, so the memory hit from a large number of signals is managed, and hits the heap ( so this queuing is not the source of the stack issue ). After working recently with an esp32-s3 with 8MB RAM, am wondering if running with a larger memory footprint may be beneficial. Hopefully a board with a display and a SX127x chip will be built by a manufacturer, keeping it simple like the Lilygo.
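To illustrate why a capped queue of heap-allocated signals bounds memory use regardless of how many devices transmit, here is a minimal host-side sketch. `RawSignal` and `BoundedSignalQueue` are hypothetical names, and dropping a signal when the queue is full is an assumption; the actual rtl_433_ESP queue may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for one captured RF signal (pulse timings).
struct RawSignal {
  std::vector<int> pulseWidthsUs;
};

// Fixed-capacity FIFO: payloads live on the heap, and the queue refuses new
// entries once full, so a burst of signals cannot grow memory without bound.
class BoundedSignalQueue {
 public:
  explicit BoundedSignalQueue(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Returns false (and frees the signal) when the queue is already full.
  bool push(std::unique_ptr<RawSignal> signal) {
    if (items_.size() >= capacity_) return false;
    items_.push_back(std::move(signal));
    return true;
  }

  // Returns the oldest signal, or nullptr when the queue is empty.
  std::unique_ptr<RawSignal> pop() {
    if (items_.empty()) return nullptr;
    std::unique_ptr<RawSignal> front = std::move(items_.front());
    items_.pop_front();
    return front;
  }

  std::size_t size() const { return items_.size(); }

 private:
  std::size_t capacity_;
  std::deque<std::unique_ptr<RawSignal>> items_;
};
```

With a capacity of 5, a sixth signal arriving before the decoder task catches up is rejected, so the worst-case heap cost is five signals, and the decoder's stack is untouched by the queue depth.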
|
What you say about one of the decoders consuming the stack is perhaps consistent with the following: |
@NorthernMan54, is there any debug logging that could report how much stack space is being consumed by any given decoder, along with the actual device that ended up triggering it? |
@puterboy That degree of logging does not exist within the code base. The trick would be to log the stack size between each decoder run, and then review the logs and determine which decoder was invoked that increased stack space. |
That's exactly what I was thinking. Seems like it could be worthwhile to add such logging given the persistent and difficult to debug challenges with stack consumption. |
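The logging idea discussed above (sample the stack between decoder runs, then see which decoder moved the number) could look roughly like the following host-side sketch. Everything here is hypothetical: `DecoderEntry`, `StackDrop`, and `profileDecoders` do not exist in rtl_433_ESP, and the stack probe is injected so the logic can run off-device. On an ESP32 the probe could presumably wrap FreeRTOS's `uxTaskGetStackHighWaterMark`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical decoder entry: a name plus a callable that runs the decoder.
struct DecoderEntry {
  std::string name;
  std::function<void()> run;
};

struct StackDrop {
  std::string decoderName;
  std::size_t amountLost;  // how far the high-water mark fell in this decoder
};

// Runs each decoder in turn and samples the free-stack high-water mark after
// every run. `readHighWaterMark` is injected so the logic is testable on a
// host; wiring it to the real task is left as an assumption.
std::vector<StackDrop> profileDecoders(
    const std::vector<DecoderEntry>& decoders,
    const std::function<std::size_t()>& readHighWaterMark) {
  assert(static_cast<bool>(readHighWaterMark));
  std::vector<StackDrop> drops;
  std::size_t before = readHighWaterMark();
  for (const DecoderEntry& decoder : decoders) {
    decoder.run();
    std::size_t after = readHighWaterMark();
    if (after < before) {
      drops.push_back({decoder.name, before - after});
    }
    before = after;
  }
  return drops;
}
```

Because a high-water mark only ever records the lowest free stack seen so far, a decoder shows up in the result only when it sets a new low, which is exactly the event worth logging together with the signal that triggered it.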
…ck too small on Lilygo Lora device This fixes the crash caused by OOM when the low water mark on rtl_433_Decoder_Stack drops below 0. (See: 1technophile/OpenMQTTGateway#2043) I increased the memory size by 1500, which after running for a week on 2 different Lilygo Lora ESP32 devices leaves the water mark at just over 1KB -- I want to leave a little spare in case there are other sensor configurations and edge cases that would dip further into the stack. I also wrapped the definitions of `rtl_433_Decoder_Stack` with `ifndef rtl_433_Decoder_Stack` so that users can easily tweak the allocated stack size manually for their own particular situations.
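The `ifndef` wrapping described in this commit message follows the usual overridable-default pattern, sketched below. The numeric default is a placeholder (the real value is not quoted in this thread), and `decoderStackSize` is a hypothetical helper added only so the effective value can be checked.

```cpp
#include <cstddef>

// Placeholder default; the actual value used by rtl_433_ESP is not quoted
// in this thread. Defining the macro earlier (e.g. via a compiler -D flag)
// makes the preprocessor skip this block, so the user's value wins.
#ifndef rtl_433_Decoder_Stack
#define rtl_433_Decoder_Stack 10000
#endif

// Hypothetical helper exposing the effective setting for logging or checks.
constexpr std::size_t decoderStackSize() {
  return static_cast<std::size_t>(rtl_433_Decoder_Stack);
}

static_assert(decoderStackSize() > 0, "decoder stack size must be positive");
```

In a PlatformIO environment, an override would then typically be passed through `build_flags`, for example `-Drtl_433_Decoder_Stack=12000`, where 12000 is again only an illustrative number.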
I have been running with |
The above referenced two PRs (#2081 and NorthernMan54/rtl_433_ESP#156) should close out this bug... |
@NorthernMan54 are you planning on implementing this? |
If you accept PR #2082 that will bump rtl_433_ESP to v0.3.3 which includes my patch to |
OK - closing this now that both PRs have been accepted and the bug is solved! |
I was doing some analysis of 'uptimes' by looking through old backups of home-assistant_v2.db and extracting the times between reboots of the Lilygo_rtl_433_ESP device (running on a Lilygo LORA esp32).
I noticed that the device typically reboots every 10-60 hours looking back over the past 3 months.
Note:
Any suggestions on possible causes and how to investigate?
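The uptime analysis described in this report boils down to differencing consecutive reboot timestamps. A minimal sketch, assuming the reboot times have already been extracted from the database as ascending epoch seconds (the extraction step itself is not shown in the thread, and `uptimesInHours` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given reboot timestamps (seconds since epoch, ascending), returns the
// uptime between each pair of consecutive reboots, in hours.
std::vector<double> uptimesInHours(const std::vector<long long>& rebootEpochSec) {
  std::vector<double> hours;
  for (std::size_t i = 1; i < rebootEpochSec.size(); ++i) {
    assert(rebootEpochSec[i] >= rebootEpochSec[i - 1]);  // input must be sorted
    hours.push_back((rebootEpochSec[i] - rebootEpochSec[i - 1]) / 3600.0);
  }
  return hours;
}
```

Reboots at 0 s, 36000 s, and 252000 s would yield uptimes of 10 and 60 hours, matching the range reported above.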