-
-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESP8266 audioreactive UDP sync ported from MoonModules/WLED #3962
ESP8266 audioreactive UDP sync ported from MoonModules/WLED #3962
Conversation
…6-audioreactive-sync
As I might not have an ESP8266 to hand when doing the code review, if you could please include a screenshot of the Audio Reactive settings, including the sync dropdown being expanded please |
@blazoncek question as I'm not very experienced with 8266 - sync receive polls for new UDP multicast packets each 20ms. This works well on esp32. Maybe 8266 can have a problem with it? Do you think that adding 'yield()' before/after UDP calls might help? Or maybe it's the frequent alloc/dealloc of |
I think @willmmiles or @pbolduc will know much more about AsyncUDP/TCP than me.
|
Excuse the dumb question, but how do I enable stack trace printing? The only thing I see on serial while resets occur is this:
|
-D WLED_DEBUG (which you have) and monitor_filters = esp8266_exception_decoder in your PIO environment. But perhaps 8266 does not support WDT exceptions and does not print stack dump. |
One of the biggest issues is the Espressif networking stack has problems with high UDP traffic rates. It has been a while since I looked at the networking. I found this issue espressif/arduino-esp32#4104 which talks about errors with the parsePacket function. I need a reminder where the WifiUDP library comes from in WLED. |
WifiUDP is part of the Arduino framework on ESP8266 and ESP32. (The E131 receiver elsewhere in the project is using the ESPAsyncUDP library, which is not part of the framework.) My quick read of the WifiUDP framework code on ESP8266 suggests it's reasonably well written. By contrast, the ESP32 version is awful, heap allocating and freeing full 1460 byte packet buffer every time it polls to see if there's a new packet(!!). I pulled this PR in to my web server stability working branch and I haven't been able to reproduce the fault yet. It's neat to see the AR effects working on an 8266 again, though it's noticably laggy compared to the other ESP8266 running as a DDP segment beside it. @gaaat98 would you mind posting your config, in case there's something non-obvious I've got configured differently? |
@willmmiles I think the problem is is happening with analog (PWM) strips only - may also happening with bitbang ws2812 drivers. So I suspects it's a timing issue, or simply the 8266 is overloaded when doing high freq PWM (interrupt driven) + UDP polling + WiFi + other stuff. Besides UDP polling, audioreactive runs a linear filter (no loops) then simply copies received data to the variables used in effects. So audioreactive on 8266 does "almost nothing" in addition to UDP. |
@willmmiles Here's the configuration on my athom controller (esp8285), on this board the crashes happen after a few updates when the clock speed is set to values higher than 'Slowest' (the effect I tested was DJ Light but I assume also the others will do the same). This is the config of a ESP8266 board I used for testing, here the crashes are a bit less frequent but still happen. I noticed that if I open the /liveview page on a browser the number of reboots noticeably increases, which suggests that the boards crash due to overload. These are both configured as PWM strips, I also have another athom controller connected to SK6812 strip with UDP sync enabled that shows no such issues |
@gaaat98 please get a plain D1 mini board, configure it as an Athom controller and monitor serial output. |
@blazoncek That's basically what I did to obtain this log, the only difference is that I used a NodeMCU ESP8266 instead of D1 Mini, is that relevant?
Before this Serial only showed other The configuration used is the one I uploaded earlier:
I repeated that once again and collected other logs here: crash-log.txt, clock speed is set to 'Fastest', WLED crashes after a while if I set to effect DJ Light from the Android app (the NodeMCU is receiving sound data from an esp32 which is listening to music), or crashes immediately if I enable the Peek option from the app. EDIT: I was flashing the 2MB flash version because that's the one used on the athom controllers, so I also tried the standard NodeMCU build and the issue is still present |
I tried both bitbang ws2812 (actually my default ESP8266 test, due to a poor pin selection in a project I built before I discovered WLED!) and RGBW PWM in "normal" speed. No crashes for me after 12h of operation.
I suspect the polling latency might be the source of the visible lag. On 8266 calling the poll looks to be "basically free" in terms of CPU usage if there's nothing pending; we could probably poll every loop with no impact there (not so much on ESP32, though!!). Stability is more important than performance, though, so I'll investigate this later. |
Aha, there it is! Just got one with these settings. |
@willmmiles during your test with the PWM strip did you select a sound reactive effect on the receiving controller? Did you have another device sending the audio data? To have the sound reactive effects you need to make sure you pulled PR#3942 otherwise they are disabled. Sorry to ask but it's stange since I have almost immediate crashes |
Yes and yes. For the initial testing I had the same effect (GravCentric) running on three WS2812 strips - one on the ESP32 AR node directly, one on an ESP8266 over DDP, and one on an ESP8266 using AR sync. That's how I could tell the AR sync was laggy compared to DDP. ;) For the overnight test I reconfigured that last unit to add PWM output on some unconnected pins. It ran OK for me using the GravCentric effect at 'Normal' speed. Dunno if it was the speed, the DJ light effect, or peek, but it faulted right away when I set it up with that config ... though it hasn't faulted since. |
@gaaat98 try 160MHz builds. |
Same behaviour, I'm trying to establish reliable steps to reproduce the crashes but at the moment it seems that for me is enough to have the ESP8266 receiving data, the Dj Lights effect active, clock set to Fastest, and then just wait for maybe less than a minute while making some noises near the sending device... Logs for the 160MHz build here |
I managed to get the crash to reliably reproduce, but I haven't yet pinpointed the root cause. Here's what I've learned so far - might be old hat for some of you, sorry I'm still new to this environment:
I'm on vacation and away from home for the next couple of weeks, so I won't be able to do any more digging until I get back, but I wanted to leave some tracks in case anyone else was investigating. Sorry I don't have more answers! |
please condisder the following implementation of the
|
🤔 so the only "smoking guns" we have so far:
@willmmiles @gaaat98 can you try if adding a "yield()" before or after the wifiudp calls helps? I.e. somewhere here: receiving happens in "arduino loop" context so |
Confirmed this works too. I see there's a couple of new fields in the old padding - any plans to upstream those features? |
Latest PWM update: I think I've finally gotten to the bottom of it. It looks like an actual, honest-to-goodness CPU bug. The problem seems to be with the 'rfi 3' instruction used to return from the NMI used by the PWM timer. Sometimes this seems to hang the CPU instead of correctly returning. I don't know what causes it -- my best guess is contention with some other interrupt. After the fault, the CPU stalls until the next NMI (timer event) occurs. When it does so, EPC3, the "return address" from the NMI, is set pointing to the NMI's own rfi instruction. When it tries to return from the "nested" interrupt, it goes in to a loop just resetting the instruction pointer back to the same 'return from interrupt' instruction, forever and ever, until the hard WDT trips. The timer NMIs actually still fire -- so PWM still works in this stuck state! -- but since the return address is at the end of the NMI handler, it just goes back to its loop when the ISR completes. The good news is that the timer NMIs can maybe be used to recover the processor! I found a workaround by having the NMI handler back up the last "known good" return address, and restore it if the ISR finds the return address points to itself. My initial tests of this workaround have been very encouraging; it hasn't crashed on me yet, when it would typically fail in seconds. What's left to do is find the cleanest way to integrate this fix in to WLED. My development approach involves duplicating and rewriting the processor's base interrupt vector table in assembly. This is great for tracing interrupts, but pretty invasive and finicky. The easiest place to do it would be the in platform PWM code - it'd be just a few lines of C, actually - but I don't relish forking the platform, and I'm not sure I could convince the upstream guys of the CPU fault: obviously the current code works well for them! Alternatively, the platform core uses some linker tricks to allow overriding the PWM code; so we could pull the ESP8266 PWM code in to our repo with only a minimal space penalty, though we'd have to pay some extra attention to monitoring upstream changes when we update the platform. I'm somewhat leaning towards the latter solution. Anybody else got an opinion? I'd also note that we the platform offers a phase-locked PWM core we should probably switch to - the way they've implemented the base PWM system produces pathologically bad performance if you change several outputs back to back, like we do for RGB(WW) outputs. Each channel tries to keep its own phase, so you can often get separate nearly-back-to-back interrupts for every channel. :( The phase-locked implementation should never generate more than analogFreq interrupts per second, while the base solution could theoretically generate as many as (analogFreq * active PWM channels)! Thanks for bearing with me. It's been quite an adventure getting this deep in to the ROM and platform code. |
@willmmiles ESP8266 is nearly at the end of its lifespan. Just put a band aid on it and be done with it. Nevertheless, it would be nice to inform upstream of the findings. WLED used Tasmota framework in the past and @Jason2866 was very kind to us on many occasions so we could change to that if they incorporated such bugfix. |
We do not use core 3.x for esp8266. Tried the first releases. Was not stable for us. So we stayed on a modified v2.7.4 version which works well for us. There is no 3.x version from us and I won't do a 3.x based one. Tasmota based on esp8266 will end with our modified v2.7.x version |
Thank you @Jason2866 . |
@blazoncek Why should you want to revert? If the current version works well for you. |
From software perspective, I would like to use the phase-locked PWM generator - looks like it can be activated by adding Just one but - from electrical perspective. Phase-locked means that all PWM channels will be aligned, so more often we'll see all channels switch from 0 to 1 at the same time. This could become a problem for users with high current PWM setups using lots of color channels. If all PWM channels switch at the same time, its a sudden rise of power demand to the PSU. This may lead to stronger "coil whining" tones emitted from cheap PSU models. So we should habe some beta tester (with long PWM strips having many color channels) feedback, before going that route on 8266. |
@willmmiles just an idea about the "nearly-back-to-back interrupts for every channel" problem. What if we slightly delay the starting of each PWM channel? I'm thinking of something like If I'm right, this would bring the PWM channels a bit more "out of phase", making the "nearly-back-to-back interrupts" scenario less likely to happen? Edit: another option is called spread spectrum. For this, PWM frequency (or duty cycles) have to be slightly different for each PWM channel, which also reduces the likelihood of channels switching "in sync". |
Regarding phase shifted PWM: I've had a discussion about bringing that to WLED with @jw2013. IMO that would be great addition. |
We introduced lowering power supply load by "PWM auto-phasing" with this PR arendst/Tasmota#14590 for Tasmota (for esp32x). |
@blazoncek agreed. Either a new issue ticket, or maybe a new (draft) PR to also share related code. For "8266 sound sync" we still need to know if special handling in the UDP code is necessary to prevent PWM crashes. For the "singled logical LED" PWM scenario (see comments from @tablatronix), the 8266 could also be controlled by DDP and a separate segment assigned to the one-led remote 8266. So audio processing would entirely be done on the esp32 having the microphone, while 8266 can handle PWM outputs without troubles. @willmmiles i'll look into your other recommendations, too. |
nice 😄 could be something for our esp32 PWM driver, too. @Jason2866 If i understand correctly, checking if several PWM outputs are actually "phase-locked" just requires a 12€ logic analyzer and sigrok pulseview ? |
Yes, since you can test with a low PWM frequency (in our case 977Hz) |
I dropped my target fps to 30 and my stability has improved exponentially, might be anecdotal.. maybe it helps someone shrug |
I'll open a new PR tonight with a minimal PWM crash fix, vendoring in the soft PWM core currently in use. We can follow it up with a second PR regarding analog phase improvements. I believe the 8266 phase-locked core could be trivially modified to either (a) introduce a fixed phase offset for each line, or (b) introduce a "maximum transitions per tick" parameter. Either way, separate PR.
My $0.02 would be yes to phase shifting or spread spectrum, no to
I'm fairly confident that the PWM problems could be triggered without the sound sync code at all, I believe that any kind of network traffic risks it. The fast regular traffic from the sync code is exacerbating a pre-existing bug. I'll post a beta PWM fix tonight and hopefully we can head off these problems. |
* fixed a few typo's in comments * fixed 8266 specific warning about 'comparison of integer expressions of different signedness' based on recommendations made by @willmmiles: * make sure that audioSyncPacket is the same size (44bytes) on all platforms * use static buffer for receiving (avoids heap fragmentation) * copy receive buffer to local audioSyncPacket struct - avoids alignment problems * esp32 only: to stay in sync with UDP, Udp.flush() is needed when Udp.parsePacket() is _not_ followed by Udp.read()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gaaat98 thanks, good job 👍
Approved from my side.
//DEBUGSR_PRINTLN("Received UDP Sync Packet"); | ||
uint8_t fftBuff[packetSize]; | ||
static uint8_t fftBuff[UDPSOUND_MAX_PACKET+1] = { 0 }; // static buffer for receiving, to reuse the same memory and avoid heap fragmentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB: this line would also work without static
- putting the buffer into stack, instead of using global space.
The important point is just to make it a fixed-size array, because variable size arrays are allocated from the heap, each time the function runs.
I was not sure if it's better to use heap (no static
), or global space (with static
). @blazoncek I'll leave it up to you to decide what's better for 8266.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use stack if possible. It costs nothing and the array size is not problematic.
Edit: I'm not sure if array is allocated in heap if its size is dynamic. Only new
or malloc()
will allocate from heap AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if array is allocated in heap if its size is dynamic. Only
new
ormalloc()
will allocate from heap AFAIK.
That's the important difference. array[packetsize] was allocated from the heap - because compiler does not know the size when compiling so it couldn't arrange stack offsets of other local vars.
see https://en.cppreference.com/w/c/language/array section "Variable-length arrays"
I have not tested the code, but skimming it reveals no major issues. It has my "go". @softhack007 merge it when you're comfortable with it. |
Great to see we are almost there with this. Good work guys |
... protected against array overflow due to previous "if (packetSize <= UDPSOUND_MAX_PACKET)"
I'd like to add addendum to this PR. My recent PR #4066 exploits ESP-NOW for WiFi-less syncing between devices. If audio sync would use ESP-NOW (in addition to UDP) that would make large-scale audio deployments a possibility. |
I have a PoC of audio sync over ESP-NOW, which I was working on for the case where users had issue with multicast on their network @blazoncek https://github.com/MoonModules/WLED/pull/148/files It's 9 months old so would need to be updated before it could be merged, but shows that basic idea and some of the refactoring required to get access from the callback |
@netmindz yes, something like that but hooked into framework provided in upstream implementation. |
Tries to resolve #3960
However, it seems to be unstable on RGB(W) PWM strips at higher clock speeds, causing frequent reboots