Set default wait time to 0 (no wait) to enqueue tcp events #27

Draft · mathieucarbou wants to merge 2 commits into main from no-wait

Conversation

mathieucarbou (Member) commented Feb 8, 2025

This PR is based on #24.

The added commit in this PR:

  • sets the default wait time to 0 so that enqueuing events never blocks
  • increases the queue size to 128 to allow more events to be buffered when there are many concurrent connections
  • calls abort() on the connection if an event cannot be queued (a minimal sketch follows this list)
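
For illustration, a non-blocking enqueue along these lines could look like the sketch below. The names mirror AsyncTCP conventions (_async_queue, lwip_tcp_event_packet_t), but this is only a sketch under those assumptions, not the PR's exact code:

// Sketch only: enqueue a TCP event without blocking the LwIP task.
// lwip_tcp_event_packet_t is kept opaque here; AsyncTCP defines it internally.
#include "freertos/FreeRTOS.h"
#include "freertos/queue.h"

struct lwip_tcp_event_packet_t;      // opaque for this sketch
static QueueHandle_t _async_queue;   // created elsewhere with CONFIG_ASYNC_TCP_QUEUE_SIZE slots

static bool _send_async_event(lwip_tcp_event_packet_t **e) {
  // Wait time of 0 ticks: if the queue is full, fail immediately instead of
  // stalling the LwIP task until a slot frees up.
  return xQueueSend(_async_queue, e, 0) == pdPASS;
}

// Caller side (sketch): when false is returned, free the event and abort the
// connection rather than silently losing the event.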

Currently in main, the task watchdog is triggered by a slow onBody callback, in the same way it was problematic with the chunk callback.

This can be reproduced with the PerfTests example in main by:

  1. launching autocannon -c 32 -w 32 -a 96 -t 30 --renderStatusCodes -m POST -H "Content-Type: application/json" -b '{"foo": "bar"}' http://192.168.4.1/delay
  2. in another terminal at the same time: curl -v http://192.168.4.1

In main, this crashes quickly with:

[ 34044][D][AsyncTCP.cpp:386] _tcp_poll(): throttling
[ 34049][D][AsyncTCP.cpp:386] _tcp_poll(): throttling
[ 34054][D][AsyncTCP.cpp:386] _tcp_poll(): throttling
[ 34059][D][AsyncTCP.cpp:386] _tcp_poll(): throttling
[ 34063][D][AsyncTCP.cpp:386] _tcp_poll(): throttling
[ 34068][D][AsyncTCP.cpp:386] _tcp_poll(): throttling
[ 34073][D][AsyncTCP.cpp:386] _tcp_poll(): throttling
Uptime:  36 s, requests:   1, Free heap: 206412
E (39585) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (39585) task_wdt:  - async_tcp (CPU 1)
E (39585) task_wdt: Tasks currently running:
E (39585) task_wdt: CPU 0: IDLE0
E (39585) task_wdt: CPU 1: loopTask
E (39585) task_wdt: Aborting.
E (39585) task_wdt: Print CPU 1 backtrace




Backtrace: 0x40084127:0x3ffb21c0 0x40084167:0x3ffb21e0 0x400dd0d7:0x3ffb2200 0x400d2c51:0x3ffb2220 0x400de9f8:0x3ffb2270 0x4008cd8e:0x3ffb2290
  #0  0x40084127 in esp_timer_impl_get_counter_reg at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_timer/src/esp_timer_impl_lac.c:112
  #1  0x40084167 in esp_timer_impl_get_time at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_timer/src/esp_timer_impl_lac.c:142
  #2  0x400dd0d7 in millis at /Users/mat/.platformio/packages/framework-arduinoespressif32-src-77c8e93767360b28deee4aedf5d0a1ab/cores/esp32/esp32-hal-misc.c:202
  #3  0x400d2c51 in loop() at examples/PerfTests/PerfTests.ino:231
  #4  0x400de9f8 in loopTask(void*) at /Users/mat/.platformio/packages/framework-arduinoespressif32-src-77c8e93767360b28deee4aedf5d0a1ab/cores/esp32/main.cpp:74
  #5  0x4008cd8e in vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:139

With this PR, it does not crash anymore. Both autocannon and the curl request can complete (some autocannon requests are dropped because of connection timeouts).

[screenshot: autocannon results]

And the curl request finishes.

There is also no memory leak and the watchdog does not trigger.

@mathieucarbou mathieucarbou self-assigned this Feb 8, 2025
@mathieucarbou mathieucarbou requested a review from a team February 8, 2025 09:25
@mathieucarbou mathieucarbou force-pushed the no-wait branch 2 times, most recently from 8e21c1d to a049251 Compare February 8, 2025 09:46
@mathieucarbou mathieucarbou marked this pull request as ready for review February 8, 2025 09:54
mathieucarbou (Member, Author) commented:

@me-no-dev @vortigont fyi ;-)

vortigont commented:

And more about autocannon - it is a very strange tool indeed.
An example output:

➜ autocannon -c 64 -w 32 -a 512 -t 20  http://192.168.4.1

Req/Bytes counts sampled once per second.
# of samples: 448

512 requests in 6.01s, 1.09 MB read
29 errors (7 timeouts)

~ took 27s 

It looks like it completely ignores the integrity of the replies. Here it reports 512 requests, but according to the asyncwebserver logs it was able to reply to only about 40-60% of those. The requests that dropped with _accept failed: pcb is NULL are also counted as OK.
It also reports 6 s while the shell tells me it took 27 s to execute the command.
We might look for some alternative tool to base our test results on.

mathieucarbou (Member, Author) commented:

@vortigont : yes, this one is not meant to test request results; it is meant for perf testing (req/sec) and mostly assumes the communication stack works OK.

You can add --renderStatusCodes to see the number of requests answered with 200 OK.

- Add logs for pcb was closed before reaching LwIP task
vortigont commented Feb 9, 2025

just for the history - after many tests I've found that if LwIP is block-waiting on a full queue to enqueue a new event, during that time the AsyncTCP task also blocks on very basic things like serving SENT/RECV events, even when the callback code is very simple and does not delay execution.
I can't explain this for now. Would need more investigation.

I.e., yes, it leaves us no choice but to use wait=0 and deal with event loss/queue management somehow to minimize the issues.
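
To make that concrete, here is a hedged sketch of a recv callback handing events off with wait=0; make_recv_event and free_event are hypothetical helpers, not real AsyncTCP functions:

#include "freertos/FreeRTOS.h"
#include "freertos/queue.h"
#include "lwip/tcp.h"

struct lwip_tcp_event_packet_t;     // opaque for this sketch
extern QueueHandle_t _async_queue;  // the AsyncTCP event queue

// Hypothetical helpers, only for this sketch:
lwip_tcp_event_packet_t *make_recv_event(struct tcp_pcb *pcb, struct pbuf *pb, err_t err);
void free_event(lwip_tcp_event_packet_t *e);

// The callback runs in the LwIP context, so any blocking enqueue here stalls
// every other connection, not just this one.
static err_t _tcp_recv_cb(void *arg, struct tcp_pcb *pcb, struct pbuf *pb, err_t err) {
  lwip_tcp_event_packet_t *e = make_recv_event(pcb, pb, err);

  // Blocking variant (problematic): xQueueSend(_async_queue, &e, portMAX_DELAY)
  // keeps the LwIP task waiting while the queue is full.

  // wait=0 variant: give up immediately and abort the connection instead.
  if (xQueueSend(_async_queue, &e, 0) != pdPASS) {
    free_event(e);
    tcp_abort(pcb);   // reset the connection
    return ERR_ABRT;  // required after calling tcp_abort() inside a callback
  }
  return ERR_OK;
}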

mathieucarbou (Member, Author) commented:

Added a check to prevent accepting new connections if the queue is nearly full, to test the behaviour.

AsyncTCP/src/AsyncTCP.cpp

Lines 1646 to 1649 in 13c3fcc

if (_connect_cb) {
  if (uxQueueMessagesWaiting(_async_queue) < CONFIG_ASYNC_TCP_QUEUE_SIZE * 90 / 100) {
    AsyncClient *c = new (std::nothrow) AsyncClient(pcb);
    if (c && c->pcb()) {
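
For context, a complete (hypothetical) version of this guard on the accept side could look roughly like the sketch below; the real code in the PR differs, this only illustrates the idea of refusing connections while the queue is about 90% full:

#include "freertos/FreeRTOS.h"
#include "freertos/queue.h"
#include "lwip/tcp.h"

#ifndef CONFIG_ASYNC_TCP_QUEUE_SIZE
#define CONFIG_ASYNC_TCP_QUEUE_SIZE 64  // normally passed as a build flag
#endif

extern QueueHandle_t _async_queue;  // the AsyncTCP event queue

// Hypothetical accept callback: abort incoming connections early when the
// queue is nearly full so their events never have to be dropped later.
static err_t _tcp_accept_guarded(void *arg, struct tcp_pcb *pcb, err_t err) {
  if (uxQueueMessagesWaiting(_async_queue) >= CONFIG_ASYNC_TCP_QUEUE_SIZE * 90 / 100) {
    tcp_abort(pcb);   // reset the incoming connection immediately
    return ERR_ABRT;  // required after calling tcp_abort() inside a callback
  }
  // ...otherwise create the AsyncClient and dispatch the connect event...
  return ERR_OK;
}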

Baseline: 16 concurrent requests for 10 seconds

[screenshot]

32 concurrent requests for 10 seconds

[screenshot]

slow chunk request + autocannon (poll event coalescing and throttling)

  • Terminal 1: curl -N -v -G -d 'd=3000' -d 'l=5000' http://192.168.4.1/slow.html --output - => triggers one chunked GET request that pauses for 3 seconds in user code each time the chunk callback is called
  • Terminal 2: autocannon -c 32 -d 10 -t 30 --renderStatusCodes http://192.168.4.1/
[screenshot]

slow onBody handler + autocannon (test pause in request parsing)

  • Terminal 1: autocannon -c 32 -d 20 -t 30 --renderStatusCodes -m POST -H "Content-Type: application/json" -b '{"foo": "bar"}' http://192.168.4.1/delay => tries to open 32 concurrent connections doing POST requests for 20 seconds, each request pausing for 3 seconds in the onBody handler during request parsing
  • Terminal 2: autocannon -c 32 -d 20 -t 30 --renderStatusCodes http://192.168.4.1/
[screenshot]

slow chunk callback + slow onBody parsing

  • Terminal 1: curl -N -v -G -d 'd=3000' -d 'l=5000' http://192.168.4.1/slow.html --output -
  • Terminal 2: autocannon -c 32 -d 20 -t 30 --renderStatusCodes -m POST -H "Content-Type: application/json" -b '{"foo": "bar"}' http://192.168.4.1/delay
[screenshot]

The slow request is aborted.

I am running with -D CONFIG_ASYNC_TCP_QUEUE_SIZE=128 to account for wait=0.
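
For reference, assuming a PlatformIO project (as the backtrace paths above suggest), the flag can be set in platformio.ini, for example:

; platformio.ini (sketch): enlarge the AsyncTCP event queue to go with wait=0
build_flags =
  -D CONFIG_ASYNC_TCP_QUEUE_SIZE=128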
