performance: loxilb starts consuming 100% CPU only after a few seconds #499
Comments
Thanks for bringing this to notice. It seems strange, but we will have a look and update soon.
Hi @luisgerhorst, we have tried to reproduce this with the latest loxilb docker image, but in our test we couldn't find this issue. nginx and wrk seem to be taking only 10%. We used the validation-wrk script to test this. If you are using some other config/steps, please share them with us and we will try with them. You may also join our slack channel, where we will be able to assist you better.
I'm sorry for the incomplete description. I have been running wrk2 at a much higher rate than the version merged (12.5k RPS, roughly 80% of the maximum on my machine). I run https://github.com/luisgerhorst/loxilb/blob/ccf029a1f6cf8b914b23909d4cc922a4c32662d0/cicd/tcpsctpperf/validation-wrk using

The screenshot has the x axis separated into 6 segments. The dark purple line is loxilb, which is mostly idle in segments 3 and 4 and ramps up in segments 2, 5, and 6. The red/green lines in segments 3-6 are wrk2 and nginx (red/orange in 1-2). The light purple line is parca, which periodically processes the collected CPU samples.

Here's the CPU profile while loxilb is in its idle phase: https://pprof.me/2d2a1527503cfa10ae0a46890b2cb3a0

And here's the CPU profile while loxilb is in its busy phase: https://pprof.me/414cf26812ea9d7d3b973563bec491ed

Interestingly, loxilb seems to consume 100% CPU here even though the benchmark is not at its limit (I can achieve 15.5k RPS using this same setup). Therefore, maybe this behaviour is not actually limiting performance (at least not directly).
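For reference, a fixed-rate wrk2 run of this kind might look like the following; the address, port, and thread/connection counts here are placeholders, not the exact invocation used in the script above:

```shell
# Hypothetical wrk2 invocation at a constant 12.5k requests/second.
# -t threads, -c connections, -d duration, -R target rate.
wrk2 -t1 -c64 -d60s -R12500 --latency http://20.20.20.1:2020/
```

Unlike plain wrk, wrk2 holds the offered load constant at the `-R` rate, which is what makes a "roughly 80% of maximum" load point meaningful.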
Loxilb has a garbage collector which monitors its connection-tracking (CT) entries. If a connection goes through its normal life cycle - e.g. init, init-ack, est, fin - the eBPF module itself cleans up the CT entries. But for half-open connections, the garbage collector comes into play. Currently, it is set to an aggressive GC schedule. One potential solution is to trigger GC only when there is space pressure in the CT map.
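The space-pressure idea above can be sketched as follows. This is a minimal illustration, not loxilb's actual implementation; the type and function names are invented, and the watermark value is an assumption:

```go
package main

import "fmt"

// ctMap stands in for the conntrack table; the names here are
// illustrative, not loxilb's real identifiers.
type ctMap struct {
	entries map[uint64]bool
	cap     int
}

// gcIfPressured scans and evicts stale entries only when occupancy
// crosses the highWater fraction, instead of sweeping on every tick.
func (m *ctMap) gcIfPressured(highWater float64, stale func(uint64) bool) int {
	if float64(len(m.entries)) < highWater*float64(m.cap) {
		return 0 // no space pressure: skip the scan entirely
	}
	evicted := 0
	for k := range m.entries {
		if stale(k) {
			delete(m.entries, k)
			evicted++
		}
	}
	return evicted
}

func main() {
	m := &ctMap{entries: map[uint64]bool{}, cap: 100}
	for i := uint64(0); i < 50; i++ {
		m.entries[i] = true
	}
	// Below the 80% watermark: GC is a no-op, even if everything is stale.
	fmt.Println(m.gcIfPressured(0.8, func(uint64) bool { return true })) // 0
	for i := uint64(50); i < 90; i++ {
		m.entries[i] = true
	}
	// Above the watermark: stale entries are swept.
	fmt.Println(m.gcIfPressured(0.8, func(k uint64) bool { return k < 50 })) // 50
}
```

The trade-off is that half-open entries linger longer when the map is mostly empty, in exchange for not burning CPU on periodic full scans.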
PR: gh-499 relaxed garbage collector characteristics
Describe the bug
When I run one wrk2 worker against one nginx worker through loxilb, as configured by cicd/tcpsctpperf, at first only wrk2 and nginx consume about 70 and 80 percent CPU respectively, but after a few seconds loxilb starts consuming 100% CPU (in kernel mode) too. I have pinned wrk2, nginx, and loxilb to separate cores.
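The per-core pinning described above can be done with taskset; the core numbers and command lines below are assumptions about the setup, not the commands actually used by the cicd script:

```shell
# Pin each process to its own core (core ids are placeholders).
taskset -c 0 nginx -g 'daemon off;' &
taskset -c 1 loxilb &
taskset -c 2 wrk2 -t1 -c64 -d60s -R12500 http://20.20.20.1:2020/
```

Pinning this way makes per-process CPU readings in htop attributable to one core each, so a sudden jump on loxilb's core is easy to spot.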
Does anyone know why this happens?
I am not sure whether htop attributes the BPF program runtime to loxilb, or whether only the user process is included.
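One way to check is the kernel's own BPF accounting: when `kernel.bpf_stats_enabled` is set, the kernel records run time per BPF program, and that time is charged to whichever task was on-CPU when the program ran, so htop's per-process view does not isolate it. A sketch (requires root and bpftool):

```shell
# Enable per-program BPF run-time accounting (adds a small overhead).
sudo sysctl -w kernel.bpf_stats_enabled=1
# Each loaded program now reports run_time_ns and run_cnt.
sudo bpftool prog show
```

Comparing `run_time_ns` deltas during the idle and busy phases would show how much of the 100% is spent inside the eBPF datapath versus loxilb's userspace goroutines.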
Expected behavior
I simply find it odd that the CPU load seems to only pick up after a few seconds. Is this maybe some logging that should be disabled for performance tests?