performance: loxilb starts consuming 100% CPU only after a few seconds #499
Comments
Thanks for bringing this to notice. It seems strange, but we will have a look and update soon.
Hi @luisgerhorst, we have tried to reproduce this with the latest loxilb docker image, but in our test we couldn't find this issue. nginx and wrk seem to be taking only 10%. We used the validation-wrk script to test this. If you are using some other config/steps, please share them with us and we will try with them. You may also join our slack channel, where we will be able to assist you better.
I'm sorry for the incomplete description. I have been running wrk2 at a much higher rate than the version merged (12.5k RPS, roughly 80% of the maximum on my machine). I run https://github.com/luisgerhorst/loxilb/blob/ccf029a1f6cf8b914b23909d4cc922a4c32662d0/cicd/tcpsctpperf/validation-wrk using

The screenshot has the x axis separated into 6 segments. The dark purple line is loxilb, which is mostly idle in segments 3 and 4 and ramps up in segments 2, 5, and 6. The red/green lines in segments 3-6 are wrk2 and nginx (red/orange in 1-2). The light purple line is parca, which periodically processes the collected CPU samples.

Here's the CPU profile while loxilb is in its idle phase: https://pprof.me/2d2a1527503cfa10ae0a46890b2cb3a0

And here's the CPU profile while loxilb is in its busy phase: https://pprof.me/414cf26812ea9d7d3b973563bec491ed

Interestingly, loxilb seems to consume 100% CPU here even though the benchmark is not at its limit (I can achieve 15.5k RPS using this same setup). Therefore, maybe this behaviour is not actually limiting performance (at least not directly).
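For reference, a fixed-rate wrk2 run of this kind might look like the following; the address, port, and thread/connection counts here are placeholders, not the exact invocation used in the script above:

```shell
# Hypothetical wrk2 invocation at a constant 12.5k requests/second.
# -t threads, -c connections, -d duration, -R target rate.
wrk2 -t1 -c64 -d60s -R12500 --latency http://20.20.20.1:2020/
```

Unlike plain wrk, wrk2 holds the offered load constant at the `-R` rate, which is what makes a "roughly 80% of maximum" load point meaningful.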
Loxilb has a garbage collector which monitors its connection-tracking (CT) entries. If a connection goes through its normal life cycle - e.g. init, init-ack, est, fin - the eBPF module itself cleans up the CT entries. But for half-open connections, the garbage collector comes into play. Currently, it is set to an aggressive GC schedule. One potential solution is to trigger GC only when there is space pressure in the CT map.
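The space-pressure idea above can be sketched as follows. This is a minimal illustration, not loxilb's actual implementation; the type and function names are invented, and the watermark value is an assumption:

```go
package main

import "fmt"

// ctMap stands in for the conntrack table; the names here are
// illustrative, not loxilb's real identifiers.
type ctMap struct {
	entries map[uint64]bool
	cap     int
}

// gcIfPressured scans and evicts stale entries only when occupancy
// crosses the highWater fraction, instead of sweeping on every tick.
func (m *ctMap) gcIfPressured(highWater float64, stale func(uint64) bool) int {
	if float64(len(m.entries)) < highWater*float64(m.cap) {
		return 0 // no space pressure: skip the scan entirely
	}
	evicted := 0
	for k := range m.entries {
		if stale(k) {
			delete(m.entries, k)
			evicted++
		}
	}
	return evicted
}

func main() {
	m := &ctMap{entries: map[uint64]bool{}, cap: 100}
	for i := uint64(0); i < 50; i++ {
		m.entries[i] = true
	}
	// Below the 80% watermark: GC is a no-op, even if everything is stale.
	fmt.Println(m.gcIfPressured(0.8, func(uint64) bool { return true })) // 0
	for i := uint64(50); i < 90; i++ {
		m.entries[i] = true
	}
	// Above the watermark: stale entries are swept.
	fmt.Println(m.gcIfPressured(0.8, func(k uint64) bool { return k < 50 })) // 50
}
```

The trade-off is that half-open entries linger longer when the map is mostly empty, in exchange for not burning CPU on periodic full scans.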
PR: gh-499 relaxed garbage collector characteristics
Describe the bug
When I run one wrk2 worker against one nginx worker through loxilb, as configured by cicd/tcpsctpperf, at first only wrk2 and nginx consume about 70 and 80 percent CPU respectively, but after a few seconds loxilb starts consuming 100% CPU (in kernel mode) too. I have pinned wrk2, nginx, and loxilb to separate cores.
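The per-core pinning described above can be done with taskset; the core numbers and command lines below are assumptions about the setup, not the commands actually used by the cicd script:

```shell
# Pin each process to its own core (core ids are placeholders).
taskset -c 0 nginx -g 'daemon off;' &
taskset -c 1 loxilb &
taskset -c 2 wrk2 -t1 -c64 -d60s -R12500 http://20.20.20.1:2020/
```

Pinning this way makes per-process CPU readings in htop attributable to one core each, so a sudden jump on loxilb's core is easy to spot.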
Does anyone know why this happens?
I am not sure whether htop attributes the BPF program runtime to loxilb, or whether only the user process is included.
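One way to check is the kernel's own BPF accounting: when `kernel.bpf_stats_enabled` is set, the kernel records run time per BPF program, and that time is charged to whichever task was on-CPU when the program ran, so htop's per-process view does not isolate it. A sketch (requires root and bpftool):

```shell
# Enable per-program BPF run-time accounting (adds a small overhead).
sudo sysctl -w kernel.bpf_stats_enabled=1
# Each loaded program now reports run_time_ns and run_cnt.
sudo bpftool prog show
```

Comparing `run_time_ns` deltas during the idle and busy phases would show how much of the 100% is spent inside the eBPF datapath versus loxilb's userspace goroutines.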
Expected behavior
I simply find it odd that the CPU load seems to only pick up after a few seconds. Is this maybe some logging that should be disabled for performance tests?