-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High cpu usage for outbound ssl requests after upgrading from v16.15.0 to v18.1.0 #43128
Comments
That's almost certainly caused by the upgrade to openssl 3. I don't know what your https://server.local endpoint is doing but try checking what the TLS version and cipher are with v16 and v18. |
@bnoordhuis Thanks for reply. Sorry, I forgot to mention about server with ssl ssl serverSSL server it is just plain nginx with self signed certificate and one location
In first attempts to debug issue I was using node.js with ssl as a server, but then replaced it with nginx. The problem with node.js which performs outbound ssl requests. tls version and cipherI tried to compare ssl information in v16 and v18, but haven't found difference.
v18 node --trace-tls full output
perf logs comparisionOne interesting thing, which I don't know how to treat correctly is in perf logs.
For v18
If I read this correctly in v18 (openssl 3) application spends most of the time in locks (pthread_rwlock_unlock and pthread_rwlock_rdlock) |
Also one interesting observation for node.js v18.1.0 +
gives me ~1500 req/s and in
TLS info
In cpu profiler there is no I'm assume that |
I am facing this issue too My EC2 always use all the CPU resource every few days, Then I find out CPU usage 100% when the nodes that used web socket, Few days ago, I tried to resolve it by switched the Node.js version from v17.9.0 to v16.15.0. |
Feb 11 Apr 15 Apr 17,21,22,29 and May 5,7,10,14 After investigating I believe the problem occurs between v17.5 and v17.9 But all the abnormal nodes are not using any TLS and SSL To investigate, I plan to go back to v17.9.0 and do a remote CPU profiling when the problem occurs |
In response to #43128 (comment): the handshake indeed doesn't look materially different and the client and server settle on the same cipher/key exchange algorithm/etc. so that can't really explain it. The fact that I wonder if HTTP connection pooling / connection reuse (or lack thereof) is the culprit. How many TLS handshakes do you see with v16.x vs. v18.x (and |
@bnoordhuis Looks like I should not use Here is how I checked new tls sessions and connections, there might be a more correct way to count it. I'm listening tls socket events and looking to
Both node v16 and v18.1 creates new connection per request and performs tls handshake (secureConnect event on socket). TLS sessions are cached and only one session exists for remote server. So, if I sent 10_000 requests I got 10_000 "connect" events, 10_000 "secureConnect" events. |
Another interesting thing with So, v18.2.0 much better and, which is more important, stable result, but I haven't tried it in production yet. |
Tried to disable ssl sessions at all from nginx side with V16.15.0
v18.1.0
v18.2.0
|
I spent some time investigating this and I can indeed confirm that a) it happens, and b) changes in openssl are responsible. First off, this is v16.15.0 (recorded with perf(1) and
And this is v18.2.0:
It's almost all openssl populating hash tables while holding a lock. This is all coming from The most common call graph for
And for
I've been going over open performance issues in openssl's bug tracker but I don't see an exact match. openssl/openssl#17627 is the bucket issue. Another pair of eyes would be most welcome. @nodejs/quic How do we confirm this isn't caused by the QUIC patch we carry? |
|
@mcollina from my observations To put To ensure that there is no keep-alive connections I added to nginx access logs information about connections, number of requests within connection, ssl sessions, ciphers and connection header. It looks like this
Both reuse tls session, and despite fetch sends |
@jasnell is there any dependency left on the QUIC patches. If I remember correctly you removed it and #38233 is the PR to add it back. If there is no dependency then we should be able to build from the OpenSSL source instead of the QUIC branch. That might be a way to confirm. |
Any update on this issue? |
I can't be 100% sure but preponderance of evidence suggests the QUIC patches aren't responsible because they don't touch that part of openssl. In other words, it's likely an upstream issue. We don't have control over that so I'm going to close this. If someone wants to pursue this upstream, please do. |
Please excuse my ignorance, but is the theory, that upgrading OpenSSL from 1.1.1 to 3.x caused the issue? If so, could OpenSSL 1.1.1 still be bundled/used? |
@paulmenzel Yes, the upgrade to 3.x is the likely culprit. Building from source and linking dynamically against v1.1.1 should work; it's tested by node's CI matrix. |
This could be related to openssl/openssl#15199, which is fixed in openssl/openssl#17881, which would be part of OpenSSL 3.1. |
Version
v18.1.0
Platform
Linux 5.17.4-arch1-1 #1 SMP PREEMPT Wed, 20 Apr 2022 18:29:28 +0000 x86_64 GNU/Linux
Subsystem
crypto
What steps will reproduce the bug?
I have observed this problem in few http clients: axios, got. But looks like a problem is in node.js itself.
hey
SecureContext.init()
. It starts from ~50% and going up to 80%. Amount of requests processed by server, which sends outbound https requests is constantly decreasing.How often does it reproduce? Is there a required condition?
Always reproduces if present outbound ssl requests and node version 18.1.0
What is the expected behavior?
Constant CPU usage, amount of processed requests is at the same level during time. In node.js v16.15.0 same code in CPU profiler have
SecureContext.init()
with usage of 5-8%. And during load there is no performance degradation.What do you see instead?
CPU usage is keeping increasing during time,
SecureContext.init()
starts from ~50% cpu time in profiler and keep growing during time to 80-85%. Amount of requests processed by server is constantly decreasing.Additional information
v18.1.0
Benchmark result - 186 req/s
node --perf output - https://pastebin.com/wyHPXfMH
cpu profile screenshot
v16.15.0
Benchmark result - 1184 req/s
node --perf output - https://pastebin.com/0L0Z01iq
cpu profile screenshot
The text was updated successfully, but these errors were encountered: