Very slow thread shrinking after rapid growth #4585
The code is doing the slow shrinking by design: the idea is that if you had a load spike that caused the creation of lots of threads, you don't want to shrink too aggressively because you can have another load spike. Please make sure you are on the latest version of Jetty, since we have fixed a couple of issues related to the thread pool and shrinking recently.
For that configuration we have the idleTimeout parameter. If we know that the spike will occur within N seconds since the last spike, then we'll probably just increase the idleTimeout value.
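(For reference, a minimal sketch of how that idleTimeout can be set on an embedded Jetty 9.4 server; the specific values here are illustrative only:)

```java
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class ThreadPoolIdleTimeoutExample
{
    public static void main(String[] args) throws Exception
    {
        // Constructor arguments: maxThreads, minThreads, idleTimeout (milliseconds).
        // A larger idleTimeout keeps spare threads around between load spikes;
        // a smaller one lets the pool shrink back down sooner.
        QueuedThreadPool threadPool = new QueuedThreadPool(400, 10, 30_000);
        threadPool.setName("server");

        Server server = new Server(threadPool);
        // ... add connectors and handlers as usual ...
        server.start();
        server.join();
    }
}
```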
@gregw thoughts?
Perhaps we can introduce
Firstly, what version is your "latest"? We have had some recent releases that did create spikes in the number of threads due to a race in the new-thread logic that created more than were needed.

I am firmly of the belief that we do not wish to let all threads die the moment they are idle. Slow shrinking has been a worthwhile feature for a long time and helps to smooth the churn in thread count with uneven loads. If you have …

Note that our usage of the reserved thread pool does give us good MRU behaviour, as we prefer to use reserved threads and they are the MRU. So we tend to favour MRU threads; if we didn't, then a small load would prevent shrinking, which we do not see.

The cost of an idle thread is small, probably less than the cost of complex shrinking logic. I'd much rather spend effort avoiding creating unnecessary threads in the first place. So let's make sure you are running a version that doesn't have the create-thread race.
Thank you for your response.
Currently we're using the following version: 9.4.26.v20200117.
That is indeed the latest. 9.4.27 is just about to be built, but it only has a small tweak to the reserved thread pool, so you don't have any known reason for creating too many threads for a spike. Do you believe the jump from 100 to 350 threads was reasonable? Could your load have had several hundred simultaneous requests in a burst? Are you satisfied that setting a low idleTimeout on the thread pool can give you aggressive shrinking if you want it (e.g. you could set 1s and shrink from 350 in about 5 minutes)?
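(For scale, given that the pool shrinks at most one idle thread per idleTimeout as described later in the issue: dropping from 350 back to 100 threads means 250 thread exits, so an idleTimeout of 1s shrinks the pool in roughly 250 seconds, about 4 to 5 minutes, while Jetty's default 60s idleTimeout would take on the order of 4 hours.)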
We cannot currently say that the problem is Jetty itself. We noticed that our pretty rare JDK had SSL cache issues described here: The overall problem is that suddenly one thread acquires a lock on an SSL cache and forces all other threads to wait for about 2.5 minutes while the cache gets cleaned. Here's the most interesting stack trace of the problem thread from the JVM:
Currently we're observing the system for any other extra spikes and cannot yet say that the problem has completely gone.
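(As an aside, not from the original exchange: besides updating the JDK, one commonly suggested mitigation for SSL session cache cleanup stalls is to bound the session cache instead of leaving it unlimited. A minimal sketch using the standard JSSE APIs, with purely illustrative values:)

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

public class SslSessionCacheLimitExample
{
    public static void main(String[] args) throws Exception
    {
        // JVM-wide default; can also be passed as -Djavax.net.ssl.sessionCacheSize=10000.
        System.setProperty("javax.net.ssl.sessionCacheSize", "10000");

        // Per-context limit on the server-side session cache.
        SSLContext sslContext = SSLContext.getDefault();
        SSLSessionContext serverSessions = sslContext.getServerSessionContext();
        serverSessions.setSessionCacheSize(10_000); // cap the number of cached sessions
        serverSessions.setSessionTimeout(300);      // seconds before cached sessions expire
    }
}
```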
The fix for this issue has been backported to OpenJDK 8, so you want to use a recent version of OpenJDK 8 to avoid the JVM lockup. A JVM lockup like what you describe would definitely cause a spike in the number of threads when the lockup resolves. What do you mean by "pretty rare JDK"?
I mean we've been using Oracle JDK 1.8 below version 1.8.211.
@sadko4u any news about the behavior of your system?
After the Java update we haven't noticed rapid thread growth on our system.
@gregw I think we should discuss this further. All the GCs in modern versions of Java can now give uncommitted memory back to the operating system, so in a cloud environment you can pay less if the server is idle. Note also that a reserved thread that expires goes back to the thread pool, where it is subject (again) to the idle timeout: in the best case a reserved thread expires in twice the idle timeout (thread pool idle timeout + reserved thread executor idle timeout).
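(To put illustrative numbers on that: assuming, purely for example, a 60s thread pool idleTimeout and a 60s reserved thread executor idle timeout, a reserved thread that stops being used can sit for up to about a minute before it is released back to the pool, and then up to another minute before the pool lets it exit, so roughly two minutes end to end.)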
@sbordet The thread pool already has its own idleTimeout, which can be configured to an aggressively small number if desired. There is very little difference between shrinking 1 thread every 600ms, 10 threads every 6000ms, or 100 threads every 60000ms. The only tweaks that I can think of are:
But I'm not really feeling that these are high-priority changes. We would probably be better off doing a full audit of our memory footprint to reduce memory per thread rather than reduce the number of threads.
Closing then. |
Thanks for sharing this. @gregw
Probably we want to consider the ThreadLocal usage of applications. That's where this LRU policy causes instability, as applications just want to release resources after the spike.
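(A hypothetical illustration of that concern, not from the thread: a per-thread buffer held in a ThreadLocal stays referenced for as long as the idle worker thread stays alive, so memory allocated during a spike is only reclaimed once the pool actually shrinks.)

```java
public class ThreadLocalBufferExample
{
    // Hypothetical application-level cache: each worker thread lazily allocates
    // a 1 MiB scratch buffer the first time it handles a request.
    private static final ThreadLocal<byte[]> SCRATCH =
            ThreadLocal.withInitial(() -> new byte[1024 * 1024]);

    public static void handleRequest()
    {
        byte[] buffer = SCRATCH.get();
        // ... use the buffer while processing the request ...
        // The buffer is NOT freed when the request ends; it is only freed when the
        // owning thread dies, which with slow shrinking can be hours after the spike.
    }
}
```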
Hello!
We're currently exploring the problem of spontaneous and rapid growth of the thread pool after switching from an old version of Jetty to the latest one. Currently there is no reason to say that the cause is Jetty internals, but we've encountered very strange behaviour in thread shrinking.
Consider looking at this monitoring graph:
It shows the total number of launched threads in an Oracle JVM 1.8.x. You can see a rapid growth here, a period when the Prometheus system does not respond to statistics requests because all Jetty threads are busy, and then a very slow thread-shrinking curve.
I believe this code is incorrect and allows only one thread to be shrunk within each idleTimeout period:
https://github.com/eclipse/jetty.project/blob/9d589f1b6e2b8cc09d5bf53ab00a68a74d7b7b3a/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L919-L929
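(Roughly, and paraphrased rather than quoted from the Jetty source: the check is gated on a single timestamp shared by all pool threads, so only the one idle thread that wins the compare-and-set may exit per idleTimeout window. A simplified sketch of the idea:)

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of the shrink gate, not the actual Jetty code.
class ShrinkGateSketch
{
    // One timestamp shared by every thread in the pool.
    private final AtomicLong lastShrink = new AtomicLong(System.nanoTime());

    // Called by an idle thread to ask whether it may exit.
    boolean idleThreadMayExit(long idleTimeoutMs)
    {
        long last = lastShrink.get();
        long now = System.nanoTime();
        // Only the thread that wins the compareAndSet exits; everyone else keeps waiting,
        // so the pool loses at most one thread per idleTimeout.
        return now - last > TimeUnit.MILLISECONDS.toNanos(idleTimeoutMs)
                && lastShrink.compareAndSet(last, now);
    }
}
```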
The expected behaviour should look like one of the LRU/LFU algorithm implementations: if a thread does not perform any useful work within idleTimeout, it should be shrunk.