add custom variant of jetty QueuedThreadPool that allows for configurable pool decay #79
base: fs/branch_9x
Conversation
private String prefix;

public InstrumentedQueuedThreadPool(@Name("registry") MetricRegistry registry) {
    this(registry, 200);
Would it be worth pulling some of these: maxThreads, minThreads, idleTimeout as constants for the default values?
Definitely, if this were our code; but I'm keeping it as close as possible to the upstream class it modifies, so I think I'd prefer to leave it as-is.
@@ -283,6 +283,7 @@ private void init(int port) {
QueuedThreadPool qtp = new QueuedThreadPool();
qtp.setMaxThreads(THREAD_POOL_MAX_THREADS);
qtp.setIdleTimeout(THREAD_POOL_MAX_IDLE_TIME_MS);
qtp.setIdleTimeoutDecay(THREAD_POOL_MAX_THREADS);
Do we want a different constant for idleTimeoutDecay or should this actually be tied to maxThreads?
For JettySolrRunner it probably doesn't matter. For clarity though, I agree -- I propose making a different constant with the same value:
private static final int THREAD_POOL_MAX_THREADS = 10000;
private static final int THREAD_POOL_DECAY_RATE = THREAD_POOL_MAX_THREADS;
👍 on a new constant, same value. Provides better clarity.
// ========================================================================
//

package org.apache.solr.jetty;
Curious, have you made any changes to this file for our purposes?
Yes. The commits are logically split to clarify what's pulled in from upstream vs. our own changes -- functional (48c971f) and rote (a5a039b). This is complicated a bit while the upstream PR is in flight, because the substance of the changes is in the first of these two commits, which (maybe misleadingly) refers to "changes from upstream Jetty PR". To clarify: these changes are not yet merged upstream, but we're trying to track here the changes as proposed in the upstream PR.
* in a <code>finally</code> block) that any outstanding entries in this {@link ShrinkManager} are
* removed if the calling thread exits unexpectedly (i.e., throws an exception).
*/
public class ShrinkManager
QQ: do we now shrink one thread per idleTimeoutDecay interval? Earlier it used to happen once per idleTimeout interval?
Right. So there's now a distinction between the two values, with idleTimeout being interpreted as the minimum amount of time that excess pool capacity must be idle before it's eligible to shrink, and idleTimeoutDecay being the rate at which excess capacity shrinks once it's eligible to shrink according to idleTimeout.
Sure, let me take an example:
- minThreads = 1
- idleTimeout = 60 seconds
- idleTimeoutDecay (I want decay every 5 seconds after a quick burst of requests -- not clear how I would configure that)

Let's say 10 threads were created. The threads would then shrink over time at t = 65, 70, 75, 80, 85, 90, 95, 100, 105 seconds (until 1 thread remains).
Is my understanding correct?
Exactly. The behavior described in your example would be achieved by setting idleTimeoutDecay=12 -- i.e., 12 threads will be allowed to decay per idleTimeout (60s) interval, or one thread per 60s / 12 = 5s.
> by setting idleTimeoutDecay=12 -- i.e., 12 threads will be allowed to decay for every idleTimeout (60s) interval, or one thread per 60s / 12 = 5s

This part is not clear to me. I was expecting a "5 seconds" decay setting to shrink after idleTimeout, so that if someone sets 0 seconds, all threads would shrink immediately after idleTimeout.
Help me understand this better.
I think the confusion stems from the fact that idleTimeoutDecay is ultimately a number of threads, not an amount of time. So if you want all threads to shrink immediately after idleTimeout has passed, you'd set idleTimeoutDecay equal to maxThreads. You could set it higher, but it shouldn't make a difference practically speaking. And idleTimeoutDecay=1 would be equivalent to current behavior, where 1 thread is allowed to expire per idleTimeout interval.
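To make the arithmetic above concrete, here is a minimal sketch (not part of the PR; the class and method names are illustrative only) showing how an idleTimeoutDecay thread count maps to an effective per-thread shrink interval:

```java
// Illustrative sketch: idleTimeoutDecay is a number of threads allowed to
// decay per idleTimeout interval, so the effective time between individual
// thread shrinks is idleTimeout / idleTimeoutDecay.
public class DecayRateExample
{
    static long shrinkIntervalMs(long idleTimeoutMs, int idleTimeoutDecay)
    {
        return idleTimeoutMs / idleTimeoutDecay;
    }

    public static void main(String[] args)
    {
        // idleTimeout=60s, idleTimeoutDecay=12 => one thread every 5s once eligible
        System.out.println(shrinkIntervalMs(60_000, 12)); // prints 5000
        // idleTimeoutDecay=1 matches current behavior: one thread per idleTimeout
        System.out.println(shrinkIntervalMs(60_000, 1)); // prints 60000
    }
}
```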
Yes, but first, end users don't expect the behavior of the current idleTimeout implementation. And now it feels to me this new parameter will complicate things more. I think it would be simpler if we consider the new parameter as a timeout decay:
- the default timeout decay would be idleTimeout
- easier to understand
- more importantly, there is the minThreads parameter to keep minimum threads alive
- also, it may simplify the implementation
I'm not sure what the desired behavior you have in mind is -- can you clarify?
I'm thinking of the following:
- keep current behavior as-is (decayTimeout = idleTimeout)
- if a thread is idle for idleTimeout, then shrink it (decayTimeout = 0) -- that's what most (all?) users expect
- in between, decay every 5 seconds as described in the example above
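For what it's worth, a time-based "decayTimeout" as proposed above could be translated into the PR's thread-count-based idleTimeoutDecay. This is a hedged sketch; the method and parameter names here are hypothetical, not part of the actual API:

```java
// Illustrative mapping from a hypothetical time-based decayTimeout to the
// PR's thread-count-based idleTimeoutDecay.
public class DecayTimeoutMapping
{
    static int idleTimeoutDecayFor(long idleTimeoutMs, long decayTimeoutMs, int maxThreads)
    {
        if (decayTimeoutMs <= 0)
            return maxThreads; // shrink everything as soon as idleTimeout passes
        // one thread per decayTimeoutMs => idleTimeout / decayTimeout threads per interval
        return (int) Math.max(1, idleTimeoutMs / decayTimeoutMs);
    }

    public static void main(String[] args)
    {
        System.out.println(idleTimeoutDecayFor(60_000, 5_000, 10_000));  // 12: one thread per 5s
        System.out.println(idleTimeoutDecayFor(60_000, 60_000, 10_000)); // 1: current behavior
        System.out.println(idleTimeoutDecayFor(60_000, 0, 10_000));      // 10000: shrink all at once
    }
}
```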
I was looking at this commit. That seems like a simpler implementation.
while (NanoTime.elapsed(last, now) > siNanos)
{
    // shrinkInterval is satisfied
    if (lastShrink.compareAndSet(last, Math.max(last, idleBaseline) + siNanos))
What will happen with the following use case?
1. The process is running with minThreads.
2. Sudden activity happens and creates N more threads. Now total threads are (minThreads + N).
3. Let's assume the following things happen:
   3.1 all timestamps have the same idleBaseline (or 3.1.1 all threads arrive here at the same time)
   3.2 all threads will get the same idleBaseline, and one will succeed here (most importantly, only one thread will succeed, since RETRY_LIMIT = 0?)
   3.3 except for that one thread, all threads will continue in QTP.run() and wait for idleTimeout for the next job?
   3.4 then (3.1 - 3.3) happens again after idleTimeout.

Can this cause the same behavior as today, where each thread shrinks only after idleTimeout?
The situation you're describing is addressed by the comment immediately below:
// NOTE: attempted CaS may fail, _very_ infrequently. If it does, that's fine.
// This is a "best effort" approach to shrinking, and even if our CaS fails,
// the missed "shrink reservation" will very likely simply be picked up by
// another thread.
Worst-case scenario here is that an opportunity to shrink is missed. Even in tests where we're specifically trying to trigger this behavior, such as testPrecise(), the behavior manifests infrequently. But we are able to trigger it sometimes, and this is the reason for the existence of the adjustable RETRY_LIMIT static field: in artificial situations where we care about shrink behavior with extreme precision (really only when writing tests, I suspect), we need to retry to avoid the race condition you're describing.
But "in real life":
- situations triggering this issue will be rare, and
- even if the issue is hit, the more threads, the more likely it is that the "shrink reservation" will be picked up by another thread, and
- even in the event we hit this issue in a way that fully manifests, the consequence is minor to the point of being benign
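The "best effort" reservation pattern discussed above can be sketched roughly as follows. This is a simplified illustration, not the actual Jetty code; the class name and fields here are hypothetical:

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of a best-effort compare-and-set "shrink reservation":
// each idle thread attempts to claim the next shrink slot, and at most one
// CaS per slot succeeds, so at most one thread exits per shrink interval.
public class ShrinkReservationSketch
{
    private final AtomicLong lastShrink = new AtomicLong(System.nanoTime());
    private final long shrinkIntervalNanos;

    public ShrinkReservationSketch(long shrinkIntervalNanos)
    {
        this.shrinkIntervalNanos = shrinkIntervalNanos;
    }

    /** Returns true iff the calling thread won the shrink slot and may exit. */
    public boolean pollIdleShrink()
    {
        long now = System.nanoTime();
        long last = lastShrink.get();
        if (now - last <= shrinkIntervalNanos)
            return false; // shrink interval not yet elapsed
        // Single best-effort CaS (no retry, i.e. RETRY_LIMIT = 0): if another
        // thread wins the race, this thread skips the opportunity; the missed
        // "shrink reservation" is very likely picked up by another idle thread.
        return lastShrink.compareAndSet(last, last + shrinkIntervalNanos);
    }

    public static void main(String[] args) throws InterruptedException
    {
        ShrinkReservationSketch s = new ShrinkReservationSketch(50_000_000L); // 50ms
        System.out.println(s.pollIdleShrink()); // false: interval not yet elapsed
        Thread.sleep(60);
        System.out.println(s.pollIdleShrink()); // true: this thread claims the slot
    }
}
```

Under contention, only one of the racing CaS calls can observe the same `last` value and succeed, which is why the consequence of a lost race is merely a delayed (not duplicated) shrink.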
{
    // No job immediately available maybe we should shrink?
    long idleTimeout = getIdleTimeout();
    if (idleTimeout > 0 && getThreads() > _minThreads)
The other use case is here:
- what if all threads check this condition at the same time and go into the pollIdleShrink() method?
- is it then possible that pollIdleShrink() returns true for all of the threads, if getShrinkInterval() is very small and each thread is able to set lastShrink?
True, good catch! I dug into this a bit on the upstream PR (thanks for reporting there!), and even writing a test specifically designed to trigger this case (thread count falling below minThreads), I was only ever able to get it to drop 1 thread below, and the ensureThreads() method in the finally block of the run loop brings the thread count back in line with the configured minThreads.
I'm inclined to go with the upstream, which appears to be leaning towards not worrying about this.
This PR is developed against Jetty 10, so it would take a bit of extra effort to backport to fs/branch_9_1 (at the time of writing still on Jetty 9). I don't plan to bother with this because I expect the staging release branch should catch up to Jetty 10 with enough time before a production release.

The main benefit of making this change in the Solr repo is that we can run the whole test suite with the new config options/behavior (including integration tests in :solr:packaging:integrationTests). It's also nice that this will place the necessary classes in the right place, so no extra work is required in deployment, etc.

This is a temporary measure, so in order to be compatible with upstream formatting, we're disabling a bunch of format-type checks in CI for the :solr:jetty subproject.

See also:
jetty/jetty.project#9237
jetty/jetty.project#9354