logging hot threads on large queue of the management threadpool #140251
Conversation
Resolves: ES-13904
Per discussion on ES-13904, my current conclusion is that we want to continue with this PR as well as adding support for possible external automation. The former satisfies the immediate need for investigation. The latter requires additional work and could be useful for other future investigations.
Pinging @elastic/es-core-infra (Team:Core/Infra)
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
nicktindall
left a comment
If we decide to do it this way, it seems like a good implementation
this.contextHolder = contextHolder;
this.hotThreadsOnLargeQueueConfig = hotThreadsOnLargeQueueConfig;
this.currentTimeMillisSupplier = currentTimeMillisSupplier;
this.hotThreadsLogger = new FrequencyCappedAction(currentTimeMillisSupplier, TimeValue.ZERO);
One thing to note is that FrequencyCappedAction is not thread-safe. I imagine it's not the end of the world if we log twice, but it's worth noting.
Discussed offline and pushed 6202a70 to make the logging thread-safe.
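For context, a minimal sketch of the CAS-based capping, pieced together from the snippets quoted later in this review; the class and supplier wiring here are illustrative, not the actual Elasticsearch source.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Minimal sketch of the CAS-based frequency cap; names mirror the snippets quoted in this
// review, but this is illustrative rather than the actual implementation.
class FrequencyCappedHotThreadsLogger {
    private final AtomicLong lastLoggingTimeMillis = new AtomicLong();
    private final long intervalMillis;
    private final LongSupplier currentTimeMillisSupplier;
    private final Runnable logHotThreads; // stands in for the HotThreads.logLocalHotThreads(...) call

    FrequencyCappedHotThreadsLogger(long intervalMillis, LongSupplier currentTimeMillisSupplier, Runnable logHotThreads) {
        this.intervalMillis = intervalMillis;
        this.currentTimeMillisSupplier = currentTimeMillisSupplier;
        this.logHotThreads = logHotThreads;
    }

    void maybeLog() {
        final long now = currentTimeMillisSupplier.getAsLong();
        final long lastLoggingTime = lastLoggingTimeMillis.get();
        // compareAndSet means only one thread wins the right to log per interval, so
        // concurrent callers cannot both emit hot-threads output.
        if (now - lastLoggingTime >= intervalMillis && lastLoggingTimeMillis.compareAndSet(lastLoggingTime, now)) {
            logHotThreads.run();
        }
    }
}
```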
DaveCTurner
left a comment
I have some concerns about exposing this in settings that we must therefore continue to support for a long time. I don't think this is something we want to keep long-term.
// There may be racing on updating this field. It's OK since hot threads logging is very coarse grained time wise
// and can tolerate some inaccuracies.
private volatile long startTimeOfLargeQueue = NOT_TRACKED_TIME;
Can we mention millis in the name here? Ideally relativeMillis, since there's no need for an absolute time, right?
// and can tolerate some inaccuracies.
private volatile long startTimeOfLargeQueue = NOT_TRACKED_TIME;

private final AtomicLong lastLoggingTimeForHotThreads;
handler,
contextHolder,
hotThreadsOnLargeQueueConfig,
System::currentTimeMillis
Could we use the cached time from the threadpool?
That was indeed the first thing I had in mind. Though it is technically possible, it implies quite a few cascading changes to the builders, subclasses and utility methods. In addition, executors are built in ThreadPool's constructor, so we would need this-escape to pass the relativeTimeInMillis method. Overall I don't think it is worth the effort, especially since we might not want to support it long-term.
final var lastLoggingTime = lastLoggingTimeForHotThreads.get();
if (now - lastLoggingTime >= hotThreadsOnLargeQueueConfig.intervalInMillis()
    && lastLoggingTimeForHotThreads.compareAndSet(lastLoggingTime, now)) {
    HotThreads.logLocalHotThreads(
This emits its logs after some delay; it might be worth logging a message right now saying we're starting to capture hot threads at this point too.
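Something roughly like the following, for illustration only; the logger, message text and arguments are all assumptions, not part of the PR.

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Hypothetical illustration of the suggestion above: emit an immediate marker when the
// capture is triggered, since the detailed hot-threads output only appears after the
// sampling delay.
class HotThreadsCaptureMarker {
    private static final Logger logger = LogManager.getLogger(HotThreadsCaptureMarker.class);

    static void logCaptureStarting(String poolName, int queueSize, long sustainedMillis) {
        logger.info("queue of [{}] has held [{}] entries for [{}ms]; capturing hot threads now", poolName, queueSize, sustainedMillis);
    }
}
```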
public static final HotThreadsOnLargeQueueConfig DISABLED = new HotThreadsOnLargeQueueConfig(0, -1, -1);
public boolean isEnabled() {
Could we use == DISABLED rather than needing to call this method?
To do that, we would need to ensure the DISABLED instance is returned whenever the class is instantiated with sizeThreshold == 0. It is technically possible, but it would require making the class's constructor private, which means it cannot be a record, and we would need to add a factory method. I am not sure whether that is a better trade-off compared to having this method. Please let me know if you think otherwise. Thanks!
Ok we can go with this at least for now.
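For the record, the `== DISABLED` alternative being weighed here would look roughly like this; it is a sketch only, and the field names are guesses based on the three constructor arguments.

```java
// Sketch of the alternative: a plain class with a private constructor plus a factory
// method that canonicalises the disabled case, so `config == DISABLED` is reliable.
// Field names (queueSizeThreshold, intervalInMillis, captureDurationMillis) are assumptions.
final class HotThreadsOnLargeQueueConfig {
    static final HotThreadsOnLargeQueueConfig DISABLED = new HotThreadsOnLargeQueueConfig(0, -1, -1);

    private final int queueSizeThreshold;
    private final long intervalInMillis;
    private final long captureDurationMillis;

    private HotThreadsOnLargeQueueConfig(int queueSizeThreshold, long intervalInMillis, long captureDurationMillis) {
        this.queueSizeThreshold = queueSizeThreshold;
        this.intervalInMillis = intervalInMillis;
        this.captureDurationMillis = captureDurationMillis;
    }

    static HotThreadsOnLargeQueueConfig of(int queueSizeThreshold, long intervalInMillis, long captureDurationMillis) {
        // Always hand out the singleton for the disabled configuration.
        return queueSizeThreshold == 0 ? DISABLED : new HotThreadsOnLargeQueueConfig(queueSizeThreshold, intervalInMillis, captureDurationMillis);
    }
}
```

As discussed, this trades the record for a handwritten class plus a factory method, which is why keeping isEnabled() was the simpler choice for now.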
This is a good point. I moved the settings registration and hence the test to the serverless side. Please see 61687b1 and the linked PR for details. Thanks a lot!
nicktindall
left a comment
LGTM, just a question about whether EsThreadPoolExecutorTestHelper could live in src/test instead of src/main? But perhaps that's not possible.
this.hotThreadsOnLargeQueueConfig = hotThreadsOnLargeQueueConfig;
this.currentTimeMillisSupplier = currentTimeMillisSupplier;
this.lastLoggingTimeMillisForHotThreads = hotThreadsOnLargeQueueConfig.isEnabled()
    ? new AtomicLong(currentTimeMillisSupplier.getAsLong() - hotThreadsOnLargeQueueConfig.intervalInMillis())
Nit: this could also just be zero, couldn't it?
Not really. Zero effectively means the logging has an initial delay of 60 min (the default interval) on node start. So we can miss the logging if a large queue size occurs within that time frame.
Would it? Because now would be millis since epoch, so now - 0 would be much larger than intervalInMillis?
if (now - lastLoggingTime >= hotThreadsOnLargeQueueConfig.intervalInMillis()
Sorry, I misunderstood your suggestion. Yes, it can technically be 0 since we are using an absolute timer. I didn't do it because that makes test manipulation harder, i.e. the initial time would be fixed and could not be controlled by the unit test, hence the choice. I prefer to keep it as is for the time being.
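To make the trade-off concrete, here is an illustrative sketch of the two initialisation options being discussed; the supplier injection mirrors the quoted constructor, while the class and method names are made up.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Illustrative only. Option A is what the PR does: seed the last-logging time so the very
// first large-queue occurrence may log immediately, while the injected supplier keeps the
// unit tests in full control of the starting point.
class LastLoggingTimeInit {
    static AtomicLong seededFromSupplier(LongSupplier currentTimeMillisSupplier, long intervalMillis) {
        return new AtomicLong(currentTimeMillisSupplier.getAsLong() - intervalMillis);
    }

    // Option B: start at zero. With an absolute epoch clock, now - 0 easily exceeds the
    // interval, so the first log is not delayed either, but the initial value is a fixed
    // constant that a fake clock in tests cannot influence.
    static AtomicLong startAtZero() {
        return new AtomicLong(0L);
    }
}
```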
As discussed elsewhere, this helper class is located in
@DaveCTurner I have addressed your review comments and intend to merge this PR in the next 12 hours so that it has a good chance to make next week's release. I am happy to address any other comments you may have in follow-ups if it gets merged before you can re-review. Thanks! 🙏
…ement queue size (elastic#5195) This PR registers the settings added in elastic#140251 so that they are only configurable in serverless. It also adds tests for the logging behaviour.
We have a requirement to explain why the management threadpool gets piled up for an extended duration. The idea is to log hot threads when that happens. This can be done manually, which is the case for now, but the issue is that the queue is often already drained by the time a human operator notices the alert and takes action. This PR explores the option to automate this within Elasticsearch.
The changes add new logic to `EsThreadPoolExecutor#execute` so that it checks the queue size and tracks persistently large values, and eventually logs hot threads if the queue does not drain within the configured time (a rough sketch follows below). It has a few limitations:

1. `ThreadPool` and related classes cannot register dynamic cluster settings, so the new settings are static, i.e. they require a rolling restart to take effect.
2. It requires a new task to be submitted to trigger the potential hot-threads logging. So it is possible that the large queue has been sustained for longer than the configured time, i.e. the logging may not always be on time. But this should not be too much of an issue, since it is unlikely that the queue stops moving for an extended time at exactly the threshold.

A big question that I have is whether this can be automated outside Elasticsearch. Today the alert requires human interaction, which can be slow (and thus too late to capture the right moment). But is it possible to add an automated alert response that calls the hot-threads API against the relevant cluster? That might have a much better chance of getting the desired output. If possible, this would save the code change here and its ongoing maintenance.
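A rough sketch of the mechanism described above; names such as `NOT_TRACKED`, `startTimeMillisOfLargeQueue` and the constructor parameters are placeholders rather than the actual fields or settings, and the frequency cap from the review thread is omitted for brevity.

```java
import java.util.function.LongSupplier;

// Rough sketch of the queue-size tracking hooked into EsThreadPoolExecutor#execute.
// Everything here is illustrative, not the actual Elasticsearch implementation.
class LargeQueueTracker {
    private static final long NOT_TRACKED = -1L;

    private final int queueSizeThreshold;      // queue size considered "large"
    private final long maxSustainedMillis;     // how long it may stay large before logging
    private final LongSupplier currentTimeMillis;
    private final Runnable logHotThreads;      // stands in for HotThreads.logLocalHotThreads(...)

    private volatile long startTimeMillisOfLargeQueue = NOT_TRACKED;

    LargeQueueTracker(int queueSizeThreshold, long maxSustainedMillis, LongSupplier currentTimeMillis, Runnable logHotThreads) {
        this.queueSizeThreshold = queueSizeThreshold;
        this.maxSustainedMillis = maxSustainedMillis;
        this.currentTimeMillis = currentTimeMillis;
        this.logHotThreads = logHotThreads;
    }

    // Called on every task submission, which is why the logging can only fire when a new
    // task arrives (limitation 2 above).
    void onTaskSubmitted(int currentQueueSize) {
        if (currentQueueSize < queueSizeThreshold) {
            startTimeMillisOfLargeQueue = NOT_TRACKED;   // queue drained, stop tracking
            return;
        }
        final long now = currentTimeMillis.getAsLong();
        final long start = startTimeMillisOfLargeQueue;
        if (start == NOT_TRACKED) {
            startTimeMillisOfLargeQueue = now;           // queue just became large, start the clock
        } else if (now - start >= maxSustainedMillis) {
            logHotThreads.run();                         // queue has stayed large for too long
        }
    }
}
```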
The internal approach in this PR does have its own advantages: (1) it is guaranteed to get the necessary output, and (2) the hot-threads output could be more interesting since it includes the stack trace of the thread that is submitting the task. This could give a better hint about what is hitting the management thread pool, since the tasks already running on the pool may be from different sources and only the tasks being queued are relevant. Overall, hot threads is a close approximation for figuring out "why the queue got piled up".
Resolves: ES-13904