Expose fault tolerant execution and filesystem exchange metrics via JMX #12127
arhimondr merged 4 commits into trinodb:master
I had to move exchange plugin from the |
When will this happen in practice?
Rather unlikely, but we have a similar safeguard in other places
This is to export the JMX beans under a specific prefix
This is not on the critical path, do we need to record here too?
It might be useful for tracking (for example if for some reason it starts to fail)
For my understanding, can you briefly explain what these two annotations do?
Methods annotated with @Managed are exported via JMX. @Nested tells the framework to recurse into the object returned by a method and export nested methods annotated with @Managed.
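To make the recursion concrete, here is a minimal JDK-only sketch. The `Managed` and `Nested` annotations below are hypothetical stand-ins declared locally, not the real framework annotations, and `CounterStat`, `ExchangeStats`, and `ManagedAttributeWalker` are invented for illustration. The walker only mimics the described behavior: collect values from `@Managed` getters, and descend into the returned object when the getter is also `@Nested`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the real @Managed annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Managed {}

// Hypothetical stand-in for the real @Nested annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Nested {}

// Hypothetical leaf stats object with one exported attribute.
class CounterStat
{
    private long total;

    void add(long value)
    {
        total += value;
    }

    @Managed
    public long getTotal()
    {
        return total;
    }
}

// Hypothetical stats holder: one plain attribute and one nested stats object.
class ExchangeStats
{
    private final CounterStat bytesWritten = new CounterStat();

    @Managed
    @Nested
    public CounterStat getBytesWritten()
    {
        return bytesWritten;
    }

    @Managed
    public String getName()
    {
        return "exchange";
    }
}

class ManagedAttributeWalker
{
    private ManagedAttributeWalker() {}

    // Returns attribute name -> value, with nested attributes prefixed by the parent getter name.
    static Map<String, Object> collect(Object target)
    {
        Map<String, Object> attributes = new TreeMap<>();
        collect("", target, attributes);
        return attributes;
    }

    private static void collect(String prefix, Object target, Map<String, Object> out)
    {
        for (Method method : target.getClass().getDeclaredMethods()) {
            if (!method.isAnnotationPresent(Managed.class) || method.getParameterCount() != 0) {
                continue;
            }
            String name = prefix + method.getName().replaceFirst("^get", "");
            Object value;
            try {
                method.setAccessible(true);
                value = method.invoke(target);
            }
            catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read " + name, e);
            }
            if (method.isAnnotationPresent(Nested.class)) {
                // Recurse into the returned object instead of exporting it directly.
                collect(name + ".", value, out);
            }
            else {
                out.put(name, value);
            }
        }
    }
}
```

With this sketch, `ManagedAttributeWalker.collect(new ExchangeStats())` yields the keys `BytesWritten.Total` and `Name`. The real framework registers MBeans rather than building a map, so treat this only as an illustration of the recursion.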
Is this an optimization? Why do we need the class member `blocked`?
This is to make sure that only a single blocked future exists and is tracked.
I don't quite understand, why not simply the following:
return stats.getExchangeSourceBlocked().record(toCompletableFuture(
        nonCancellationPropagating(
                whenAnyComplete(readers.stream()
                        .map(ExchangeStorageReader::isBlocked)
                        .collect(toImmutableList())))));
Because isBlocked is called by multiple threads concurrently, a different future may be returned to each thread, which skews the metric. Ideally we would like to keep our measurements as close as possible to the time it takes for the entire ExchangeSource to transition from the "blocked" state to "non-blocked".
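A rough sketch of that idea, using only JDK types: the `BlockedTracker` class, its supplier, and the transition counter are all hypothetical stand-ins (the counter takes the place of the real stat recorder). The point is only that concurrent callers share one cached future, so one blocked-to-unblocked transition is recorded once.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: caches a single "blocked" future so that concurrent
// isBlocked() callers observe, and the stats record, the same transition.
class BlockedTracker
{
    // Hypothetical: creates a future that completes when any reader has data.
    private final Supplier<CompletableFuture<Void>> blockedSupplier;
    // Stand-in for the real stat; counts recorded blocked -> non-blocked transitions.
    private final AtomicInteger recordedTransitions = new AtomicInteger();
    // Guarded by "this".
    private CompletableFuture<Void> blocked;

    BlockedTracker(Supplier<CompletableFuture<Void>> blockedSupplier)
    {
        this.blockedSupplier = blockedSupplier;
    }

    synchronized CompletableFuture<Void> isBlocked()
    {
        if (blocked != null && !blocked.isDone()) {
            // Still blocked: hand out the future that is already being tracked.
            return blocked;
        }
        CompletableFuture<Void> fresh = blockedSupplier.get();
        if (!fresh.isDone()) {
            // Record once per transition, not once per caller.
            fresh.whenComplete((value, failure) -> recordedTransitions.incrementAndGet());
        }
        blocked = fresh;
        // Simplification: production code would also protect the shared future
        // from caller-side cancellation, as nonCancellationPropagating does above.
        return blocked;
    }

    int getRecordedTransitions()
    {
        return recordedTransitions.get();
    }
}
```

Without the cached member, every call would wrap a new future in its own measurement, and each thread's timer would start at a different moment.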
Exceptions thrown out of a listener are not logged, so it's more of a "log it or lose it" situation.
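A JDK-only analogue of the "log it or lose it" situation, as a sketch: with `CompletableFuture.whenComplete`, an exception thrown by the callback ends up only in the derived future, so it disappears if nobody inspects that future. The `SafeListeners.logging` helper is hypothetical and is not the code in this PR, and the listener mechanism discussed here may differ from `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

class SafeListeners
{
    private SafeListeners() {}

    // Hypothetical helper: wraps a listener so that anything it throws is
    // passed to errorLog instead of being silently dropped.
    static <T> BiConsumer<T, Throwable> logging(BiConsumer<T, Throwable> listener, Consumer<Throwable> errorLog)
    {
        return (value, failure) -> {
            try {
                listener.accept(value, failure);
            }
            catch (RuntimeException e) {
                errorLog.accept(e);
            }
        };
    }
}

class SafeListenersDemo
{
    public static void main(String[] args)
    {
        CompletableFuture<String> future = new CompletableFuture<>();

        // Unwrapped: the exception lands in the discarded derived future and is never seen.
        future.whenComplete((value, failure) -> {
            throw new IllegalStateException("unwrapped listener failed");
        });

        // Wrapped: the exception is reported through the supplied logger.
        future.whenComplete(SafeListeners.<String>logging(
                (value, failure) -> {
                    throw new IllegalStateException("wrapped listener failed");
                },
                e -> System.err.println("listener failed: " + e)));

        future.complete("done");
    }
}
```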
I know that currently dependent tasks do not execute until the upstream task completes, but that will change. In that case, would we mark the downstream task as "FAILED" or "ABORTED"?
If the latter, then we should also compute stats for "ABORTED" tasks. It would be important to understand how much effort we are wasting on those.
Maybe we can just merge FAILED and ABORTED?
I'm going to add `private final ExecutionStats abortedTasks = new ExecutionStats();` and store the stats for both ABORTED and CANCELLED tasks there. I'm not sure it makes sense to track these metrics separately.
Should we have memory and network stats here too?
It feels not costly to add them and then we can decide which are the most important for us for tracking.
I'm not sure if there's a reliable metric for network as the network traffic can occur at many different levels (connector / exchanges / coordinator-to-worker communication). Though it certainly feels like it would make sense to record peak memory utilization. Let me add it.
core/trino-main/src/main/java/io/trino/execution/scheduler/FaultTolerantExecutionStats.java
nit: Use Ticker/Stopwatch pair instead.
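For context, the usual motivation for a ticker/stopwatch pair is that the time source can be injected and faked in tests. The sketch below shows that pattern with JDK types only. `ElapsedTimer` and its `LongSupplier` time source are hypothetical stand-ins and are not the Ticker and Stopwatch classes the comment refers to.

```java
import java.util.function.LongSupplier;

// Hypothetical JDK-only stand-in for a ticker/stopwatch pair: the nanosecond
// time source is injected, so tests can control the passage of time.
class ElapsedTimer
{
    private final LongSupplier nanoTicker;
    private final long startNanos;

    ElapsedTimer(LongSupplier nanoTicker)
    {
        this.nanoTicker = nanoTicker;
        this.startNanos = nanoTicker.getAsLong();
    }

    // Convenience factory backed by the monotonic system clock.
    static ElapsedTimer startSystem()
    {
        return new ElapsedTimer(System::nanoTime);
    }

    long elapsedNanos()
    {
        return nanoTicker.getAsLong() - startNanos;
    }
}
```

In a test, passing something like `fakeNanos::get` backed by an `AtomicLong` makes the measured duration deterministic, which is hard to achieve when the code calls `System.nanoTime()` directly.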
Move from `io.trino.plugin.exchange` to `io.trino.plugin.exchange.filesystem`
From trino-exchange to trino-exchange-filesystem
Description
Exposes operational metrics related to fault tolerant execution via JMX to enable live monitoring.
Improvement
Core, Exchange
N/A
Related issues, pull requests, and links
-
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: