Measure split listing time per table per query. #14713
Dith3r wants to merge 1 commit into trinodb:master from
1e676f1 to 3c30f98
The next step is to expose it via
ed15192 to 32a6f41
How will it be available?
@findepi Event listener.
@Dith3r is this ready for review?
After merging #14785 I will update this PR. There are a few changes.
raunaqmorarka left a comment
As a separate commit/PR, we can also update PlanPrinter to print this information for table scans in EXPLAIN ANALYZE VERBOSE
core/trino-main/src/main/java/io/trino/execution/StageStateMachine.java
What do we gain from knowing the distribution of get-splits time per table scan, rather than the simpler cumulative get-splits time per table scan?
One file listing operation might feed multiple batches of splits, so the distribution may just capture a few high values and many low values.

With a distribution, we get the total time spent getting splits, the count, and whether there are outliers.

My concern is that an outlier time for some get-splits batches could be normal, in which case it isn't clear what insight the distribution would give us.
Could you check what results we get on an unpartitioned table with a large number of files and on a partitioned table with many partitions?
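
To make the trade-off in this thread concrete, here is a minimal sketch of what a distribution reports compared with a single cumulative value. It is not the PR's TableGetSplitDistribution; the class name GetSplitTimeSummary, its methods, and the sample timings are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Trino): summarizes get-splits batch timings.
final class GetSplitTimeSummary {
    private final List<Long> nanos = new ArrayList<>();

    // Record the time taken to obtain one batch of splits.
    void record(long durationNanos) {
        nanos.add(durationNanos);
    }

    int count() {
        return nanos.size();
    }

    // The "simpler cumulative time" discussed in the thread.
    long total() {
        long sum = 0;
        for (long n : nanos) {
            sum += n;
        }
        return sum;
    }

    long max() {
        return nanos.isEmpty() ? 0 : Collections.max(nanos);
    }

    // Nearest-rank percentile, with p in (0, 100].
    long percentile(double p) {
        if (nanos.isEmpty()) {
            return 0;
        }
        List<Long> sorted = new ArrayList<>(nanos);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    public static void main(String[] args) {
        GetSplitTimeSummary summary = new GetSplitTimeSummary();
        summary.record(900); // one slow batch, e.g. the one that waited on a file listing
        for (int i = 0; i < 9; i++) {
            summary.record(10); // batches served from already-listed files
        }
        System.out.println("count=" + summary.count()
                + " total=" + summary.total()
                + " p50=" + summary.percentile(50)
                + " max=" + summary.max());
    }
}
```

With one slow batch and nine fast ones, the total is dominated by the single slow value while the median stays low, which is the "few high values and many low values" shape raised above. Whether that shape is informative or just normal behavior is exactly the open question in the thread.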
  long start = System.nanoTime();
- addSuccessCallback(nextSplitBatchFuture, () -> stageExecution.recordGetSplitTime(start));
+ addSuccessCallback(nextSplitBatchFuture, () -> stageExecution.recordGetSplitTime(partitionedNode, start));
In the Hive connector, splits are loaded by the connector in a background thread (see BackgroundHiveSplitLoader), so recording the time taken here will probably miss the actual work the Hive connector does to list files.
@sopel39 maybe we need to extend ConnectorSplitSource to allow a connector to provide metrics about split generation.
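
The concern above can be illustrated with a small sketch. It uses the JDK's CompletableFuture in place of the futures and addSuccessCallback helper used in the diff, and the class SplitBatchTimer and the table name are hypothetical. It only shows what a "start the clock, record on success" callback measures: the wait for the batch future, not any work the producer did earlier in the background.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch (not Trino code): per-table timing of split batch futures.
final class SplitBatchTimer {
    private final Map<String, LongAdder> nanosByTable = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> batchesByTable = new ConcurrentHashMap<>();

    // Starts the clock now and records the elapsed time only if the future succeeds.
    <T> CompletableFuture<T> time(String table, CompletableFuture<T> nextBatch) {
        long start = System.nanoTime();
        return nextBatch.thenApply(batch -> {
            nanosByTable.computeIfAbsent(table, key -> new LongAdder())
                    .add(System.nanoTime() - start);
            batchesByTable.computeIfAbsent(table, key -> new LongAdder())
                    .increment();
            return batch;
        });
    }

    long totalNanos(String table) {
        LongAdder adder = nanosByTable.get(table);
        return adder == null ? 0 : adder.sum();
    }

    long batches(String table) {
        LongAdder adder = batchesByTable.get(table);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        SplitBatchTimer timer = new SplitBatchTimer();
        // A batch that a background loader has already prepared: the future is
        // complete before timing starts, so the recorded wait is close to zero
        // no matter how long the listing took.
        timer.time("orders", CompletableFuture.completedFuture("batch-1")).join();
        System.out.println("batches=" + timer.batches("orders")
                + " nanos=" + timer.totalNanos("orders"));
    }
}
```

If split generation happens in a background thread, as described for the Hive connector, timings collected this way reflect how long the engine waited for a batch. Capturing the listing cost itself would need metrics reported by the connector, as suggested in the comment.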
Description
Addresses issue #13921. Adds a split distribution per table and per query.
Example output:
Non-technical explanation
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: