Add blocked stats for input and output by arhimondr · Pull Request #11625 · trinodb/trino

arhimondr · 2022-03-23T07:27:41Z

Description

Having blocked time as a top-level statistic for the input and output of a driver / pipeline / task / query will help with debugging issues related to exchange throughput.

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Core engine

How would you describe this change to a non-technical end user or system administrator?

-

Related issues, pull requests, and links

-

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

losipiuk · 2022-03-24T18:15:28Z

core/trino-main/src/main/java/io/trino/sql/planner/planprinter/PlanPrinter.java


            builder.append(indentString(1))
-                    .append(format("CPU: %s, Scheduled: %s, Input: %s (%s); per task: avg.: %s std.dev.: %s, Output: %s (%s)\n",
+                    .append(format("CPU: %s, Scheduled: %s, Blocked %s (Input: %s, Output: %s), Input: %s (%s); per task: avg.: %s std.dev.: %s, Output: %s (%s)\n",


I am a bit concerned about whether it is readable. We have Input: and Output: in the string twice. But maybe it is fine.

Yeah, I also thought about that :-) Though I ran out of ideas for how to make it more readable. Basically, I was considering a couple of options:

* Moving the Blocked section to the end of the line. But then it feels like CPU / Scheduled / Blocked should be next to each other.

* Instead of spelling out Input: ... Output: ..., simplifying it to (I: ..., O: ...), since it's rather niche information and anyone looking at it is assumed to already understand what they are looking at. But then it felt like it is still better to be more explicit, so it is slightly less confusing for somebody who doesn't know the context.

Happy to hear your thoughts.

Maybe Blocked total/input/output? But it is even less explicit than I:.., O:....
I think we can keep it. Having this info on a single line means it will not be trivial to parse. And I do not think we want to split it into multiple lines (surely not in this PR).
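
To see how the proposed line reads in practice, here is a minimal sketch that fills the new format string from the diff with placeholder values. The class name `BlockedStatsLine`, the `render` helper, and every sample value are hypothetical; in `PlanPrinter` the arguments come from the stage statistics rather than from hand-written strings, and the sample values do not assume any relationship between the blocked total and its parts.

```java
class BlockedStatsLine
{
    // Format string copied from the "+" line of the diff above.
    static final String FORMAT =
            "CPU: %s, Scheduled: %s, Blocked %s (Input: %s, Output: %s), "
                    + "Input: %s (%s); per task: avg.: %s std.dev.: %s, Output: %s (%s)\n";

    // Hypothetical helper: expects exactly 11 values, one per %s placeholder.
    static String render(Object... values)
    {
        return String.format(FORMAT, values);
    }

    public static void main(String[] args)
    {
        // All values below are made up for illustration.
        System.out.print(render(
                "1.20s", "3.40s",             // CPU, Scheduled
                "5.60s", "4.00s", "1.50s",    // Blocked total, blocked on input, blocked on output
                "1000 rows", "10kB",          // Input rows and size
                "250.00", "12.50",            // per-task average and standard deviation
                "100 rows", "1kB"));          // Output rows and size
    }
}
```

In the rendered line, `Input:` and `Output:` each appear twice: once inside the `Blocked` group as a duration, and once as a row count and data size. That repetition is the readability concern raised above.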

losipiuk · 2022-03-24T18:18:15Z

core/trino-main/src/main/java/io/trino/sql/planner/planprinter/TextRenderer.java

    {
        StringBuilder output = new StringBuilder();
-        if (node.getStats().isEmpty() || !(plan.getTotalCpuTime().isPresent() && plan.getTotalScheduledTime().isPresent())) {
+        if (node.getStats().isEmpty() || !(plan.getTotalCpuTime().isPresent() && plan.getTotalScheduledTime().isPresent() && plan.getTotalBlockedTime().isPresent())) {


Is it intentional not to print anything if any of the stats are missing?

Stats are available when:

* Generating a distributed plan for QueryCompletedEvent
* Running EXPLAIN ANALYZE

Stats will be missing when running a simple EXPLAIN (TYPE DISTRIBUTED) ...
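
The guard in the diff can be read as "print the stats line only when everything it needs is present". Below is a minimal sketch of that condition using plain `Optional` values. `StatsGuard` and `shouldPrintStats` are hypothetical names, and the `String` payloads are stand-ins for whatever duration type the real code carries.

```java
import java.util.Optional;

class StatsGuard
{
    // Hypothetical helper: the logical inverse of the early-exit condition in the diff.
    // Stats are printed only if the node has stats AND all three plan-level totals exist.
    static boolean shouldPrintStats(
            boolean nodeHasStats,
            Optional<?> totalCpuTime,
            Optional<?> totalScheduledTime,
            Optional<?> totalBlockedTime)
    {
        return nodeHasStats
                && totalCpuTime.isPresent()
                && totalScheduledTime.isPresent()
                && totalBlockedTime.isPresent();
    }

    public static void main(String[] args)
    {
        // Totals present, as with EXPLAIN ANALYZE (sample values are made up).
        System.out.println(shouldPrintStats(true, Optional.of("1s"), Optional.of("2s"), Optional.of("3s")));

        // No runtime totals, as with a plain EXPLAIN.
        System.out.println(shouldPrintStats(false, Optional.empty(), Optional.empty(), Optional.empty()));
    }
}
```

With this shape, a single missing total suppresses the whole stats line, which matches the all-or-nothing behavior asked about in the review comment.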

pangyifish · 2022-07-26T17:56:53Z

What do "blocked input" and "blocked output" mean exactly? What are the possible causes of each?
I am a bit confused when looking at the query plans. Thanks a lot!

arhimondr · 2022-07-26T18:44:32Z

This is the time a task is blocked reading input data (for example, data produced by upstream tasks) or writing output data.

pangyifish · 2022-07-26T19:19:26Z

> This is the time a task is blocked reading input data (for example, data produced by upstream tasks) or writing output data.

Thanks! I have a follow-up question:
If there is no upstream stage and the stage only does a "scan filter" over a Hive table stored in S3, and the blocked input is very low but the blocked output is very high, does that suggest that listing objects from S3 is fast but downloading from S3 is very slow?

Thank you!

arhimondr · 2022-07-28T13:24:36Z

A high blocked output value usually indicates that the task is blocked on writing output. In fault-tolerant execution it usually indicates efficiency problems at the spooling exchange layer. In pipelined execution it may indicate that an upstream stage produces data faster than a downstream stage can process it (e.g., reading data from S3 is fast, but a following join is slow).
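
The pipelined case can be illustrated with a generic bounded buffer: a fast producer stalls whenever a slow consumer has not yet drained the buffer, and the time spent stalled is the "blocked output" time. The sketch below is not Trino code. `BlockedOutputDemo`, `measureBlockedOutputNanos`, the buffer capacity of one page, and the delays are all assumptions chosen for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BlockedOutputDemo
{
    // Returns roughly how long the producer waited inside put(), in nanoseconds.
    // The measurement includes the small cost of non-blocking puts, so it is an approximation.
    static long measureBlockedOutputNanos(int pages, long consumerDelayMillis)
    {
        // Output buffer that holds a single page; a full buffer blocks the producer.
        BlockingQueue<Integer> outputBuffer = new ArrayBlockingQueue<>(1);

        // Slow downstream consumer (think of a slow join).
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < pages; i++) {
                    Thread.sleep(consumerDelayMillis);
                    outputBuffer.take();
                }
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        long blockedNanos = 0;
        try {
            // Fast upstream producer (think of a fast scan).
            for (int page = 0; page < pages; page++) {
                long start = System.nanoTime();
                outputBuffer.put(page); // blocks while the buffer is full
                blockedNanos += System.nanoTime() - start;
            }
            consumer.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return blockedNanos;
    }

    public static void main(String[] args)
    {
        long blockedNanos = measureBlockedOutputNanos(3, 50);
        System.out.println("Producer blocked on output for about " + blockedNanos / 1_000_000 + " ms");
    }
}
```

In this toy setup the producer does almost no work of its own, so nearly all of its wall time is spent blocked on output. That is the same pattern as a scan stage with low blocked input and high blocked output feeding a slower downstream stage.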

cla-bot bot added the cla-signed label Mar 23, 2022

arhimondr requested review from linzebing and losipiuk March 23, 2022 07:27

arhimondr added 4 commits March 23, 2022 18:55

Add blocked stats for input and output

caa0d96

Include blocked time in PlanNodeStats

a59a522

Reorder fields in PlanRepresentation

a3ba462

Print blocked stats in distributed plan

b9b5285

arhimondr force-pushed the blocked-stats branch from e1deea1 to b9b5285 Compare March 23, 2022 22:55

linzebing approved these changes Mar 24, 2022

losipiuk reviewed Mar 24, 2022

losipiuk approved these changes Mar 24, 2022

arhimondr merged commit fe44c6e into trinodb:master Mar 29, 2022

arhimondr deleted the blocked-stats branch March 29, 2022 16:30

github-actions bot added this to the 376 milestone Mar 29, 2022

mosabua mentioned this pull request Mar 29, 2022

Add Trino 376 release notes #11691

Merged

arhimondr merged 4 commits into trinodb:master from arhimondr:blocked-stats
