Add blocked stats for input and output#11625

Merged
arhimondr merged 4 commits intotrinodb:masterfrom
arhimondr:blocked-stats
Mar 29, 2022

@arhimondr
Contributor

Description

Having the blocked time as a top-level statistic for the input and output of a driver / pipeline / task / query will help with debugging issues related to exchange throughput.

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Core engine

How would you describe this change to a non-technical end user or system administrator?

-

Related issues, pull requests, and links

-

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Mar 23, 2022
@arhimondr arhimondr requested review from linzebing and losipiuk March 23, 2022 07:27

  builder.append(indentString(1))
- .append(format("CPU: %s, Scheduled: %s, Input: %s (%s); per task: avg.: %s std.dev.: %s, Output: %s (%s)\n",
+ .append(format("CPU: %s, Scheduled: %s, Blocked %s (Input: %s, Output: %s), Input: %s (%s); per task: avg.: %s std.dev.: %s, Output: %s (%s)\n",
Member

I am a bit concerned about whether it is readable. We have Input: and Output: in the string twice. But maybe it is fine.

Contributor Author

@arhimondr arhimondr Mar 24, 2022

Yeah, I also thought about that :-) Though I ran out of ideas for how to make it more readable. Basically, I was considering a couple of options:

  • Moving the Blocked section to the end of the line. But then it feels like CPU / Scheduled / Blocked should be next to each other.
  • Instead of spelling out Input: ... Output: ... I was also thinking of simplifying it to (I: ..., O: ...) (since it is rather "niche" information, and anyone looking at it is assumed to already understand what they are looking at). But then it felt like it is still better to be more explicit, so that it is slightly less confusing for somebody who doesn't know the context.

Happy to hear your thoughts

Member

Maybe Blocked total/input/output? But it is even less explicit than I:.., O:....
I think we can keep it. Having this info on a single line implies that it will not be trivial to parse. And I do not think we want to split it into multiple lines (surely not in this PR).
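
To make the layout under discussion concrete, here is a minimal sketch of how the timing portion of the line could be rendered. The class `StageStatsLine`, the helper `formatTimings`, the use of `java.time.Duration`, and the sample values are all assumptions for illustration; only the format string fragment is taken from the diff above, and this is not the code from the PR.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical helper, not part of the PR: renders only the timing portion
// of the stage summary line, keeping CPU / Scheduled / Blocked next to each other.
final class StageStatsLine
{
    private StageStatsLine() {}

    static String formatTimings(
            Duration cpu,
            Duration scheduled,
            Duration blocked,
            Duration blockedInput,
            Duration blockedOutput)
    {
        // Same fragment as the new format string in the diff above.
        return String.format(
                "CPU: %s, Scheduled: %s, Blocked %s (Input: %s, Output: %s)",
                seconds(cpu),
                seconds(scheduled),
                seconds(blocked),
                seconds(blockedInput),
                seconds(blockedOutput));
    }

    // Assumed rendering of a duration as seconds with two decimals.
    private static String seconds(Duration duration)
    {
        return String.format(Locale.ROOT, "%.2fs", duration.toMillis() / 1000.0);
    }

    public static void main(String[] args)
    {
        // Sample values are made up for the example.
        System.out.println(formatTimings(
                Duration.ofMillis(1500),
                Duration.ofMillis(2000),
                Duration.ofMillis(900),
                Duration.ofMillis(250),
                Duration.ofMillis(500)));
        // prints: CPU: 1.50s, Scheduled: 2.00s, Blocked 0.90s (Input: 0.25s, Output: 0.50s)
    }
}
```

The nested `(Input: ..., Output: ...)` group is what distinguishes the blocked breakdown from the data-size `Input:` and `Output:` fields that follow later in the real line.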

  {
  StringBuilder output = new StringBuilder();
- if (node.getStats().isEmpty() || !(plan.getTotalCpuTime().isPresent() && plan.getTotalScheduledTime().isPresent())) {
+ if (node.getStats().isEmpty() || !(plan.getTotalCpuTime().isPresent() && plan.getTotalScheduledTime().isPresent() && plan.getTotalBlockedTime().isPresent())) {
Member

Is it intentional that nothing is printed if any of the stats is missing?

Contributor Author

Stats are available when:

  1. Generating a distributed plan for QueryCompletedEvent
  2. Running EXPLAIN ANALYZE

Stats will be missing when running a simple EXPLAIN (TYPE DISTRIBUTED) ...
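
The all-or-nothing guard discussed in this thread can be sketched as follows. The class `StatsGuard`, the method `renderTimings`, the `boolean` parameter standing in for the node stats check, and the output format are hypothetical and only mirror the shape of the condition in the diff above; the real method works on plan and node objects that are not reproduced here.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical stand-in for the guard in the diff: print timing stats only
// when every required value is present, otherwise print nothing.
final class StatsGuard
{
    private StatsGuard() {}

    static String renderTimings(
            boolean nodeHasStats,
            Optional<Duration> totalCpuTime,
            Optional<Duration> totalScheduledTime,
            Optional<Duration> totalBlockedTime)
    {
        // Same shape as the updated condition: if any stat is missing, emit nothing.
        if (!nodeHasStats
                || !(totalCpuTime.isPresent() && totalScheduledTime.isPresent() && totalBlockedTime.isPresent())) {
            return "";
        }
        return String.format(
                "CPU: %dms, Scheduled: %dms, Blocked: %dms",
                totalCpuTime.get().toMillis(),
                totalScheduledTime.get().toMillis(),
                totalBlockedTime.get().toMillis());
    }

    public static void main(String[] args)
    {
        // Plain EXPLAIN case: no runtime stats, so the result is empty.
        System.out.println("[" + renderTimings(false, Optional.empty(), Optional.empty(), Optional.empty()) + "]");
        // EXPLAIN ANALYZE case: all stats present (values are made up).
        System.out.println(renderTimings(
                true,
                Optional.of(Duration.ofMillis(120)),
                Optional.of(Duration.ofMillis(300)),
                Optional.of(Duration.ofMillis(80))));
    }
}
```

Under this assumption the three timing values are expected to be either all present or all absent, so the extra `isPresent()` check does not hide output in practice.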

@arhimondr arhimondr merged commit fe44c6e into trinodb:master Mar 29, 2022
@arhimondr arhimondr deleted the blocked-stats branch March 29, 2022 16:30
@github-actions github-actions bot added this to the 376 milestone Mar 29, 2022
@pangyifish

What do "blocked input" and "blocked output" mean exactly? What are the possible reasons for each of them?
I am a bit confused when looking at the query plans. Thanks a lot!

@arhimondr
Contributor Author

This is the time a task spends blocked on reading input data (for example, data produced by upstream tasks) or on writing output data.

@pangyifish

pangyifish commented Jul 26, 2022

This is the time a task spends blocked on reading input data (for example, data produced by upstream tasks) or on writing output data.

Thanks! I have a follow-up question:
Suppose there is no upstream stage and the stage is only a "scan filter" over a Hive table stored in S3. If the blocked input is very low but the blocked output is really high, does that suggest that listing objects from S3 is fast but downloading from S3 is very slow?

Thank you!

@arhimondr
Contributor Author

A high blocked output value usually indicates that the task is blocked on writing output. In fault-tolerant execution, it usually indicates efficiency problems at the spooling exchange layer. In pipelined execution, it may indicate that an upstream stage produces data faster than a downstream stage can process it (e.g., reading data from S3 is fast, but a following join is slow).
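
The pipelined case can be illustrated with a toy model: a fast producer writing into a small bounded buffer that a slow consumer drains. This is only an analogy built on `java.util.concurrent.ArrayBlockingQueue`, not Trino's exchange or output buffer code; the class `BlockedOutputDemo`, its method, and all parameter values are assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model (not Trino code): the producer plays the upstream stage, the
// consumer plays a slower downstream stage, and the bounded queue plays
// the output buffer between them.
final class BlockedOutputDemo
{
    private BlockedOutputDemo() {}

    // Returns the total time, in nanoseconds, the producer spent inside put().
    static long measureProducerBlockedNanos(int items, int bufferCapacity, long consumerDelayMillis)
    {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(bufferCapacity);

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    buffer.take();
                    // Simulate slow downstream processing (e.g. a join).
                    Thread.sleep(consumerDelayMillis);
                }
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        long blockedNanos = 0;
        try {
            for (int i = 0; i < items; i++) {
                long start = System.nanoTime();
                // put() waits while the buffer is full: this is the "blocked output" time.
                buffer.put(i);
                blockedNanos += System.nanoTime() - start;
            }
            consumer.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while producing", e);
        }
        return blockedNanos;
    }

    public static void main(String[] args)
    {
        long blockedNanos = measureProducerBlockedNanos(20, 2, 5);
        System.out.printf("producer blocked on output for ~%d ms%n", blockedNanos / 1_000_000);
    }
}
```

In this model the producer itself is never slow; nearly all of its wall time is spent waiting for buffer space, which is the situation a high blocked output value on a scan stage would correspond to.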
