Refactor and improve QueryInfo and StageInfo by pettyjamesm · Pull Request #11580 · trinodb/trino

pettyjamesm · 2022-03-19T00:21:31Z

Description

This PR refactors QueryInfo and it's associated creation methods in QueryStateMachine to avoid unnecessary repeated work, remove redundant fields and methods, and reduce memory allocations- since this logic is performed as part of producing each query results response.

Is this change a fix, improvement, new feature, refactoring, or other?

Refactor and reduction of unnecessary overhead in the critical path for sending query result responses.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Core engine

How would you describe this change to a non-technical end user or system administrator?

No description of this change to non-technical end users and/or system administrators should be necessary.

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

findepi · 2022-03-21T12:42:48Z

@pettyjamesm can you please enhance commit messages with some motivation and/or short description?

pettyjamesm · 2022-03-21T17:13:56Z

can you please enhance commit messages with some motivation and/or short description?

Yep, I was in the process of preparing the commits while the PR was still in a draft state. Commit messages have been elaborated upon, some more commits added, and this should now be ready for review.

raunaqmorarka

lgtm % minor comments

core/trino-main/src/main/java/io/trino/execution/QueryInfo.java

raunaqmorarka · 2022-03-23T05:20:06Z

core/trino-main/src/main/java/io/trino/execution/QueryInfo.java

I wonder if scheduled should be kept in QueryInfo and removed from QueryStats as it seems to be about a query's state rather than statistics.
Will let @sopel39 decide this

Yeah, I wasn’t totally sure and it’s easy enough to keep the field in both places (just redundant)

@sopel39 - think you'll have a chance to weigh in on this soon?

I wonder if scheduled should be kept in QueryInfo and removed from QueryStats as it seems to be about a query's state rather than statistics.

makes sense.

BTW: do we need scheduled since we have state in query info?

@raunaqmorarka - I took a look at moving scheduled from QueryStats to QueryInfo but unfortunately it's used from within QueryStats#getProgressPercentage which makes extracting it a harder problem, so I propose we leave it in QueryStats since all QueryInfo instances have a QueryStats value they can consult.

BTW: do we need scheduled since we have state in query info?

@sopel39 : We don't "need" it in the sense that you could re-compute it, but the QueryInfo level notion of being scheduled is not a function of the QueryState but rather is set to true when of the query's stages are either done, pending, or running- so it does have semantic meaning on it's own.

core/trino-main/src/main/java/io/trino/server/protocol/Query.java

raunaqmorarka · 2022-03-23T05:39:13Z

core/trino-main/src/main/java/io/trino/server/protocol/Query.java

Would it be better if we did globalUniqueNodes.addAll(stageUniqueNodes) after this loop ?

There’s no particular advantage to bulk inserts on HashSet, the implementation just does another loop with another iterator which is probably slightly slower (extra iterator, iterator is over the Set instead of a List, temporal memory locality of access, etc)

core/trino-main/src/main/java/io/trino/server/protocol/Query.java

pettyjamesm · 2022-04-01T20:59:09Z

@raunaqmorarka - let me know if you have another idea for how to handle where the field should live based on my earlier comment. I also realized that the web UI requires the "scheduled" flag to be present at both QueryInfo and BasicQueryInfo top level serialized form, so the latest revision outputs the serialized value in both QueryInfo and QueryStats, but the in-memory representation is still consolidated.

pettyjamesm · 2022-04-06T13:22:49Z

@raunaqmorarka / @sopel39 - ready to merge?

raunaqmorarka · 2022-04-07T06:16:45Z

It lgtm, will let @sopel39 decide on merging it

pettyjamesm · 2022-04-19T17:28:48Z

@sopel39 - any update here?

@JsonProperty

Refactors QueryInfo to consolidate usages of isCompleteInfo() and isFinalInfo() into a single method: isFinalInfo. The removed isCompleteInfo field and method lacked a @JsonProperty annotation and was therefore not being serialized anyway, and the isFinalInfo method which was annotated as a @JsonProperty for serialization had no corresponding constructor field and so also would not survive a serialize / deserialize round-trip and required re-building a list of StageInfo on each call, which is unnecessary overhead. The consolidation of the two related fields ensures that the work of determining whether a QueryInfo is "final" is done once while producing the QueryInfo object and successfully round-trips when serialized an deserialized.

StageInfo methods isCompleteInfo() and isFinalInfo() had identical implementations. Since neither method is serialized in the JSON representation, consolidating usage sites to call isFinalInfo() is safe and allows removing the redundant isCompleteInfo() method.

Refactors the QueryInfo and QueryStats creation logic to call getAllStages once, instead of four times as was being done before this change. Each call to getAllStages will build an ImmutableList with each StageInfo which can be relatively expensive for queries that have a large number of stages. This can be significant, since QueryInfo creation occurs on each batch of query results fetched through the coordinator.

QueryInfo and QueryStats both embed the field "scheduled", but QueryInfo embeds QueryStats- making the separate field redundant. With this change, QueryStats will still contain the boolean field and QueryInfo.isScheduled() can defer to the QueryStats value. Unfortunately, the value must still be serialized in the QueryInfo JSON payload since the web UI requires that BasicQueryInfo and QueryInfo both have this field available.

Avoids unnecessary Optional boxing and lambda allocations inside of getAllStages to reduce the overhead associated with flattening sub-stage info into a list.

Avoids extra allocations and copies associated with Immutable collection building removes an additional recursive child stage traversal associated with producing StatementStats and StageStats in the critical path of producing query results responses.

cla-bot bot added the cla-signed label Mar 19, 2022

findepi requested review from kokosing and sopel39 March 21, 2022 12:43

sopel39 requested a review from raunaqmorarka March 21, 2022 12:58

pettyjamesm force-pushed the fix-stage-info branch from a636e86 to 2d82faa Compare March 21, 2022 16:14

pettyjamesm marked this pull request as ready for review March 21, 2022 16:58

pettyjamesm force-pushed the fix-stage-info branch 2 times, most recently from 163f0ab to 52394e1 Compare March 21, 2022 18:19

pettyjamesm mentioned this pull request Mar 21, 2022

Improve QueryInfo and StatementStats performance prestodb/presto#17495

Merged

raunaqmorarka reviewed Mar 23, 2022

View reviewed changes

core/trino-main/src/main/java/io/trino/server/protocol/Query.java Outdated Show resolved Hide resolved

pettyjamesm force-pushed the fix-stage-info branch 6 times, most recently from 6f2cf68 to 4958232 Compare March 28, 2022 15:39

sopel39 requested a review from raunaqmorarka March 29, 2022 10:51

pettyjamesm force-pushed the fix-stage-info branch from 4958232 to ed9e499 Compare April 1, 2022 20:55

raunaqmorarka approved these changes Apr 4, 2022

View reviewed changes

pettyjamesm force-pushed the fix-stage-info branch 4 times, most recently from ae5b937 to c459fa9 Compare April 5, 2022 21:18

pettyjamesm requested a review from arhimondr April 26, 2022 15:09

arhimondr approved these changes Apr 26, 2022

View reviewed changes

pettyjamesm force-pushed the fix-stage-info branch 3 times, most recently from 5e7b3b0 to af229d4 Compare April 27, 2022 17:16

pettyjamesm added 8 commits April 27, 2022 17:04

Reuse getAllStages result in PlanPrinter

8716d83

Cleanup StageInfo.getAllStages

f11f5f5

Avoids unnecessary Optional boxing and lambda allocations inside of getAllStages to reduce the overhead associated with flattening sub-stage info into a list.

Add test for unique node count stats

da3aad6

pettyjamesm force-pushed the fix-stage-info branch from af229d4 to da3aad6 Compare April 27, 2022 21:04

arhimondr merged commit 94072fc into trinodb:master Apr 28, 2022

pettyjamesm deleted the fix-stage-info branch April 28, 2022 13:32

mosabua mentioned this pull request Apr 28, 2022

Add Trino 379 release notes #12106

Merged

Conversation

pettyjamesm commented Mar 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Documentation

Release notes

Uh oh!

findepi commented Mar 21, 2022

Uh oh!

pettyjamesm commented Mar 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

raunaqmorarka left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

raunaqmorarka Mar 23, 2022

Choose a reason for hiding this comment

Uh oh!

pettyjamesm Mar 23, 2022

Choose a reason for hiding this comment

Uh oh!

pettyjamesm Mar 24, 2022

Choose a reason for hiding this comment

Uh oh!

sopel39 Mar 29, 2022

Choose a reason for hiding this comment

Uh oh!

pettyjamesm Mar 29, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

raunaqmorarka Mar 23, 2022

Choose a reason for hiding this comment

Uh oh!

pettyjamesm Mar 23, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

pettyjamesm commented Apr 1, 2022

Uh oh!

pettyjamesm commented Apr 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

raunaqmorarka commented Apr 7, 2022

Uh oh!

pettyjamesm commented Apr 19, 2022

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

5 participants

pettyjamesm commented Mar 19, 2022 •

edited

Loading

pettyjamesm commented Mar 21, 2022 •

edited

Loading

pettyjamesm commented Apr 6, 2022 •

edited

Loading