Improve QueryInfo and StatementStats performance #17495

Merged
pettyjamesm merged 3 commits into prestodb:master from pettyjamesm:improve-query-info
Apr 22, 2022
@pettyjamesm
Contributor

Similar changes extracted from trinodb/trino#11580

Refactors logic that creates QueryInfo and StatementStats to reduce overhead and avoid repeated work. This code sits in the critical path of the coordinator producing query result responses.

Each commit has a more specific description of the exact change being made. At a high level, they are:

  • Remove redundant copies and traversals from the conversion process of QueryInfo / StageInfo to StatementStats / StageStats
  • Refactor QueryInfo to consolidate isCompleteInfo() and isFinalInfo() into a single method
  • Reuse the result of StageInfo.getAllStages so that child stages only need to be flattened into a single list once, instead of four times as was the case before this change
== NO RELEASE NOTE ==

@pettyjamesm pettyjamesm force-pushed the improve-query-info branch 2 times, most recently from b77d447 to f6e6968 Compare March 21, 2022 22:07
Avoids extra allocations and copies associated with building
immutable collections, and removes an additional recursive child
stage traversal associated with producing StatementStats and
StageStats in the critical path of producing query result
responses.

Refactors QueryInfo to consolidate usages of isCompleteInfo() and
isFinalInfo() into a single method: isFinalInfo(). The removed
isCompleteInfo field and method lacked a @JsonProperty annotation
and so were not being serialized anyway. The isFinalInfo method was
annotated with @JsonProperty for serialization but had no
corresponding constructor field, so it also would not survive a
serialize / deserialize round-trip. It also rebuilt a list of
StageInfo on each call, which is unnecessary overhead.

The consolidation of the two related fields ensures that the work
of determining whether a QueryInfo is "final" is done once while
producing the QueryInfo object and successfully round-trips when
serialized and deserialized.
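
A minimal sketch of the "compute once, store as a constructor field" idea described in this commit message. The `Stage` and `Query` classes and the `create` factory are hypothetical stand-ins, not the actual Presto `StageInfo` / `QueryInfo` types, and JSON annotations are left out so the example stays self-contained:

```java
import java.util.List;

public class FinalInfoSketch {
    // Hypothetical stand-in for a stage in the stage tree.
    static final class Stage {
        final boolean finalInfo;
        final List<Stage> subStages;

        Stage(boolean finalInfo, List<Stage> subStages) {
            this.finalInfo = finalInfo;
            this.subStages = subStages;
        }

        // Recursive check: true only if this stage and every descendant is final.
        boolean allFinal() {
            if (!finalInfo) {
                return false;
            }
            for (Stage child : subStages) {
                if (!child.allFinal()) {
                    return false;
                }
            }
            return true;
        }
    }

    // Hypothetical stand-in for the query-level object. The flag is an
    // ordinary constructor argument, so a serializer that maps properties
    // to constructor parameters could restore it without recomputation.
    static final class Query {
        private final boolean finalInfo;

        Query(boolean finalInfo) {
            this.finalInfo = finalInfo;
        }

        // The stage tree is walked once here, when the object is produced.
        static Query create(boolean queryDone, Stage root) {
            return new Query(queryDone && root.allFinal());
        }

        // Plain field read: no stage traversal on each call.
        boolean isFinalInfo() {
            return finalInfo;
        }
    }
}
```

Because the flag lives in a field, repeated calls to `isFinalInfo()` cost the same regardless of how many stages the query has.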

Refactors the QueryInfo and QueryStats creation logic to call
getAllStages once, instead of four times as was being done before
this change. Each call to getAllStages builds an ImmutableList
containing every StageInfo, which can be relatively expensive for
queries that have a large number of stages.

This can be significant, since QueryInfo creation occurs on each
batch of query results fetched through the coordinator.
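
The flatten-once pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `Stage`, `Summary`, `allStages`, and `summarize` are hypothetical names, and a plain `ArrayList` stands in for the Guava `ImmutableList` mentioned in the commit message.

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenOnceSketch {
    // Hypothetical stand-in for a stage with a few per-stage stats.
    static final class Stage {
        final long rows;
        final boolean done;
        final List<Stage> subStages;

        Stage(long rows, boolean done, List<Stage> subStages) {
            this.rows = rows;
            this.done = done;
            this.subStages = subStages;
        }
    }

    // Hypothetical aggregate produced from the flattened stage list.
    static final class Summary {
        final int stageCount;
        final int completedStages;
        final long totalRows;

        Summary(int stageCount, int completedStages, long totalRows) {
            this.stageCount = stageCount;
            this.completedStages = completedStages;
            this.totalRows = totalRows;
        }
    }

    // Pre-order flattening of the stage tree. Every call allocates and
    // fills a new list, so calling it repeatedly repeats the traversal.
    static List<Stage> allStages(Stage root) {
        List<Stage> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Stage stage, List<Stage> out) {
        out.add(stage);
        for (Stage child : stage.subStages) {
            collect(child, out);
        }
    }

    // Flattens once and derives every aggregate from the same list.
    static Summary summarize(Stage root) {
        List<Stage> stages = allStages(root);
        int completed = 0;
        long rows = 0;
        for (Stage stage : stages) {
            if (stage.done) {
                completed++;
            }
            rows += stage.rows;
        }
        return new Summary(stages.size(), completed, rows);
    }
}
```

The point of the design is that each additional aggregate becomes one more pass (or one more accumulator) over an existing list, rather than another recursive walk and list allocation.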
@pettyjamesm pettyjamesm requested a review from highker April 19, 2022 20:40
@highker highker requested a review from NikhilCollooru April 19, 2022 20:45
@pettyjamesm pettyjamesm merged commit 2e26ea1 into prestodb:master Apr 22, 2022
@pettyjamesm pettyjamesm deleted the improve-query-info branch April 22, 2022 19:17