Skip to content

Resource Management related cleanup#14104

Merged
wenleix merged 18 commits intoprestodb:masterfrom
tdcmeehan:resourcem
Mar 11, 2020
Merged

Resource Management related cleanup#14104
wenleix merged 18 commits intoprestodb:masterfrom
tdcmeehan:resourcem

Conversation

@tdcmeehan
Copy link
Copy Markdown
Contributor

@tdcmeehan tdcmeehan commented Feb 14, 2020

Rebase of #12801

== NO RELEASE NOTE ==

@tdcmeehan tdcmeehan mentioned this pull request Feb 14, 2020
@tdcmeehan tdcmeehan force-pushed the resourcem branch 3 times, most recently from d5e7a3e to 2079065 Compare February 17, 2020 03:11
@tdcmeehan tdcmeehan changed the title [WIP] Resource Management related cleanup Resource Management related cleanup Feb 27, 2020
@tdcmeehan tdcmeehan requested a review from swapsmagic February 27, 2020 22:07
Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Add query id to NoSuchElementException".

LGTM by comparing with be6f54d

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Remove system startup minimum worker requirement" % one small difference after comparing with bfa2524

@@ -174,6 +171,8 @@ public SqlQueryManager(

this.clusterSizeMonitor = requireNonNull(clusterSizeMonitor, "clusterSizeMonitor is null");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note this line get removed in bfa2524 :)

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Add DISPATCHING query states". LGTM compared with 270504e

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Split out queued phase from QueryManager" LGTM compared with b846616

Note the removal of DistributedQueryRunner#getQueryInfo doesn't seem to be included in this PR?
(b846616#diff-179c18855b05981dabfa4d6e458f8695L418)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double check: now slug is provided as a parameter, while in the old PR, slug is part of the path? (See b846616#diff-ef10213de38c18dc49c5303d7eea77a4R234)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I saw from our commit history we did this because our HTTP logging seems to exclude query params from the logs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm? :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not just a rebase conflict, looks like all references to unboundedExecutorService (was: queryExecutor) should have been removed entirely.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't see "slugValid" related method in this commit? (see b846616#diff-d65ad4547dcbc09fc621ac2ceddce6cdR217)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because we cherry picked this earlier in #13110

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this to nextToken? :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was supposed to happen in a later commit. Fixed it.

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Improve query event stats for immediately failed queries". LGTM (ref 5841142)

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Remove Optional from QueryStateMachine resourceGroup". LGTM (ref e7989ed)

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Simplify DispatchInfo construction" . LGTM. (ref 6872c94)

Copy link
Copy Markdown
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of abstractions that makes the change a little bit hard to follow, but generally looks good. It looks like the queries map in the ExecutingStatementResource is not being cleaned. It should be fixed to avoid memory leaks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? Why not simply Futures.nonCancellationPropagating()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out it's a different semantic. An attempt to cancel this future has no effect, whereas cancelling the returned future from nonCancellationPropagating may be cancelled (however this will not be propagated to the inner future).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since PurgeQueriesRunnable is gone who's responsible for cleaning up the queries map?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about calling it notFound? badRequest implies 400 status.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'll do is, because every method that calls this passes in NOT_FOUND for the status, is I'll hardcode that NOT_FOUND and rename to notFound.

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Fix handling of failures during query creation": LGTM (ref 3561452)

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Simplify query manager stats tracking" LGTM (ref 9d53fc7)

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Rename SqlQueryManagerStats to QueryManagerStats": LGTM. (ref a9a527c)

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Add LocalCoordinatorLocation". LGTM (ref 1568cfa)

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Add peak tasks to BasicQueryStats". LGTM. Do you need to add operator allocation now as well? (added in #14191)

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Change local dispatch to finish immediately after query submission" . LGTM (ref ecea8fc)

@tdcmeehan tdcmeehan requested a review from cemcayiroglu March 6, 2020 00:44
Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Cleanup dispatcher executor management". LGTM (ref cdbe6a1). We use a BoundExecutor which seems to make sense.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. 😃

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to also export the number of active threads in BoundedExecutor? -- this has been an operation pain when we want to monitor when BoundedExecutor is full.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BoundedExecutor does not expose the number of active threads, however the underlying executor itself does export itself as an MBean via ThreadPoolExecutorMBean

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Catch errors from LocalDispatchQuery querySubmitter": LGTM (ref e5b414a)

Copy link
Copy Markdown
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last two commits LGTM:

  • "Fix result caching in protocol Query"
  • "Simplify token management in protocol Query "

@tdcmeehan
Copy link
Copy Markdown
Contributor Author

A lot of abstractions that makes the change a little bit hard to follow, but generally looks good. It looks like the queries map in the ExecutingStatementResource is not being cleaned. It should be fixed to avoid memory leaks.

This is a really great catch. Thank you.

tdcmeehan and others added 2 commits March 6, 2020 15:33
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
The normal minimum worker requirement applied to all queries is
sufficient to cover this case.

Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
@tdcmeehan tdcmeehan force-pushed the resourcem branch 6 times, most recently from 0262287 to 6a2c70c Compare March 8, 2020 07:05
tdcmeehan and others added 15 commits March 8, 2020 22:25
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
resourceGroup is already required in QueryStateMachine

Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
querySubmitter should never throw, but if it does fail the query immediately

Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
Previously, the cache was effectively disabled for the first result, so a
retry on first request resulted in a 410 gone.

Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Co-authored-by: Raghav Sethi <raghavsethi.rs@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Copy link
Copy Markdown
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wenleix wenleix merged commit 7bfc203 into prestodb:master Mar 11, 2020
@wenleix
Copy link
Copy Markdown
Contributor

wenleix commented Mar 11, 2020

Merged! Thanks for the contribution! 😃 😃 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants