@findepi I added a new system which waits for a minimum number of workers after queuing is complete and before execution starts. The min worker count is configured with |
So the semantics are that queries will be queued until the minimum number of workers is available. What if workers leave the cluster after it has started? Will new queries be put on hold too? This is useful when downscaling a cluster to 0 nodes during idle periods in a cloud environment.
Yes. The new behavior is designed for cloud environments that scale down to zero when there is no traffic and then scale back up when there are queries. BTW, that feature is already checked in, so you can use it now.
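The waiting behavior discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `MinWorkerGate`, `waitForMinimumWorkers`, and `updateWorkerCount` are hypothetical names, and a real implementation would likely also need a maximum wait time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: holds queries until enough workers are present.
final class MinWorkerGate
{
    private final int minWorkers;
    private int currentWorkers;
    private final List<CompletableFuture<Void>> waiters = new ArrayList<>();

    MinWorkerGate(int minWorkers)
    {
        if (minWorkers < 0) {
            throw new IllegalArgumentException("minWorkers is negative");
        }
        this.minWorkers = minWorkers;
    }

    // Returns a future that completes once the worker count reaches the minimum.
    // If enough workers are already present, the future is already complete.
    synchronized CompletableFuture<Void> waitForMinimumWorkers()
    {
        if (currentWorkers >= minWorkers) {
            return CompletableFuture.completedFuture(null);
        }
        CompletableFuture<Void> future = new CompletableFuture<>();
        waiters.add(future);
        return future;
    }

    // Called whenever workers join or leave the cluster.
    void updateWorkerCount(int workers)
    {
        List<CompletableFuture<Void>> ready;
        synchronized (this) {
            currentWorkers = workers;
            if (currentWorkers < minWorkers) {
                return;
            }
            ready = new ArrayList<>(waiters);
            waiters.clear();
        }
        // Complete outside the lock so callbacks cannot block other callers.
        ready.forEach(future -> future.complete(null));
    }
}
```

Because the check happens on every call, a query submitted after the cluster has scaled back down to zero waits again, which matches the answer given above.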
69c5323 to
5bc1cd7
First three commits look good.
Can we add a prior commit that adds a specific error code (not the generic insufficient resources one), so that clients can deal with that situation properly?
Remote Optional -> Remove Optional
d461996 to
c512833
requireNonNull for queryManagerConfig
This export has no effect. I looked in JMX and the counter does not appear.
fe558b1 to
fdbc795
private final SessionContext sessionContext;
private final DispatchManager dispatchManager;
private final QueryId queryId;
private final String slug = format("%016x%016x", ThreadLocalRandom.current().nextLong(), ThreadLocalRandom.current().nextLong());

QueryInfo getQueryInfo();

String getSlug();

    return queryId;
}

public String getSlug()
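For reference, the slug expression in the diff above formats two random longs as 16 zero-padded hexadecimal digits each, so the result is always a 32-character lowercase hex string. A minimal standalone sketch (the class and method names `SlugExample` and `newSlug` are hypothetical, not part of the PR):

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper that reproduces the slug expression from the diff.
final class SlugExample
{
    private SlugExample() {}

    static String newSlug()
    {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        // %016x prints a long as unsigned hex, left-padded with zeros to 16 digits,
        // so negative values still produce exactly 16 characters.
        return String.format("%016x%016x", random.nextLong(), random.nextLong());
    }
}
```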
Following commits LGTM:
Remove system startup minimum worker requirement
Add DISPATCHING query states
Split out queued phase from QueryManager
Add query id to NoSuchElementException
Improve query event stats for immediately failed queries
Remove Optional from QueryStateMachine resourceGroup
Change local dispatch to finish immediately after query submission
For my own reference: here's the bug from yesterday.
Nit? Edge case for static import?
raghavsethi left a comment
Following commits look good:
Remove Optional from QueryStateMachine resourceGroup
Simplify DispatchInfo construction
Fix handling of failures during query creation
Simplify query manager stats tracking
Nit: If you named these more specifically (e.g. queuedDispatchInfo), you could static import.
I go back and forth on this. In this case I like the FQN.
raghavsethi left a comment
Following commits LGTM % nits:
Rename SqlQueryManagerStats to QueryManagerStats
Cleanup dispatcher executor management
Are we moving to the closer vs the annotation pattern?
I'm not sure what you mean. This class uses a closer and an @PreDestroy
raghavsethi left a comment
Following commits look good:
Fixup! Cleanup dispatcher executor management
Remove bad call to recordHeartbeat in dispatch query
Fix visibility of failed queries in LocalDispatchQuery
Make protocol Query public
Catch errors from LocalDispatchQuery querySubmitter
Curious: why is this called LocalDispatchQuery?
The normal minimum worker requirement applied to all queries is sufficient to cover this case.
A query will be in the DISPATCHING state during handoff to a query execution coordinator.
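As a rough illustration of where such a state sits, here is a simplified forward-only state machine. The state list is deliberately reduced and the names `SketchQueryState` and `SketchStateMachine` are hypothetical; the actual query state enum in the codebase has more states than shown here.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, reduced set of states; declaration order is the allowed progression.
enum SketchQueryState
{
    QUEUED, DISPATCHING, RUNNING, FINISHED, FAILED;

    boolean isDone()
    {
        return this == FINISHED || this == FAILED;
    }
}

final class SketchStateMachine
{
    private final AtomicReference<SketchQueryState> state =
            new AtomicReference<>(SketchQueryState.QUEUED);

    SketchQueryState getState()
    {
        return state.get();
    }

    // Moves the query forward only. Returns false if the query is already done
    // or is already at or past the requested state.
    boolean transitionTo(SketchQueryState target)
    {
        while (true) {
            SketchQueryState current = state.get();
            if (current.isDone() || target.ordinal() <= current.ordinal()) {
                return false;
            }
            if (state.compareAndSet(current, target)) {
                return true;
            }
        }
    }
}
```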
resourceGroup is already required in QueryStateMachine
querySubmitter should never throw, but if it does, fail the query immediately.
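The defensive pattern described in that commit message can be sketched as below. This is an assumption-laden illustration: `SafeSubmit` and `SketchQuery` are hypothetical stand-ins, and the submitter is modeled as a plain `Consumer` rather than the type used in the PR.

```java
import java.util.function.Consumer;

final class SafeSubmit
{
    private SafeSubmit() {}

    // Hypothetical stand-in for the query being dispatched.
    static final class SketchQuery
    {
        private volatile Throwable failure;

        // Records the first failure only; later failures are ignored.
        synchronized void fail(Throwable cause)
        {
            if (failure == null) {
                failure = cause;
            }
        }

        boolean isFailed()
        {
            return failure != null;
        }
    }

    // Invokes the submitter and converts any unexpected throwable
    // into an immediate query failure instead of letting it escape.
    static void submit(SketchQuery query, Consumer<SketchQuery> querySubmitter)
    {
        try {
            querySubmitter.accept(query);
        }
        catch (Throwable t) {
            query.fail(t);
        }
    }
}
```

Without the catch, an exception escaping the submitter could leave the query sitting in a non-terminal state with no one responsible for finishing it.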
Previously, the cache was effectively disabled for the first result, so a retry of the first request resulted in a 410 Gone.
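To make the intended behavior concrete, here is a small sketch of a last-result cache keyed by a sequence token, where an empty return stands for a 410 Gone response. The class `SketchResultCache`, the token scheme, and the `"page-N"` payloads are assumptions for illustration only, not the protocol code touched by this commit.

```java
import java.util.Optional;

// Hypothetical sketch: caches the most recent result so a client retry
// of the same token gets the same answer, including for the first token.
final class SketchResultCache
{
    private long nextToken;
    private long cachedToken = -1;
    private String cachedResult;

    // Returns the result for the token, or empty if the token is no longer
    // (or not yet) servable, which a server would report as 410 Gone.
    synchronized Optional<String> getResult(long token)
    {
        if (token == cachedToken) {
            // Retry of the last request: serve the cached result.
            return Optional.of(cachedResult);
        }
        if (token != nextToken) {
            return Optional.empty();
        }
        // First request for this token: produce, cache, and advance.
        cachedResult = "page-" + token;
        cachedToken = token;
        nextToken = token + 1;
        return Optional.of(cachedResult);
    }
}
```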
Move the queued phase of a query from QueryManager to a new dispatcher service. This
change is in preparation for adding an optional new server that moves the queue
phase to a separate process.
Ref prestodb/presto#12176