Transition to executing when begin planning #15872
@JunhyungSong thanks for a very nice PR description, this is very useful!
Callers should not infer any particular meaning from the paths used in nextUri. Callers (client tools, UIs) should inspect the query state, so we should make sure the query state is returned as "PLANNING" if the query is in the planning state.
What is the effect of this (from the query protocol perspective)?
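To make the point about nextUri concrete, here is a minimal sketch contrasting a client that infers progress from the nextUri path with one that reads the reported state. QueryResponse and ClientStateCheck are hypothetical names, not classes from the Trino client; the example only shows that a queued-looking path can coexist with a PLANNING state, so only the state check gives the right answer.

```java
import java.net.URI;

// Hypothetical, simplified view of one protocol response: the URI to poll next
// and the query state reported by the server (for example "QUEUED" or "PLANNING").
record QueryResponse(URI nextUri, String state) {}

final class ClientStateCheck {
    private ClientStateCheck() {}

    // Fragile: infers progress from the shape of the nextUri path.
    static boolean looksStartedFromPath(QueryResponse response) {
        return response.nextUri() != null
                && response.nextUri().getPath().startsWith("/v1/statement/executing/");
    }

    // Preferred: relies only on the state the server reports.
    static boolean isPlanning(QueryResponse response) {
        return "PLANNING".equals(response.state());
    }
}
```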
Branch updated from 701c9c5 to 96f938d.
This PR is for that. Since callers should not infer
This will mark
Branch updated from fd02941 to 1c74619.
That sounds like something observable from the end-user perspective. So, are you saying that queries in the PLANNING state are not viewed as such by client tools?
Correct.
Can you recommend tests that I can refer to?
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
@pettyjamesm, would you also be able to review this one, or approve it if you've already done a review internally?
A thrown RejectedExecutionException will now trigger a query failure instead of leaving the query stuck, as it would have in the previous implementation, which is good and probably something we could test for. Also note that a failure in minimumWorkerFuture could also lead to a RejectedExecutionException being thrown from queryExecutor, so it should probably just call stateMachine.transitionToFailed(throwable) inline instead of queueing it through queryExecutor.
The above was incorrect: queryExecutor is a cached thread pool executor and will not throw.
Talked offline. DispatchExecutor will not throw RejectedExecutionException unless the server is shutting down.
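The resolution above hinges on when a JDK executor actually throws RejectedExecutionException. A minimal sketch using only java.util.concurrent: a pool from Executors.newCachedThreadPool() accepts every task while it is running and rejects only after shutdown(), which mirrors the "only when the server is shutting down" behavior described for the dispatch executor. Treating Trino's queryExecutor as exactly this kind of pool is an assumption taken from the comments, and RejectionDemo is a hypothetical name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

final class RejectionDemo {
    private RejectionDemo() {}

    // Returns true if the executor rejected a trivial task.
    static boolean rejects(ExecutorService executor) {
        try {
            executor.execute(() -> {});
            return false;
        }
        catch (RejectedExecutionException e) {
            return true;
        }
    }

    // Index 0: rejected while running; index 1: rejected after shutdown.
    static boolean[] runningThenShutDown() {
        ExecutorService executor = Executors.newCachedThreadPool();
        // A cached pool creates threads on demand, so it accepts the task.
        boolean whileRunning = rejects(executor);
        executor.shutdown();
        // The default AbortPolicy throws once the pool has been shut down.
        boolean afterShutdown = rejects(executor);
        return new boolean[] {whileRunning, afterShutdown};
    }
}
```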
I'm good with this formulation of submitting a new future to handle queryExecution.start(), but would it be equivalent to call submitted.set(null); queryExecution.start(); here, without a shift to another thread in the same thread pool?
Makes sense.
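The two formulations discussed in this thread can be compared in a small sketch. The comment's submitted.set(null) suggests a Guava SettableFuture; CompletableFuture is swapped in here only to keep the example dependency-free, a plain Runnable stands in for queryExecution.start(), and StartOrdering with its methods is hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class StartOrdering {
    private StartOrdering() {}

    // Variant A: hand the work to the executor as a separate task.
    static void startViaExecutor(CompletableFuture<Void> submitted, Runnable start, Executor executor) {
        executor.execute(() -> {
            submitted.complete(null);
            start.run();
        });
    }

    // Variant B (the inline alternative): mark submitted, then start on the current thread.
    static void startInline(CompletableFuture<Void> submitted, Runnable start) {
        submitted.complete(null);
        start.run();
    }
}
```

In both variants, listeners attached to submitted without an explicit executor run on whichever thread completes it, before start() is invoked; the main difference is which thread ends up running start().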
I think closeExchangeIfNecessary(QueryInfo) needs to check the queryInfo.getState().ordinal() > STARTING condition too; otherwise we could close the exchange before planning completes if it were called from another code path.
After that change, we should be able to leave closeExchangeIfNecessary and removePagesFromExchange as-is, without needing to check isStarted explicitly at this point.
As discussed separately, removePagesFromExchange needs the condition check as well, since it must return null for columns and types when the query has not started yet. And the other code path that calls closeExchangeIfNecessary(QueryInfo) already checks for query completion.
I would say that relying on two separate code-paths to check the same condition before calling closeExchangeIfNecessary(QueryInfo) is riskier than just having closeExchangeIfNecessary do the check internally (more robust than refactoring).
Actually, the condition at https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/server/protocol/Query.java#L198 is a little different. And if we move the condition check, there will be three separate code paths checking the same condition: the first inside closeExchangeIfNecessary, the second inside removePagesFromExchange, and the third in if (isStarted && (queryInfo.getOutputStage().isEmpty() || exchangeDataSource.isFinished())).
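A minimal sketch of the reviewer's suggestion that closeExchangeIfNecessary perform the started check itself, so no caller can forget it. The QueryState ordering is modeled on Trino's enum (an assumption; only the relative position of STARTING matters here), and ExchangeGuard, isStarted, and the noMoreOutput flag are hypothetical simplifications of the Query class discussed above.

```java
// Assumed ordering, modeled on Trino's QueryState.
enum QueryState { QUEUED, WAITING_FOR_RESOURCES, DISPATCHING, PLANNING, STARTING, RUNNING, FINISHING, FINISHED, FAILED }

final class ExchangeGuard {
    private boolean exchangeClosed;

    // True once the query has moved past STARTING, i.e. planning has completed.
    static boolean isStarted(QueryState state) {
        return state.ordinal() > QueryState.STARTING.ordinal();
    }

    // The guard lives inside the method, so every call path gets it for free.
    // noMoreOutput stands in for the "output stage empty or exchange finished" check.
    synchronized void closeExchangeIfNecessary(QueryState state, boolean noMoreOutput) {
        if (isStarted(state) && noMoreOutput) {
            exchangeClosed = true;
        }
    }

    synchronized boolean isExchangeClosed() {
        return exchangeClosed;
    }
}
```

removePagesFromExchange could reuse the same isStarted helper when deciding to return null columns and types, so the condition is written once even if it is evaluated in several places.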
Branch updated from c699593 to 42ee128.
There's a lot of ceremony going on here to override the whole ServerMainModule when we could just pass TestingQueryExecution directly into a LocalDispatchQuery constructor instead and assert off of that, which would be much simpler to verify.
If we change TestingQueryExecution to use a CountDownLatch inside QueryExecution#start(), then we can just await that latch instead of using a sleep-wait loop here.
This change can be avoided (since it's only there to make something overridable in tests) by passing the TestingQueryExecution directly to the LocalDispatchQuery and skipping all the dependency-injection ceremony.
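A minimal sketch of the CountDownLatch suggestion: the test double counts down in start(), and the test awaits the latch with a timeout instead of sleeping in a loop. Startable and this TestingQueryExecution are hypothetical stand-ins (the real QueryExecution interface has many more methods); an instance like this could be handed directly to the code under test rather than wired in through ServerMainModule.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical minimal interface standing in for the start() hook of QueryExecution.
interface Startable {
    void start();
}

// Hypothetical test double: records that start() ran by counting down a latch.
final class TestingQueryExecution implements Startable {
    private final CountDownLatch started = new CountDownLatch(1);

    @Override
    public void start() {
        started.countDown();
    }

    // Blocks until start() has been called or the timeout elapses; returns false on timeout.
    boolean awaitStarted(long timeout, TimeUnit unit) throws InterruptedException {
        return started.await(timeout, unit);
    }
}
```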
core/trino-main/src/main/java/io/trino/dispatcher/LocalDispatchQuery.java (outdated, resolved)
core/trino-main/src/main/java/io/trino/execution/SqlQueryExecution.java (outdated, resolved)
core/trino-main/src/main/java/io/trino/dispatcher/LocalDispatchQuery.java (outdated, resolved)
Branch updated from 3501f2d to 58d3546, then from 58d3546 to 2efb1b8.
The test failures seem unrelated.
dain left a comment
For the "Transition to executing when begin planning" commit, it is really important for me to be able to understand the diff, so that commit should contain only the changes absolutely necessary for the desired change to the state transitions (this part of the code base is really complex). The other minor changes can go in commits before or after that one.
core/trino-main/src/main/java/io/trino/dispatcher/LocalDispatchQuery.java (outdated, resolved)
core/trino-main/src/main/java/io/trino/dispatcher/LocalDispatchQuery.java (outdated, resolved)
core/trino-main/src/main/java/io/trino/dispatcher/LocalDispatchQuery.java (outdated, resolved)
Branch updated from 2efb1b8 to 6d0cef0.
The test failure seems unrelated.
dain left a comment
I'm still having some trouble understanding some of the additional changes in this. Also, I think the main commit can be simplified by rewriting the test.
core/trino-main/src/main/java/io/trino/dispatcher/DispatchManager.java (outdated, resolved)
testing/trino-tests/src/test/java/io/trino/tests/TestLocalDispatchQuery.java (outdated, resolved)
testing/trino-tests/src/test/java/io/trino/tests/TestLocalDispatchQuery.java (outdated, resolved)
core/trino-main/src/main/java/io/trino/server/protocol/Query.java (outdated, resolved)
Branch updated from 6d0cef0 to be238c9.
We don't use abbreviations like this.
We should not be making this field mutable for a test. I'll explain how in the test code below.
This test is using TpchQueryRunner, which internally uses DistributedQueryRunner; you should just use that directly. Once you do, you can add an additional Guice module to the runner. In the module, you can register a new DDL task for some mocked-up statement (see QueryExecutionFactoryModule). Then submit the fake statement directly using LocalDispatchQuery (which you can get from the coordinator's injector).
Alternatively, you can just create a LocalDispatchQuery directly and test the state changes you expect are happening on the state machine transitions.
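The second alternative, testing the state transitions directly, could look roughly like this. SimpleStateMachine and TransitionRecorder are hypothetical, single-threaded stand-ins for the query state machine and the dispatch wiring, and the QueryState ordering is assumed from Trino's enum. The point is the test shape: register a listener, drive the transitions, and assert both the observed states and whether submitted was already complete at each step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Assumed ordering, modeled on Trino's QueryState.
enum QueryState { QUEUED, WAITING_FOR_RESOURCES, DISPATCHING, PLANNING, STARTING, RUNNING, FINISHING, FINISHED, FAILED }

// Hypothetical single-threaded state machine that notifies listeners in registration order.
final class SimpleStateMachine {
    private final List<Consumer<QueryState>> listeners = new ArrayList<>();
    private QueryState state = QueryState.QUEUED;

    QueryState getState() {
        return state;
    }

    void addStateChangeListener(Consumer<QueryState> listener) {
        listeners.add(listener);
    }

    void transitionTo(QueryState newState) {
        state = newState;
        for (Consumer<QueryState> listener : listeners) {
            listener.accept(newState);
        }
    }
}

final class TransitionRecorder {
    final List<QueryState> seen = new ArrayList<>();
    final CompletableFuture<Void> submitted = new CompletableFuture<>();
    // Whether submitted was already done when each state was observed.
    final List<Boolean> submittedAtEachState = new ArrayList<>();

    TransitionRecorder(SimpleStateMachine stateMachine) {
        // Wiring under test: mark the query as submitted once planning begins.
        stateMachine.addStateChangeListener(state -> {
            if (state == QueryState.PLANNING) {
                submitted.complete(null);
            }
        });
        // Registered second, so it observes the effect of the wiring above.
        stateMachine.addStateChangeListener(state -> {
            seen.add(state);
            submittedAtEachState.add(submitted.isDone());
        });
    }
}
```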
Branch updated from be238c9 to d42639d, then from d42639d to 58acd00.
The failure looks unrelated.
Description
Currently, a GET request to /v1/statement/queued/{queryId}/{slug}/{token} returns the queued query state even if the query is already planning. The client is not redirected to /v1/statement/executing/{queryId}/{slug}/{token} until all planning tasks, including plan distribution, are completed. When planning takes a long time or fails, this can mislead callers. So, it needs to set submitted when transitioning to PLANNING.
Additional context and related issues
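The behavior described in the PR description can be sketched as follows. DispatchSketch, onStateChange, and nextUriPath are hypothetical names, CompletableFuture stands in for the submitted future, and the QueryState ordering is assumed from Trino's enum. The only point shown is that completing submitted on entry to PLANNING makes the queued endpoint hand out an executing nextUri while planning is still in progress.

```java
import java.util.concurrent.CompletableFuture;

// Assumed ordering, modeled on Trino's QueryState.
enum QueryState { QUEUED, WAITING_FOR_RESOURCES, DISPATCHING, PLANNING, STARTING, RUNNING, FINISHING, FINISHED, FAILED }

// Hypothetical sketch of the proposed behavior, not Trino's dispatcher code.
final class DispatchSketch {
    final CompletableFuture<Void> submitted = new CompletableFuture<>();

    // Called on every state change; PLANNING and every later state
    // (including FAILED) mark the query as submitted.
    void onStateChange(QueryState newState) {
        if (newState.ordinal() >= QueryState.PLANNING.ordinal()) {
            submitted.complete(null); // no-op if already completed
        }
    }

    // Path the queued endpoint would return as nextUri for this query.
    String nextUriPath(String queryId, String slug, long token) {
        String phase = submitted.isDone() ? "executing" : "queued";
        return "/v1/statement/" + phase + "/" + queryId + "/" + slug + "/" + token;
    }
}
```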
Release notes
(v) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: