Transition to executing when begin planning #15872

JunhyungSong · 2023-01-26T23:00:37Z

Description

Currently, a GET request to /v1/statement/queued/{queryId}/{slug}/{token} returns the queued query state even if the query is already planning. The caller is not redirected to /v1/statement/executing/{queryId}/{slug}/{token} until all planning tasks, including plan distribution, are completed. When planning takes a long time or fails, this can mislead callers. So, submitted needs to be set when transitioning to PLANNING.

Additional context and related issues

Release notes

(v) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

findepi · 2023-01-27T10:31:17Z

@JunhyungSong thanks for a very nice PR description, this is very useful!

Currently, a GET request to /v1/statement/queued/{queryId}/{slug}/{token} returns the queued query state even if the query is already planning. The caller is not redirected to /v1/statement/executing/{queryId}/{slug}/{token} until all planning tasks, including plan distribution, are completed. When planning takes a long time or fails, this can mislead callers.

Callers should not infer any particular meaning from the paths used in nextUri.

Callers (client tools, UIs) should inspect the query state, so we should make sure the query state is returned as "PLANNING" if the query is in the planning state.

So, submitted needs to be set when transitioning to PLANNING.

What is the effect of this (from the query protocol perspective)?

JunhyungSong · 2023-01-30T21:22:07Z

Callers (client tools, UIs) should inspect the query state, so we should make sure the query state is returned as "PLANNING" if the query is in the planning state.

This PR is for that. Since callers should not infer meaning from nextUri, as you mentioned, the engine needs to provide the right nextUri in the response to the caller's requests.

What is the effect of this (from the query protocol perspective)?

This will mark dispatched as true. It will also set the coordinator location and change the nextUri to point to ExecutingStatementResource instead of QueuedStatementResource. With that, ExecutingStatementResource will return QueryResult with the correct state (e.g. PLANNING or STARTING instead of QUEUED).
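The effect described above can be pictured with a toy model. This is a minimal sketch and not Trino's implementation: the class and both helper methods are hypothetical, and only the two URL shapes and the state names come from this thread. It assumes the queued resource always reports QUEUED and the executing resource reports the real state.

```java
// Toy model of the behavior discussed in this PR; not Trino code.
final class NextUriSketch
{
    private NextUriSketch() {}

    // Hypothetical helper: picks the resource the next poll is sent to.
    static String nextPath(boolean dispatched, String queryId, String slug, long token)
    {
        String resource = dispatched ? "executing" : "queued";
        return "/v1/statement/" + resource + "/" + queryId + "/" + slug + "/" + token;
    }

    // Hypothetical helper: the state a caller sees in the response.
    // Assumption: only the executing resource reports the actual state.
    static String reportedState(boolean dispatched, String actualState)
    {
        return dispatched ? actualState : "QUEUED";
    }
}
```

In this model, marking the query as dispatched when it enters PLANNING is what makes `reportedState` return PLANNING instead of QUEUED; the caller never has to interpret the path itself.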

findepi · 2023-01-31T10:20:15Z

will return QueryResult with the correct state (e.g. PLANNING or STARTING instead of QUEUED).

That sounds like something observable from the end user's perspective.

So, are you saying that queries in PLANNING state are not viewed as such by the client tools?
Would it be possible to write a test for that?
For example, with a mock connector you can insert a pause in some metadata method, allowing planning to take as much time as needed for test purposes.
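The pause-in-metadata idea can be sketched without any Trino classes. Everything below is hypothetical: `blockingMetadataCall` stands in for a mock connector's metadata method, and the string states stand in for the real query state machine. The point is only the shape of the test: block planning on a latch, observe the state, then release.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Stand-in for a query whose planning step blocks inside a metadata call.
final class PausedPlanningSketch
{
    private final CountDownLatch planningEntered = new CountDownLatch(1);
    private final CountDownLatch releasePlanning = new CountDownLatch(1);
    private final AtomicReference<String> state = new AtomicReference<>("QUEUED");

    // Hypothetical mock metadata method: signals entry, then waits for the test.
    private void blockingMetadataCall()
            throws InterruptedException
    {
        planningEntered.countDown();
        releasePlanning.await();
    }

    private void runQuery()
    {
        state.set("PLANNING");
        try {
            blockingMetadataCall();
            state.set("RUNNING");
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            state.set("FAILED");
        }
    }

    // Returns the state seen by an observer while planning is paused.
    static String stateObservedWhilePaused()
    {
        PausedPlanningSketch query = new PausedPlanningSketch();
        new Thread(query::runQuery).start();
        try {
            if (!query.planningEntered.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("planning never started");
            }
            return query.state.get();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        finally {
            // Always unblock the worker so the thread can finish.
            query.releasePlanning.countDown();
        }
    }
}
```

A real test would replace the observer's `state.get()` with a poll of the client protocol and assert on the state returned there.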

JunhyungSong · 2023-02-01T05:44:33Z

So, are you saying that queries in PLANNING state are not viewed as such by the client tools?

Correct.

Would it be possible to write a test for that?
For example, with a mock connector you can insert a pause in some metadata method, allowing planning to take as much time as needed for test purposes.

Can you recommend tests that I can refer to?

findepi · 2023-02-01T11:46:19Z

I do not. @electrum and @dain will know better for protocol-level changes and tests.

JunhyungSong · 2023-02-01T19:54:56Z

@electrum @dain do you have any ideas on how to add tests for this?

github-actions · 2023-02-23T21:53:11Z

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

bitsondatadev · 2023-02-23T21:54:37Z

@pettyjamesm would you also be able to review this one or approve if you've already done a review internally please?

pettyjamesm · 2023-02-27T17:03:22Z

core/trino-main/src/main/java/io/trino/dispatcher/LocalDispatchQuery.java

A thrown RejectedExecutionException will now trigger query failure instead of leaving the query stuck as in the previous implementation, which is good and probably something we could test for. Also note that the failure handling for minimumWorkerFuture could itself have a RejectedExecutionException thrown from queryExecutor, so it should probably just call stateMachine.transitionToFailed(throwable) inline instead of queueing it through queryExecutor.

The above was incorrect: queryExecutor is a cached thread pool executor and will not throw.

Talked offline. DispatchExecutor will not throw RejectedExecutionException unless the server is shutting down.

I'm good with this formulation of submitting a new future to handle queryExecution.start(), but would it be equivalent to call submitted.set(null); queryExecution.start(); here without a shift to another thread in the same threadpool?
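The two formulations being compared can be sketched as follows. This is an illustration under stated assumptions, not the code in LocalDispatchQuery: `CompletableFuture` stands in for the `submitted` future, adding "started" to a list stands in for `queryExecution.start()`, and a single-thread executor stands in for the query executor.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class StartInlineSketch
{
    private StartInlineSketch() {}

    // Variant 1: complete the future and start on the calling thread.
    static List<String> startInline()
    {
        List<String> events = new CopyOnWriteArrayList<>();
        CompletableFuture<Void> submitted = new CompletableFuture<>();
        submitted.thenRun(() -> events.add("submitted"));
        submitted.complete(null);
        events.add("started"); // stand-in for queryExecution.start()
        return events;
    }

    // Variant 2: the same two steps, handed to another thread.
    static List<String> startOnExecutor()
    {
        List<String> events = new CopyOnWriteArrayList<>();
        CompletableFuture<Void> submitted = new CompletableFuture<>();
        submitted.thenRun(() -> events.add("submitted"));
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.submit(() -> {
                submitted.complete(null);
                events.add("started");
            }).get();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        finally {
            executor.shutdown();
        }
        return events;
    }
}
```

Both variants record the same order of events; the only difference in this model is which thread runs them, which is the question raised above.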

Makes sense.

pettyjamesm · 2023-02-27T17:07:49Z

core/trino-main/src/main/java/io/trino/server/protocol/Query.java

I think closeExchangeIfNecessary(QueryInfo) needs to check queryInfo.getState().ordinal() > STARTING logic too, otherwise we could close the exchange before planning completes if that were to be called from another code path.

After that change, we should leave the closeExchangeIfNecessary and removePagesFromExchange as-is without needing to check isStarted at this point explicitly.

As discussed separately, removePagesFromExchange needs the condition check as well, since it must return null for columns and types when the query has not started yet. The other code path that calls closeExchangeIfNecessary(QueryInfo) already checks for query completion.

I would say that relying on two separate code-paths to check the same condition before calling closeExchangeIfNecessary(QueryInfo) is riskier than just having closeExchangeIfNecessary do the check internally (more robust than refactoring).
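A minimal sketch of "do the check internally", assuming a simplified state enum and a boolean in place of the real exchange. The class, the reduced enum, and the `outputDone` parameter are hypothetical; only the `ordinal() > STARTING` comparison and the method name come from the comments above.

```java
// Illustration only: the guard lives inside the method, so every caller gets it.
final class ExchangeGuardSketch
{
    // Simplified subset of query states, in lifecycle order.
    enum State { QUEUED, PLANNING, STARTING, RUNNING, FINISHED }

    private boolean closed;

    static boolean isStarted(State state)
    {
        return state.ordinal() > State.STARTING.ordinal();
    }

    // outputDone stands in for "no output stage, or the exchange is finished".
    void closeExchangeIfNecessary(State state, boolean outputDone)
    {
        if (!isStarted(state)) {
            return; // never close while the query is still planning or starting
        }
        if (outputDone) {
            closed = true;
        }
    }

    boolean isClosed()
    {
        return closed;
    }
}
```

With the guard inside the method, a new caller that forgets to check `isStarted` cannot close the exchange during planning.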

Actually, the condition at https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/server/protocol/Query.java#L198 is a little different. If we move the condition check, there will be three separate code paths checking the same condition: the first inside closeExchangeIfNecessary, the second inside removePagesFromExchange, and the third in if (isStarted && (queryInfo.getOutputStage().isEmpty() || exchangeDataSource.isFinished())).

pettyjamesm · 2023-03-22T18:53:51Z

core/trino-main/src/main/java/io/trino/dispatcher/LocalDispatchQuery.java

I'm good with this formulation of submitting a new future to handle queryExecution.start(), but would it be equivalent to call submitted.set(null); queryExecution.start(); here without a shift to another thread in the same threadpool?

pettyjamesm · 2023-03-22T19:21:25Z

core/trino-main/src/main/java/io/trino/server/protocol/Query.java

I would say that relying on two separate code-paths to check the same condition before calling closeExchangeIfNecessary(QueryInfo) is riskier than just having closeExchangeIfNecessary do the check internally (more robust than refactoring).

pettyjamesm · 2023-03-22T19:23:08Z

testing/trino-tests/src/test/java/io/trino/tests/TestLocalDispatchQuery.java

There's a lot of ceremony going on here to override the whole ServerMainModule when we could just pass TestingQueryExecution directly into a LocalDispatchQuery constructor instead and assert off of that, which would be much simpler to verify.

pettyjamesm · 2023-03-22T19:24:05Z

testing/trino-tests/src/test/java/io/trino/tests/TestLocalDispatchQuery.java

If we change TestingQueryExecution to use a CountDownLatch inside of QueryExecution#start() then we can just await that countdown instead of having to do a sleep-wait loop here.
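The latch suggestion could look roughly like this. The class below is a hypothetical stand-in and does not implement the real QueryExecution interface; it only shows `start()` counting down a latch that the test awaits with a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a testing query execution.
final class LatchedExecutionSketch
{
    private final CountDownLatch started = new CountDownLatch(1);

    // Called by the code under test; releases any waiting test thread.
    void start()
    {
        started.countDown();
    }

    // Called by the test instead of a sleep-wait loop.
    boolean awaitStarted(long timeout, TimeUnit unit)
    {
        try {
            return started.await(timeout, unit);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The test then asserts on the boolean returned by `awaitStarted`, so it finishes as soon as `start()` runs and fails with a clear timeout otherwise.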

pettyjamesm · 2023-03-22T19:25:03Z

core/trino-main/src/main/java/io/trino/dispatcher/LocalDispatchQueryFactory.java

This change can be avoided (since it's only to make something overridable in tests) by just passing the TestingQueryExecution directly to the LocalDispatchQuery and skipping all the dependency injection ceremony.

core/trino-main/src/main/java/io/trino/dispatcher/LocalDispatchQuery.java

core/trino-main/src/main/java/io/trino/execution/SqlQueryExecution.java

core/trino-main/src/main/java/io/trino/dispatcher/LocalDispatchQuery.java

JunhyungSong · 2023-04-05T05:17:03Z

The test failures seem irrelevant.

dain

For the "Transition to executing when begin planning" commit, it is really important for me to be able to understand the diff, so the commit should contain only the changes absolutely necessary for the desired change to the state transitions (this part of the code base is really complex). The other minor changes can go in commits before or after that one.

core/trino-main/src/main/java/io/trino/dispatcher/LocalDispatchQuery.java

JunhyungSong · 2023-04-18T01:49:56Z

The test failure seems irrelevant.

dain

I'm still having some trouble understanding some of the additional changes in this. Also, I think the main commit can be simplified by rewriting the test.

core/trino-main/src/main/java/io/trino/dispatcher/DispatchManager.java

testing/trino-tests/src/test/java/io/trino/tests/TestLocalDispatchQuery.java

core/trino-main/src/main/java/io/trino/server/protocol/Query.java

dain · 2023-04-26T16:13:20Z

testing/trino-tests/src/test/java/io/trino/tests/TestLocalDispatchQuery.java

We don't use abbreviations like this

dain · 2023-04-26T16:15:16Z

core/trino-main/src/main/java/io/trino/dispatcher/DispatchManager.java

We should not be making this field mutable for a test. I'll explain how in the test code below.

dain · 2023-04-26T16:23:36Z

testing/trino-tests/src/test/java/io/trino/tests/TestLocalDispatchQuery.java

This test is using TpchQueryRunner, which internally uses DistributedQueryRunner, and you should just use that directly. Once you do that, you can add an additional Guice module to the runner. In the module you can register a new DDL task for some mocked up statement (see QueryExecutionFactoryModule). Then submit the fake statement directly using LocalDispatchQuery (which you can get from the coordinator's injector).

Alternatively, you can just create a LocalDispatchQuery directly and test the state changes you expect are happening on the state machine transitions.

JunhyungSong · 2023-05-02T18:31:38Z

The failure looks irrelevant.

cla-bot bot added the cla-signed label Jan 26, 2023

JunhyungSong requested a review from findepi January 26, 2023 23:00

martint requested a review from dain January 27, 2023 00:40

findepi requested review from arhimondr, losipiuk and sopel39 January 27, 2023 10:30

JunhyungSong force-pushed the transition-to-executing-when-planning branch from 701c9c5 to 96f938d Compare January 30, 2023 10:55

JunhyungSong force-pushed the transition-to-executing-when-planning branch 2 times, most recently from fd02941 to 1c74619 Compare January 31, 2023 01:42

JunhyungSong requested a review from electrum February 1, 2023 19:55

github-actions bot added the stale label Feb 23, 2023

bitsondatadev requested a review from pettyjamesm February 23, 2023 22:28

github-actions bot removed the stale label Feb 25, 2023

sopel39 removed their request for review February 27, 2023 14:23

pettyjamesm reviewed Feb 27, 2023

kokosing force-pushed the master branch from 3f05134 to 58d6356 Compare March 14, 2023 11:34

JunhyungSong force-pushed the transition-to-executing-when-planning branch 2 times, most recently from c699593 to 42ee128 Compare March 18, 2023 00:26

pettyjamesm reviewed Mar 22, 2023

dain reviewed Mar 28, 2023

JunhyungSong force-pushed the transition-to-executing-when-planning branch 2 times, most recently from 3501f2d to 58d3546 Compare April 4, 2023 15:42

JunhyungSong force-pushed the transition-to-executing-when-planning branch from 58d3546 to 2efb1b8 Compare April 5, 2023 02:13

dain reviewed Apr 12, 2023

JunhyungSong force-pushed the transition-to-executing-when-planning branch from 2efb1b8 to 6d0cef0 Compare April 14, 2023 19:08

dain reviewed Apr 19, 2023

JunhyungSong force-pushed the transition-to-executing-when-planning branch from 6d0cef0 to be238c9 Compare April 26, 2023 07:45

dain reviewed Apr 26, 2023

Transition to executing when begin planning

881c92f

JunhyungSong force-pushed the transition-to-executing-when-planning branch from be238c9 to d42639d Compare May 2, 2023 09:37

JunhyungSong added 2 commits May 2, 2023 09:56

Test if a query is submitted after transitiong to planning

78eee57

Use explicit query executor to start and fail execution

58acd00

JunhyungSong force-pushed the transition-to-executing-when-planning branch from d42639d to 58acd00 Compare May 2, 2023 09:58

dain approved these changes May 2, 2023

dain merged commit ff0e23a into trinodb:master May 2, 2023

github-actions bot added this to the 416 milestone May 2, 2023

JunhyungSong deleted the transition-to-executing-when-planning branch May 2, 2023 22:06

colebow mentioned this pull request May 3, 2023

Add Trino 416 release notes #17328

Merged
