Implement full query retries #9361
}

SerializedPage page = pagesIterator.next();
pagesIterator.remove();
How costly is this one? Given that you are using ArrayListMultimap, it feels like it may be O(n^2). It is hard to tell from the code, though.
Yeah, you are right, not sure what I was thinking :-) Let me switch it to LinkedListMultimap for now. It could be optimized further, but I'm not sure it makes sense to spend more time on this implementation, since it may change once spilling capabilities are added.
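The cost concern raised here can be illustrated with plain java.util collections. This is a sketch only, not code from this PR: ArrayList and LinkedList are assumed as stand-ins for the value collections behind Guava's ArrayListMultimap and LinkedListMultimap, and the class and method names are hypothetical. Removing through an iterator from the front of an array-backed list shifts every remaining element, so draining n pages costs O(n^2) in total, while a linked list unlinks one node per call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration; names are not taken from the PR.
final class DrainCostSketch
{
    private DrainCostSketch() {}

    // Drains the list from the head through its iterator, mirroring the
    // next()/remove() pattern in the reviewed snippet.
    static <T> List<T> drain(List<T> pages)
    {
        List<T> drained = new ArrayList<>();
        Iterator<T> iterator = pages.iterator();
        while (iterator.hasNext()) {
            T page = iterator.next();
            // ArrayList: shifts all remaining elements left, O(n) per call.
            // LinkedList: unlinks a single node, O(1) per call.
            iterator.remove();
            drained.add(page);
        }
        return drained;
    }

    public static void main(String[] args)
    {
        List<Integer> arrayBacked = new ArrayList<>(List.of(1, 2, 3));
        List<Integer> linked = new LinkedList<>(List.of(1, 2, 3));
        // Both calls return the same elements; only the removal cost differs.
        System.out.println(drain(arrayBacked));
        System.out.println(drain(linked));
    }
}
```

The result is identical for both list types, which is why the difference is hard to spot from the code alone; it only shows up as running time when many pages are buffered.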
core/trino-main/src/main/java/io/trino/operator/DeduplicationExchangeClientBuffer.java
Assuming the query state remains RUNNING during this delay period, is there an easy way to tell from the web UI that the query is waiting for a retry and not just stuck due to some issue?
During the delay between attempts, all stages will be in a "PENDING" state. This indicates that no tasks are currently running in a stage. We have a separate work item to update the UI to make it easier to identify failures and retries.
Updated
Just a rebase
Can we have this class extend AbstractTestQueryFramework?
Although I've made the changes for that in arhimondr@eca47a9, it would be nice to have it in this commit itself.
That's a good idea. However, I would prefer to keep the abstract QueryRunner createQueryRunner(List<TpchTable<?>> requiredTpchTables, Map<String, String> configProperties, Map<String, String> coordinatorProperties) method. It makes it clearer at the interface level which tables must be provisioned and which configuration must be set when creating a query runner.
Extract buffering-related code under an interface. This unblocks introducing different buffering strategies without needing to make changes to the ExchangeClient itself.
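The idea in the commit description above can be sketched roughly as follows. Everything here is hypothetical and for illustration only: the interface, the two strategies, the client class, and the "drop a page whose id was already seen" rule are assumptions, not the types or logic this PR introduces. The sketch only shows why a client that depends on a buffering interface does not need to change when a new strategy is added.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical buffering contract; not the PR's actual interface.
interface PageBufferSketch
{
    void addPage(String pageId, String payload);

    // Returns the next buffered payload, or null when nothing is buffered.
    String pollPage();
}

// Strategy 1: hand pages out in arrival order, duplicates included.
final class StreamingBufferSketch
        implements PageBufferSketch
{
    private final Queue<String> pages = new ArrayDeque<>();

    @Override
    public void addPage(String pageId, String payload)
    {
        pages.add(payload);
    }

    @Override
    public String pollPage()
    {
        return pages.poll();
    }
}

// Strategy 2 (assumed rule, illustration only): drop a page whose id was already seen.
final class DeduplicatingBufferSketch
        implements PageBufferSketch
{
    private final Set<String> seenPageIds = new HashSet<>();
    private final Queue<String> pages = new ArrayDeque<>();

    @Override
    public void addPage(String pageId, String payload)
    {
        // Set.add returns false when the id is already present.
        if (seenPageIds.add(pageId)) {
            pages.add(payload);
        }
    }

    @Override
    public String pollPage()
    {
        return pages.poll();
    }
}

// The client depends only on the interface, so swapping strategies needs no client changes.
final class ExchangeClientSketch
{
    private final PageBufferSketch buffer;

    ExchangeClientSketch(PageBufferSketch buffer)
    {
        this.buffer = buffer;
    }

    void receive(String pageId, String payload)
    {
        buffer.addPage(pageId, payload);
    }

    String next()
    {
        return buffer.pollPage();
    }

    public static void main(String[] args)
    {
        ExchangeClientSketch client = new ExchangeClientSketch(new DeduplicatingBufferSketch());
        client.receive("page-0", "rows A");
        client.receive("page-0", "rows A"); // same page delivered again
        System.out.println(client.next());
        System.out.println(client.next());
    }
}
```

Passing a StreamingBufferSketch to the same client instead would return the duplicate page as well, with no change to ExchangeClientSketch.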
This is preparation for introducing task-level retries. This PR removes the "all-or-nothing" execution assumption from SqlStageExecution. Scheduling-related assumptions of pipelined execution are now moved to PipelinedStageExecution. SqlStageExecution is now merely a container for the scheduled tasks of a particular stage. Its main responsibility is to keep track of running tasks and collect their execution statistics.
Move the responsibility for opening a split source from the planning phase to the execution phase, so that the split source can be reopened on query retries.
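Why deferring the open matters for retries can be shown with a small sketch. It is an illustration under stated assumptions, not Trino code: a split source is modeled as a plain Iterator<String>, the factory as a java.util.function.Supplier, and the class and method names are hypothetical. A source opened once ahead of time is exhausted after the first attempt, while a factory invoked at execution time yields a fresh source for every attempt.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration; names and types are not taken from the PR.
final class SplitSourceRetrySketch
{
    private SplitSourceRetrySketch() {}

    // Runs one attempt: opens a split source through the factory and consumes all of its splits.
    static List<String> runAttempt(Supplier<Iterator<String>> splitSourceFactory)
    {
        Iterator<String> splits = splitSourceFactory.get();
        List<String> scheduled = new ArrayList<>();
        while (splits.hasNext()) {
            scheduled.add(splits.next());
        }
        return scheduled;
    }

    public static void main(String[] args)
    {
        List<String> tableSplits = List.of("split-1", "split-2");

        // Opened once up front: the retry sees an exhausted source.
        Iterator<String> openedEarly = tableSplits.iterator();
        System.out.println(runAttempt(() -> openedEarly));
        System.out.println(runAttempt(() -> openedEarly));

        // Opened per attempt: the retry sees every split again.
        System.out.println(runAttempt(tableSplits::iterator));
        System.out.println(runAttempt(tableSplits::iterator));
    }
}
```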