Allow assigning exclusive node for task execution #10432
losipiuk merged 11 commits into trinodb:master
@arhimondr @linzebing please give it an initial read when you have a chance.
@arhimondr I added a way to define the maximum number of retries per task, plus a simple integration test that validates that memory-burst execution actually happens.
(rebased)
Some comments remain to be addressed.
Major changes:
@arhimondr PTAL once again.
Should we have a test covering the logic around sharedAllocatedMemory?
What exactly do you think is not covered? The test cases that allocate a "shared" node exercise exactly that.
Pushed updated version. I needed to move
It turned out we do not need that functionality after all for task-level retries. Removing as currently we do not see the benefit of the mechanism and it increases complexity.
Different implementations of NodeAllocator require different schemes for selecting the memory requirements for a partition on retries. This commit introduces the PartitionMemoryEstimator interface and two separate implementations:
* ConstantPartitionMemoryEstimator, to be used with FixedCountNodeAllocator
* FallbackToFullNodePartitionMemoryEstimator, to be used with FullNodeCapableNodeAllocator
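A minimal sketch of how the two estimators could differ, for readers following the thread. Only the names PartitionMemoryEstimator, ConstantPartitionMemoryEstimator and FallbackToFullNodePartitionMemoryEstimator come from the commit message. The method signatures, the FailureKind enum and the FULL_NODE_MEMORY_BYTES sentinel are assumptions made for illustration and may not match the merged code.

```java
// Illustrative sketch only: the signatures below are hypothetical, not Trino's actual API.
final class PartitionMemoryEstimatorSketch
{
    private PartitionMemoryEstimatorSketch() {}

    // Hypothetical failure classification used to decide how to retry.
    enum FailureKind { OUT_OF_MEMORY, OTHER }

    // Assumed shape: one call for the first attempt, one call per retry.
    interface PartitionMemoryEstimator
    {
        long initialMemoryBytes();

        long nextRetryMemoryBytes(long previousMemoryBytes, FailureKind failure);
    }

    // Requests the same amount of memory on every attempt.
    static final class ConstantPartitionMemoryEstimator
            implements PartitionMemoryEstimator
    {
        private final long memoryBytes;

        ConstantPartitionMemoryEstimator(long memoryBytes)
        {
            this.memoryBytes = memoryBytes;
        }

        @Override
        public long initialMemoryBytes()
        {
            return memoryBytes;
        }

        @Override
        public long nextRetryMemoryBytes(long previousMemoryBytes, FailureKind failure)
        {
            return memoryBytes;
        }
    }

    // Starts with a default amount and asks for a whole node after an out-of-memory failure.
    static final class FallbackToFullNodePartitionMemoryEstimator
            implements PartitionMemoryEstimator
    {
        // Assumed sentinel meaning "reserve the full node"; the real value is not given in this thread.
        static final long FULL_NODE_MEMORY_BYTES = Long.MAX_VALUE;

        private final long defaultMemoryBytes;

        FallbackToFullNodePartitionMemoryEstimator(long defaultMemoryBytes)
        {
            this.defaultMemoryBytes = defaultMemoryBytes;
        }

        @Override
        public long initialMemoryBytes()
        {
            return defaultMemoryBytes;
        }

        @Override
        public long nextRetryMemoryBytes(long previousMemoryBytes, FailureKind failure)
        {
            // Only memory-related failures escalate; other failures retry with the same request.
            return failure == FailureKind.OUT_OF_MEMORY ? FULL_NODE_MEMORY_BYTES : previousMemoryBytes;
        }
    }
}
```

In this sketch an allocator would treat FULL_NODE_MEMORY_BYTES as a request for exclusive use of a node, which is the behavior the PR description outlines for retries after an out-of-memory error.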
The target input size is set to 1GB. I wonder if it is better to set it to something higher, maybe 2 or 3 GB?
I think the default should be a fraction of the heap, not an absolute value. Will change that in a follow-up.
Nope - actually tests failed because of that :)
CI: #5892
@losipiuk for the release notes text, please use markdown syntax for code highlighting, so a single ` only.
Description
If a task fails due to an out-of-memory error, on the next retry Trino will try to allocate a full node for its execution.
General information
improvement
core query engine
N/A
Documentation
(x) No documentation is needed. (will be added separately for whole fault-tolerance feature)
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: