Merged
Original file line number Diff line number Diff line change
@@ -36,6 +36,7 @@
import io.trino.execution.buffer.BufferResult;
import io.trino.execution.buffer.OutputBuffers;
import io.trino.execution.buffer.PipelinedOutputBuffers;
import io.trino.execution.executor.PrioritizedSplitRunner;
import io.trino.execution.executor.TaskExecutor;
import io.trino.execution.executor.TaskExecutor.RunningSplitInfo;
import io.trino.memory.LocalMemoryManager;
@@ -705,6 +706,31 @@ private Optional<StuckSplitTasksInterrupter> createStuckSplitTasksInterrupter(
taskExecutor));
}

/**
* Detects and interrupts runaway splits. It interrupts a thread by failing the task that holds the split
* and relying on {@link PrioritizedSplitRunner#destroy()} to actually interrupt the responsible thread.
* Detection runs periodically, once every {@link StuckSplitTasksInterrupter#stuckSplitsDetectionInterval}.
* A thread is interrupted once split processing has continued beyond {@link StuckSplitTasksInterrupter#interruptStuckSplitTasksTimeout} and
* the split's thread dump matches {@link StuckSplitTasksInterrupter#stuckSplitStackTracePredicate}. <p>
*
* This class has a potential race condition: it may kill a task that is long-running
* but not actually stuck in code that matches {@link StuckSplitTasksInterrupter#stuckSplitStackTracePredicate} (e.g. JONI code).
* Consider the following example:
* <ol>
* <li>We find long-running splits; we get A, B, C.</li>
* <li>None of them is actually running JONI code.</li>
* <li>Just before we inspect the stack trace for A, the underlying thread switches to an unrelated split D, and D is actually running JONI code.</li>
* <li>We get the stack trace for what we believe is A, but it belongs to D, and we decide to kill the task that A belongs to.</li>
* <li>The wrong decision is made: A's task is killed although A was not stuck.</li>
* </ol>
* A proposed fix and more details of this issue are at: <a href="https://github.com/trinodb/trino/pull/13272">pull/13272</a>.
* We decided not to fix the race condition because of:
* <ol>
* <li>its extremely low chance of occurring</li>
* <li>its potentially low impact if it does occur</li>
* <li>the extra synchronization complexity the patch would add</li>
* </ol>
*/
private class StuckSplitTasksInterrupter
{
private final Duration interruptStuckSplitTasksTimeout;
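The detection rule described in the Javadoc for `StuckSplitTasksInterrupter` (a split is interrupted only when it has run beyond the timeout and its stack trace matches the predicate) can be sketched roughly as follows. This is a minimal illustration, not the code from this change: `StuckSplitSketch`, `SplitSnapshot`, `isStuck`, `JONI_FRAMES`, and the `org.joni.` prefix check are hypothetical names and choices made for the example, and `java.time.Duration` stands in for whatever duration type the real class uses.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class StuckSplitSketch
{
    // Hypothetical snapshot of a running split: owning task, elapsed processing time, sampled stack.
    static final class SplitSnapshot
    {
        final String taskId;
        final Duration processingTime;
        final List<StackTraceElement> stackTrace;

        SplitSnapshot(String taskId, Duration processingTime, List<StackTraceElement> stackTrace)
        {
            this.taskId = taskId;
            this.processingTime = processingTime;
            this.stackTrace = stackTrace;
        }
    }

    // Assumed predicate, for illustration only: any frame whose class name starts with "org.joni.".
    static final Predicate<List<StackTraceElement>> JONI_FRAMES = stack -> stack.stream()
            .anyMatch(frame -> frame.getClassName().startsWith("org.joni."));

    // A split counts as stuck only when BOTH conditions hold:
    // it has run longer than the timeout, and its stack trace matches the predicate.
    static boolean isStuck(SplitSnapshot split, Duration timeout, Predicate<List<StackTraceElement>> stackPredicate)
    {
        return split.processingTime.compareTo(timeout) > 0 && stackPredicate.test(split.stackTrace);
    }

    public static void main(String[] args)
    {
        Duration timeout = Duration.ofMinutes(10);
        List<StackTraceElement> joniStack = Arrays.asList(
                new StackTraceElement("org.joni.Matcher", "search", "Matcher.java", 1));
        List<StackTraceElement> otherStack = Arrays.asList(
                new StackTraceElement("java.util.HashMap", "get", "HashMap.java", 1));

        SplitSnapshot longInJoni = new SplitSnapshot("task-1", Duration.ofMinutes(15), joniStack);
        SplitSnapshot longElsewhere = new SplitSnapshot("task-2", Duration.ofMinutes(15), otherStack);
        SplitSnapshot shortInJoni = new SplitSnapshot("task-3", Duration.ofMinutes(1), joniStack);

        for (SplitSnapshot split : Arrays.asList(longInJoni, longElsewhere, shortInJoni)) {
            System.out.println(split.taskId + " stuck=" + isStuck(split, timeout, JONI_FRAMES));
        }
    }
}
```

In this sketch only the first split is flagged: the second runs long but not in matching code, and the third is in matching code but has not exceeded the timeout.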
@@ -882,6 +882,12 @@ public Set<TaskId> getStuckSplitTaskIds(Duration processingDurationThreshold, Pr
.filter(filter).map(RunningSplitInfo::getTaskId).collect(toImmutableSet());
}

/**
* Represents a split that is running on a TaskRunner.
* The Thread object stored here is assigned when the split is assigned
* to the TaskRunner. However, once the TaskRunner moves on to a different split,
* the thread stored here is no longer assigned to this split.
*/
public static class RunningSplitInfo
implements Comparable<RunningSplitInfo>
{
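The caveat in the `RunningSplitInfo` Javadoc, and the race it enables, can be illustrated with a small deterministic model. This is only a sketch under stated assumptions: `StaleThreadSketch`, `Worker`, `SplitInfo`, and `observedSplit` are hypothetical names, and `Worker` is a single-threaded stand-in for a runner thread rather than a real `Thread`. It shows that inspecting the worker recorded for split A reports whatever the worker is doing now, which may be a different split.

```java
public class StaleThreadSketch
{
    // Hypothetical stand-in for a runner thread: it only knows what it is executing right now.
    static final class Worker
    {
        private volatile String currentSplit;

        void run(String split)
        {
            currentSplit = split;
        }

        // Plays the role of sampling the thread's stack trace.
        String sampleCurrentWork()
        {
            return currentSplit;
        }
    }

    // Hypothetical stand-in for RunningSplitInfo: records the worker at assignment time.
    static final class SplitInfo
    {
        final String split;
        final Worker worker;

        SplitInfo(String split, Worker worker)
        {
            this.split = split;
            this.worker = worker;
        }
    }

    // Returns the split that is actually observed when inspecting the worker recorded in info.
    static String observedSplit(SplitInfo info)
    {
        return info.worker.sampleCurrentWork();
    }

    public static void main(String[] args)
    {
        Worker worker = new Worker();

        worker.run("A");
        SplitInfo infoForA = new SplitInfo("A", worker);

        // The worker moves on to an unrelated split before anyone inspects it.
        worker.run("D");

        // prints "recorded=A observed=D"
        System.out.println("recorded=" + infoForA.split + " observed=" + observedSplit(infoForA));
    }
}
```

A decision based on `observedSplit(infoForA)` would be attributed to A even though the observation came from D, which is the misattribution the Javadoc describes.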