Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -735,7 +735,14 @@ private void failStuckSplitTasks()
log.warn("%s is long running with stackTrace:\n%s", splitInfo.getSplitInfo(), stackTraceElements.stream().map(Object::toString).collect(joining(lineSeparator())));
}

return stuckSplitStackTracePredicate.test(stackTraceElements);
boolean isThreadStack = stuckSplitStackTracePredicate.test(stackTraceElements);

// We check if thread's name matches threadId stored in RunningSplitInfo to be sure that stacktrace we obtained belongs to the execution of split
// described by RunningSplitInfo.
// There is still a chance that we may observe the stacktrace from execution of new split before thread name is set in io.trino.execution.executor.TaskExecutor.TaskRunner.run()
// Yet, we assume that such stacktrace would not be classified as "stack" by stuckSplitStackTracePredicate.
boolean splitAssignmentDidNotChange = splitInfo.getThread().getName().startsWith(splitInfo.getThreadId());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relying on thread names is very brittle. There's no contract for thread names, no guarantee they are going to be the same or different for a given set of conditions, etc.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other possible ways to do this. Some random ideas:

  • Create our own version of Thread that contains additional information that we can inspect
  • Drive the process of finding stuck threads from the list of all running splits instead of trying to infer the split by inspecting the threads
  • Maintain an external mapping of split <-> thread

Also, for any approach that relies on looking at the thread and then checking if it corresponds to the split we want to kill, there will always be a small window for a race condition unless we force proper synchronization of these checks with split-to-thread assignments.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I agree this is somewhat brittle. For my defence - there is test which (at least partially) verifies it works ;).
The simple change to that, keeping the same solution principle, would be to have thread <-> split-execution-id mapping, where split-execution-id would be incremented any time split assigned to a thread changes.

And yeah - there is a race-condition window still - I mention that in the comment.

return isThreadStack && splitAssignmentDidNotChange;
});

for (TaskId stuckSplitTaskId : stuckSplitTaskIds) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -479,7 +479,7 @@ public void run()
return;
}

String threadId = split.getTaskHandle().getTaskId() + "-" + split.getSplitId();
String threadId = split.getTaskHandle().getTaskId() + "-" + split.getSplitId() + "-" + System.nanoTime();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may negatively impact tools that gather thread dumps and attempt to group threads by name for analysis purposes.

try (SetThreadName splitName = new SetThreadName(threadId)) {
RunningSplitInfo splitInfo = new RunningSplitInfo(ticker.read(), threadId, Thread.currentThread(), split);
runningSplitInfos.add(splitInfo);
Expand Down