Use lock to ensure consistency during runaway split detection #13272
groupcache4321 wants to merge 1 commit into trinodb:master
Conversation
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: yluan.
Force-pushed 1be5561 to 036ac0a.
Force-pushed 036ac0a to 2ce7947.
Same purpose as: #13262
I want to make sure this is the only case we are avoiding: thread A processes JONI for 10+ minutes, finishes its part of the processing (split A), and then moves on to another split (split B), so both split A's and split B's RunningSplitInfo hold a reference to the same thread A. Meanwhile, the interrupter thread checks the JONI stack trace (from split A) and then fails split B.
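The shared-thread aspect of this scenario can be illustrated with a small, hypothetical sketch (the RunningSplitInfo here is a simplified stand-in, not Trino's actual class): because both split infos record the same worker thread, a stack trace taken through split A's info after the worker has moved on actually describes split B's work.

```java
public class SharedThreadSketch {
    // Simplified stand-in for Trino's RunningSplitInfo.
    static class RunningSplitInfo {
        final Thread thread;
        final String splitId;

        RunningSplitInfo(Thread thread, String splitId) {
            this.thread = thread;
            this.splitId = splitId;
        }
    }

    public static void main(String[] args) {
        Thread worker = Thread.currentThread();
        // The worker finishes split A and moves on to split B;
        // both infos end up holding the same thread reference.
        RunningSplitInfo splitA = new RunningSplitInfo(worker, "A");
        RunningSplitInfo splitB = new RunningSplitInfo(worker, "B");

        // A stack trace captured "for split A" is really just a snapshot of
        // whatever the shared thread is doing right now (possibly split B).
        StackTraceElement[] trace = splitA.thread.getStackTrace();
        System.out.println(splitA.thread == splitB.thread); // true: same thread
        System.out.println(trace.length > 0);
    }
}
```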
losipiuk left a comment
This is tricky to follow but seems to work fine. I would suggest sprinkling it with some comments to make life easier for future readers.
@leetcode-1533 Not really. The scenario you are describing is, I think, not possible right now. With the current code, the problematic flow is that we may kill a task that is long-running, but not really stuck in JONI code. Consider the following example:
I think your solution here prevents that from happening.
Makes sense, since we are failing the task, and we are finding the thread from the driver's context.
core/trino-main/src/main/java/io/trino/execution/executor/TaskExecutor.java (outdated; resolved)
core/trino-main/src/main/java/io/trino/execution/executor/TaskExecutor.java (outdated; resolved)
RunningSplitInfo currently holds a Thread whose activity can change through a mechanism external to the class. The addition of taskRunner now provides better (and sufficient) encapsulation here, so IMO we should remove the direct Thread access and instead route all use cases through taskRunner, unless there's a particular reason not to.
In fact, it may be better to make this check while getting the stack trace itself, i.e. only return a valid stack trace if the runner is indeed processing the split's code.
Hey, taskRunner is a Runnable and doesn't have the thread object saved. Are you suggesting getting the current thread from TaskRunner?
Even with the taskRunner, we still can't guarantee it is always processing that split.
We do need the check that the taskRunner is processing the same split, which this PR is doing. My original suggestion was about simplifying it by not having to store the Thread. But I tried it out, and it seems we can't get away without stashing it somewhere.
The current code that gets the stack trace can capture an "incorrect" stack trace, and this PR catches that mistake later, through a callback, when trying to kill the task. I was trying to think whether we can avoid making the mistake in the first place by performing this check when returning the stack trace (check whether we're still on the same split). In that case, you wouldn't need to pass in the callback, but could simply fail the task like your previous code was doing:
RunningSplitInfo {
    ......
    public Optional<StackTraceElement[]> getThreadStackTrace()
    {
        return taskRunner.getStackTraceIfStillExecuting(this, thread);
    }
}

TaskRunner {
    ........
    synchronized Optional<StackTraceElement[]> getStackTraceIfStillExecuting(RunningSplitInfo stuckSplitInfo, Thread thread)
    {
        if (stuckSplitInfo != null && currentThreadSplitInfo == stuckSplitInfo) {
            return Optional.of(thread.getStackTrace());
        }
        return Optional.empty();
    }
}
I guess the question is: is there an advantage to executing failTask within the TaskRunner lock, or is just getting the right stack trace enough? (I haven't looked into whether failTask may be locking something else that could potentially cause a deadlock. cc @losipiuk @arhimondr)
Even though we have to store the thread, we could, if possible, expose the needed stack trace from the split info and leverage taskRunner for locking. However, having gone through the exercise, I don't feel strongly about the readability of one versus the other (the advantage would be that we avoid the callback).
"Stop making this mistake in the first place": there is always a performance trade-off when taking locks earlier. Essentially, it means we impact more threads and add more work to the critical section, accepting more delay for a more accurate result. This code intentionally delays taking the lock, since a JONI hang is really a corner case.
What's more, when dealing with stack traces there is always a window for a race condition, since we can't freeze the thread while capturing its trace.
A deadlock means two threads hold each other's locks. Since currentRunningSplitInfo is newly added, I don't think failTask will acquire currentRunningSplitInfo and cause a deadlock; in particular, in all three places that acquire currentRunningSplitInfo, the code doesn't ask for any further locks.
The probability of a split running for 10-15 minutes switching to a different split at the exact moment the checker is running is practically non-existent, and the consequences of this event are nowhere near significant. The worst thing that could happen is that a long-running task that is not actually stuck in JONI gets killed. IMO we are trying to solve a small problem that could happen with an extremely tiny probability by introducing extra synchronization complexity that may backfire in a significantly more impactful way. It looks like killing a task under a lock is probably safe today (I'm not 100% sure, though, due to the complexity of the task management code), but it may backfire if the task execution internals ever change (e.g., if failing a task is changed from asynchronous to synchronous). If it were up to me, I would suggest leaving it as is and, instead of trying to do our best to interrupt only tasks stuck in JONI, making other long-running operations interruptible (this can be done iteratively over time) and eventually removing the predicate.
> IMO we are trying to solve a small problem that could happen with an extremely tiny probability by introducing extra synchronization complexity that may backfire in a significantly more impactful way. It looks like killing a task under a lock is probably safe today (I'm not 100% sure though due to complexity of task management code), but it may backfire if the task execution internals ever change (e.g.: when failing a task is changed to be synchronous vs asynchronous)
+1. I do see value in fixing it, but I'm worried from a future-changes/maintainability perspective. One option is to leave a really extensive comment about this potential corner case and leave it at that. Or, instead of failing the task inside the lock, just log a warning inside the lock but fail outside of the lock.
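A minimal sketch of that second option, with hypothetical names (StuckSplitChecker, splitLock, checkAndFail are not Trino's actual API): the identity check and the warning happen under the lock, while the potentially heavyweight failTask callback runs outside it, so it cannot participate in a lock-ordering deadlock with the split bookkeeping.

```java
import java.util.Optional;

public class StuckSplitChecker {
    private final Object splitLock = new Object();
    private Object currentSplitInfo;

    // Called by the worker thread when it starts a new split.
    public void setCurrentSplit(Object splitInfo) {
        synchronized (splitLock) {
            currentSplitInfo = splitInfo;
        }
    }

    // Under the lock: verify the runner is still on the suspected split and
    // only log a warning; the actual task failure is deferred to the caller.
    private Optional<Object> detectStuck(Object suspectedSplitInfo) {
        synchronized (splitLock) {
            if (currentSplitInfo == suspectedSplitInfo) {
                System.err.println("WARNING: split appears stuck: " + suspectedSplitInfo);
                return Optional.of(suspectedSplitInfo);
            }
            return Optional.empty();
        }
    }

    // Outside the lock: fail the task, so failTask cannot deadlock with splitLock.
    public boolean checkAndFail(Object suspectedSplitInfo, Runnable failTask) {
        Optional<Object> stuck = detectStuck(suspectedSplitInfo);
        stuck.ifPresent(info -> failTask.run());
        return stuck.isPresent();
    }
}
```

Note that this reintroduces a (much smaller) window between the check and the failure, which is exactly the trade-off being weighed in this thread.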
core/trino-main/src/main/java/io/trino/execution/executor/TaskExecutor.java (outdated; resolved)
Hey, I was able to reproduce the race condition @losipiuk described in a unit test; will update later.
The current code logic is that none of the checks (delay, stack trace) are thread-safe. This change ensures consistency between (1) the split that the interrupter thread is failing and (2) the thread stack trace that the interrupter thread inspected.
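That guarantee can be sketched as follows (hypothetical names; the actual change lives in TaskExecutor): the interrupter invokes the fail callback only while holding the same lock the worker takes when switching splits, so the split being failed is provably the one whose stack trace was inspected.

```java
public class TaskRunnerSketch {
    private final Object lock = new Object();
    private Object currentSplit;

    // Worker thread: record the split it is about to process.
    public void beginSplit(Object split) {
        synchronized (lock) {
            currentSplit = split;
        }
    }

    // Interrupter thread: fail the task only if the runner is still on the
    // split whose stack trace was inspected. Because the worker must take
    // the same lock to switch splits, the check and the failure are atomic
    // with respect to split changes.
    public boolean failIfStillRunning(Object suspectedSplit, Runnable failTask) {
        synchronized (lock) {
            if (currentSplit == suspectedSplit) {
                failTask.run();
                return true;
            }
            return false;
        }
    }
}
```

The cost, as discussed above, is that failTask now runs inside the critical section, which is safe only as long as it acquires no conflicting locks of its own.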
Force-pushed 2ce7947 to b795c9a.
Had a discussion with @arhimondr, @losipiuk, and @leetcode-1533. We decided on leaving a comment explaining the potential race condition, but not actually fixing it. The reasons are (1) the super-low likelihood of getting into the problematic scenario and (2) the complexity and synchronization we would introduce for this small corner case may backfire with changes in the task execution and management framework.
Improve detection of runaway splits and the related task-killing code to
ensure that we do not kill a thread which we believe is hung but has moved
on to execute on behalf of another query just before we issue the kill command.
Description
Bugfix (for a very low-probability race condition)
Related issues, pull requests, and links
Improvement on top of #12392
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: