Conversation
cee4828 to
d3978a4
Compare
presto-main/src/main/java/com/facebook/presto/execution/TaskManagerConfig.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/executor/TaskExecutor.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/executor/TaskExecutor.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/executor/TaskExecutor.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/executor/TaskExecutor.java
Outdated
Show resolved
Hide resolved
|
I didn't add reviewers to this because it isn't ready. |
|
This does work in practice 20210518_201931_00000_y9r9j |
0235e3f to
4c3cab2
Compare
|
OK, open season! |
e734eff to
4782de6
Compare
There was a problem hiding this comment.
We can remove option in the param and put Option.DEFAULT directly in the body.
presto-main/src/main/java/com/facebook/presto/execution/TaskManagerConfig.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
why not config.getInterruptRunawaySplitsTimeout() directly?
There was a problem hiding this comment.
This is a test constructor and there is no accessible TaskManagerConfig I think? Or you mean instantiate the config and then request the value? I can do that if it is what you are thinking.
There was a problem hiding this comment.
Yes, given this is for test only, we can do that
There was a problem hiding this comment.
Instead of a blacklist, shall we use TaskManager::getAllTaskInfo to decide if a task is still alive and tracked to decide if we want to preempt the current thread?
There was a problem hiding this comment.
If the task is no longer alive then the threads have already been interrupted so there is maybe no point in interrupting them again. So to some extent just making the Joni regex functions interruptible is already a win because it will make query cancellation and timeout work better.
What this brings to the table is that in a specific allow listed cases we will interrupt the splits with a timeout that is different (shorter) from the full query timeout which may be in the many hours range. The idea being this a step towards more automatically and aggressively remediating runaway splits.
4782de6 to
473288a
Compare
There was a problem hiding this comment.
If these defaults are not going to be changed, I guess we don't need to pass them in through parameters? Just directly use them in the body of the class?
There was a problem hiding this comment.
Different ones are passed in for unit tests so the tests can complete quickly.
There was a problem hiding this comment.
Yes, given this is for test only, we can do that
There was a problem hiding this comment.
same here; we could avoid all defaults through parameters
There was a problem hiding this comment.
Same thing RE making the unit test run quickly
presto-main/src/main/java/com/facebook/presto/execution/executor/TaskExecutor.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/executor/TaskExecutor.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/executor/TaskExecutor.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/TaskManagerConfig.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/execution/executor/TaskExecutor.java
Outdated
Show resolved
Hide resolved
473288a to
a607931
Compare
|
@highker I am not 100% clear on what you are asking for with removing default parameters that aren't in |
viczhang861
left a comment
There was a problem hiding this comment.
typo in release note task.task.interrupt-runaway-splits-timeout
presto-main/src/main/java/com/facebook/presto/execution/executor/TaskExecutor.java
Outdated
Show resolved
Hide resolved
fce61da to
ac0e0e7
Compare
|
There is a conflict; maybe rebase and we can merge this PR? |
ac0e0e7 to
e7c97d7
Compare
viczhang861
left a comment
There was a problem hiding this comment.
Check property name in release note.
Summary: OS link for Ref: prestodb#16111 The OS change interrupts long running spits stuck on JoniRegexpFunctions, internally we encounter issue on HashBuilderOperator.buildLookupSource. So added as an additional interrupt condition. Reviewers: #ldap_presto-core, sgurmeet Reviewed By: #ldap_presto-core, sgurmeet Subscribers: sgurmeet, O4263 subscribe to presto changes JIRA Issues: PRESTO-3695 Differential Revision: https://code.uberinternal.com/D6427367
Test plan - Included unit test, deploy to a cluster and test with a known bad query
We do use thread interruption as part of the query cancellation mechanism so it is probably safer then I thought it was in some earlier discussions.
What this is really implementing is eager split termination where we don't honor the user specified query timeout. Because of that I am limiting this to only killing a split that looks like it is blocked in Joni. We can add more if we find other common culprits.
@arhimondr brought up the possibility of edge cases like JDBC where we might have an expensive query that might not yield a page for 10 minutes as it tries to generate a small number of rows from a system that isn't very fast. Even with Hive + HDFS if it is needle in a haystack type stuff this could occur. I think a targeted approach of interrupting allow listed things is a good place to start.