Support retries of streaming sections #13717
Conversation
Force-pushed from b7810ab to 2acd39f.
Skimmed over first 4 commits. (Up to "Enable failure detector in TestingPrestoServer")
Wow :). Is this a commonly used pattern in execution? (It looks like it was built for the optimizer~)
This comment has been there for 6 years 😮
Some TODOs from an in-person conversation with @arhimondr (in addition to #13730)
"Support retries of streaming sections" .
Skimmed. I need to take a more detailed look when I am clear-headed. But in general looks neat 😄 .
One question is about so many fields added into SqlQueryScheduler. The purpose of them are really to create SqlStageExecution. I am wondering if it makes sense to abstract out something like SqlStageExecutionFactory to take the responsibility for SqlStageExecution creation? This would make a clear separation between:
- Stage schedule (done by
SqlQueryScheduler) - Stage execution creation (done by
SqlStageExecutionFactory, which is a field inSqlQueryScheduler)
Before this PR, we don't separate the stage schedule and stage execution creation . This is kind of OK because stage execution is only created once in the constructor of SqlQueryScheduler. But now since SqlStageExecution will be re-created for retry, so it seems make sense to abstract out the SqlStageExecutionFactory out?
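A rough sketch of the factory idea, purely for illustration — the interface name comes from the comment above, but the method signature, parameters, and package paths of the imports are assumptions rather than anything actually in this PR:

```java
import com.facebook.presto.execution.SqlStageExecution;
import com.facebook.presto.execution.StageId;
import com.facebook.presto.sql.planner.PlanFragment;

// Hypothetical factory that owns SqlStageExecution creation, so SqlQueryScheduler
// only holds this one field instead of everything needed to build executions.
public interface SqlStageExecutionFactory
{
    // Called once per attempt, so a fresh execution can be created for the same
    // stage when a streaming section is retried.
    SqlStageExecution createStageExecution(StageId stageId, PlanFragment fragment, int attemptId);
}
```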
Thanks @wenleix for the review! I like the idea of a StageExecutionFactory. I wouldn't worry about reviewing this too closely right now (though major comments are appreciated), since some details have changed while I've been working through @arhimondr's comments (I haven't updated the PR because I haven't finished yet). Could you review #13730 instead, since I'm building on top of it now? That PR, as requested by Andrii, delays all of the stage creation until the stage is ready to execute, instead of delaying just the scheduler creation as I do here.
Force-pushed from 2acd39f to b61d9aa.
Force-pushed from 8a89604 to ad5815d.
TODO:
Force-pushed from dc3d63b to 403ad13.
I've extracted some things out of
shixuan-fan left a comment:
First commit up to "Add configuration property max-stage-retries": LGTM
shixuan-fan left a comment:
"Remove unused code from TestStageExecutionStateMachine" 👍
"Enable failure detector in tests" 👍
shixuan-fan left a comment:
"Extract SectionStageExecutionFactory from SqlQueryScheduler" LGTM. Didn't verify every line for code moves but it looks good when skimming.
Totally unrelated to this PR. Since createSqlQueryScheduler() is a factory method, and we already passed stateMachine in, this seems kinda redundant.
shixuan-fan left a comment:
"Remove id from StageExecutionInfo"
shixuan-fan left a comment:
"Introduce method for empty StageExecutionInfo" and "Reorder fields and methods in SqlQueryScheduler" LGTM
shixuan-fan left a comment:
"Add support for retrying streaming sections" I think it looks correct, but maybe I should pair review with @wenleix to make sure :D
Maybe we should have an enum to indicate whether we are creating a query runner for bucket recoverability or stage recoverability. It is not obvious that when materialized is true, we actually turn off bucket recoverability.
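For illustration only, a minimal sketch of the suggested enum — the name and values here are invented, not taken from the PR:

```java
// Hypothetical enum making the query-runner intent explicit instead of
// inferring it from a boolean "materialized" flag.
public enum RecoverabilityMode
{
    BUCKET_RECOVERABILITY, // recover failed lifespans/buckets within a stage
    STAGE_RECOVERABILITY   // recover by retrying a whole streaming section/stage
}
```

A query-runner factory could then take a RecoverabilityMode argument instead of the boolean, which would make call sites self-documenting.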
abort returning false to mean "already aborted" does not seem intuitive; my intuition is that returning false means the abort call failed (I should have commented on the commit where it was introduced). Maybe we should have an isAborted() in SectionExecution?
We want the isAborted check and the abort() call to be atomic so that we only increment the retries once. This is the same as StateMachine's transitionToAborted, which returns false if the state was already a done state.
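A toy sketch of that atomicity point — class and field names are invented here, and this is not the PR's actual SectionExecution or StateMachine code:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class AbortOnceSketch
{
    private final AtomicBoolean aborted = new AtomicBoolean();
    private final AtomicInteger retries = new AtomicInteger();

    // Returns true only for the single caller that actually performed the abort,
    // analogous to transitionToAborted returning false when the state was already done.
    public boolean abort()
    {
        return aborted.compareAndSet(false, true);
    }

    public void abortAndMaybeRetry()
    {
        // A separate isAborted() check followed by abort() could race: two threads
        // could both observe "not aborted" and both count a retry. Folding the check
        // into abort()'s return value keeps the increment to exactly once.
        if (abort()) {
            retries.incrementAndGet();
        }
    }
}
```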
Disabled for Raptor because its tests currently fail with the failure detector enabled, and for the JMX tests because they rely on having a consistent number of nodes.
Make the exponential decay for the failure detector configurable, so we can tune it to make the recoverable execution tests more stable.
It's not used, and we don't want to create it when we create an empty executionInfo for stages that haven't had any attempts yet.
When we lazily create StageExecutions, we'll need to generate empty stageExecutionInfos for stages that don't have any executions yet.
Force-pushed from 1c6b1df to 854a910.
It definitely makes the test more stable. @arhimondr, can you answer the question?
The test was accidentally running with the legacy scheduler; it should be fixed now.
tdcmeehan left a comment:
All previous commits LGTM ✅
| "SELECT orderkey, day(shipdate) FROM lineitem WHERE orderkey % 31 <> 0 UNION ALL " + | ||
| "SELECT orderkey, day(commitdate) FROM lineitem WHERE orderkey % 31 <> 0 UNION ALL " + | ||
| "SELECT orderkey, day(receiptdate) FROM lineitem WHERE orderkey % 31 <> 0"); | ||
| "SELECT orderkey, day(commitdate) FROM lineitem WHERE orderkey % 31 <> 0 UNION ALL " + |
Super tiny nit: we can remove these formatting-only changes.
Yes, because with retries we close the schedulers of sections that we abort due to failure, so some of the schedulers we close might still be running. Previously we relied on the fact that scheduling would also end whenever there was a failure, and everything would get closed then.
I guess, what do reviewers think? Basically the query monitor says that if no sections are in the running state for more than a minute, then we assume something has gone wrong with the scheduler and fail the query. If we think it's okay for production, I'll remove the word "hacky" from the commit; otherwise I'll remove the commit altogether. I didn't see any failures due to this when running in the verifier, but I'm not sure if we'll get false positives if there's a full GC or something.
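A rough sketch of the check being described, with invented names and a hard-coded one-minute threshold; the monitor in the actual commit may look quite different:

```java
import java.time.Duration;
import java.time.Instant;

public class SchedulerStallCheck
{
    private static final Duration MAX_TIME_WITH_NO_RUNNING_SECTION = Duration.ofMinutes(1);

    private Instant lastTimeASectionWasRunning = Instant.now();

    // Called periodically; returns true when the query should be failed because the
    // scheduler appears stuck (no section has been RUNNING for over a minute).
    public synchronized boolean shouldFailQuery(boolean anySectionRunning, Instant now)
    {
        if (anySectionRunning) {
            lastTimeASectionWasRunning = now;
            return false;
        }
        return Duration.between(lastTimeASectionWasRunning, now)
                .compareTo(MAX_TIME_WITH_NO_RUNNING_SECTION) > 0;
    }
}
```

The false-positive concern above (e.g. a long full GC) is exactly the weakness of this kind of wall-clock check.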
tdcmeehan left a comment:
Make FixedSourcePartitionedScheduler more thread safe ❓
Can't we still use the List interface?
I don't think you need volatile here, but you should annotate it with @GuardedBy("this")
Hmm... could we do something like just not call .clear() on the list instead? (Why do we clear() it?)
If the intent is to short-circuit the loop where we break if cancelled, isn't that the point of the cancelled flag? And won't it not work, because the iterator is merrily iterating over the old, un-cleared version of the list? (I missed the .remove() below; it makes sense now.)
Should this be called closed?
I wonder if we can simplify everything here by just making cancelled (closed) an AtomicBoolean (to prevent double close), removing the clear() method invocation, and removing the synchronized blocks.
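For what it's worth, a minimal sketch of that simplification, with invented names and most of the real FixedSourcePartitionedScheduler state omitted:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class SchedulerCloseSketch
{
    private final AtomicBoolean closed = new AtomicBoolean();
    private final List<AutoCloseable> sourceSchedulers;

    public SchedulerCloseSketch(List<AutoCloseable> sourceSchedulers)
    {
        this.sourceSchedulers = sourceSchedulers;
    }

    public void schedule()
    {
        for (AutoCloseable sourceScheduler : sourceSchedulers) {
            if (closed.get()) {
                // Stop scheduling once close() has been requested, instead of
                // relying on the list having been cleared.
                break;
            }
            // ... drive sourceScheduler ...
        }
    }

    public void close()
    {
        if (!closed.compareAndSet(false, true)) {
            return; // prevent double close
        }
        // No clear() here; just close whatever schedulers remain.
        for (AutoCloseable sourceScheduler : sourceSchedulers) {
            try {
                sourceScheduler.close();
            }
            catch (Exception ignored) {
            }
        }
    }
}
```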
Good question. I'm actually not sure why it was being cleared in the first place. My only thought is so that the schedulers get garbage collected. Do you think if we don't clear the list the objects would hang around too long?
Also, we do need to exit the scheduler loop after closing because otherwise we hit "HiveSplitSource is already closed" errors.
I got confused by the indentation in GH, makes sense. I thought the synchronized block didn't include the remove.
Force-pushed from 854a910 to bfe6728.
Speaking of the query monitor, as pointed out by @rschlussel:
It sounds like it has some value. So maybe we can keep it behind (another) config property? And maybe we can enable it by default in the verifier to help catch potential issues, and eventually enable it in prod? What do you think, @arhimondr, @tdcmeehan, @shixuan-fan?
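A sketch of what gating it behind a config property could look like, assuming the usual Airlift @Config pattern; the class name and property name here are invented, not part of the PR:

```java
import io.airlift.configuration.Config;

public class SchedulerMonitorConfig
{
    private boolean schedulerStallDetectionEnabled;

    public boolean isSchedulerStallDetectionEnabled()
    {
        return schedulerStallDetectionEnabled;
    }

    // Hypothetical property name; disabled by default, enabled in the verifier first.
    @Config("query.scheduler-stall-detection-enabled")
    public SchedulerMonitorConfig setSchedulerStallDetectionEnabled(boolean enabled)
    {
        this.schedulerStallDetectionEnabled = enabled;
        return this;
    }
}
```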
Force-pushed from bfe6728 to 5552e2b.
Fix two issues related to closing the scheduler during scheduling: 1) a ConcurrentModificationException from the splitSources iterator, and 2) a "HiveSplitSource is already closed" error from trying to schedule a HiveSplitSource that has been closed.
Organize methods in the order they are used and group fields thematically.
Add retriedCpuTime to track the CPU time spent on stages that eventually fail and get retried. This CPU time isn't tracked by the regular cpuTime. Additionally, include failed tasks in retriedCpuTime even though they are also tracked by the total CPU time. Co-authored-by: Shixuan Fan <[email protected]>
Force-pushed from 5552e2b to 26016b6.
arhimondr left a comment:
LGTM % nits
It feels like this class wasn't designed to be multi-threaded, and it doesn't have to be.
Instead of closing the scheduler from a different thread in SectionExecution#abort, only the SqlStageExecution has to be transitioned to FAILED there.
Then the same thread that does the scheduling (in SqlQueryScheduler#schedule) can check whether the stage is in a done state (e.g. FAILED) and close the scheduler if so.
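A sketch of that single-threaded approach — all of these names are invented, and the real SqlQueryScheduler#schedule loop does much more than this:

```java
import java.util.List;

public class SingleThreadedCloseSketch
{
    public interface StageAndScheduler
    {
        boolean isStageDone();     // e.g. FAILED, ABORTED, FINISHED
        void closeScheduler();
        void scheduleMoreWork();
    }

    public void schedule(List<StageAndScheduler> stages)
    {
        for (StageAndScheduler stage : stages) {
            if (stage.isStageDone()) {
                // abort() only transitions the stage to FAILED from another thread;
                // the scheduling thread observes the done state here and closes the
                // scheduler itself, so the scheduler is only ever touched from this thread.
                stage.closeScheduler();
                continue;
            }
            stage.scheduleMoreWork();
        }
    }
}
```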
worker2.stopResponding();
...
-        assertEquals(result.get(60, SECONDS).getUpdateCount(), OptionalLong.of(expectedUpdateCount));
+        assertEquals(result.get(1000, SECONDS).getUpdateCount(), OptionalLong.of(expectedUpdateCount));
Question: that's 15 minutes. Why do we need such a large timeout? I remember running these tests, and they were finishing within ~20 seconds. Has that changed?
Hasn't changed. The 15 minutes there is unintentional. I can fix it in a follow-up.
I removed the commit that checks for scheduler bugs and will submit it as a separate PR. @arhimondr, I'll address your comments in a follow-on pull request, as it requires more testing to ensure that changing when we close the scheduler won't introduce any other problems.

Addresses #13438