[SPARK-19276][CORE] Fetch Failure handling robust to user error handling #16639
Fault-tolerance in Spark requires special handling of shuffle fetch failures. The Executor would catch FetchFailedException and send a special msg back to the driver. However, intervening user code could intercept that exception and wrap it with something else. This even happens in SparkSQL. So rather than checking only the thrown exception, we'll store the fetch failure directly in the TaskContext, where users can't touch it. This includes a test case which failed before the fix.
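The core idea, as a simplified self-contained model (these are stand-ins for illustration, not the real org.apache.spark classes):

```scala
// Simplified model: the exception registers itself in the thread-local task context
// at construction time, so the Executor can consult the context rather than relying
// on whatever exception (if any) finally escapes the user code.
object TaskContext {
  private val local = new ThreadLocal[TaskContext]
  def get(): TaskContext = local.get()
  def set(ctx: TaskContext): Unit = local.set(ctx)
}

class TaskContext {
  @volatile var fetchFailed: Option[FetchFailedException] = None
  def setFetchFailed(e: FetchFailedException): Unit = { fetchFailed = Some(e) }
}

class FetchFailedException(msg: String) extends Exception(msg) {
  // On construction, record the failure where user code cannot hide it.
  Option(TaskContext.get()).foreach(_.setFetchFailed(this))
}
```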
Test build #71636 has finished for PR 16639 at commit
cc @kayousterhout @markhamstra @mateiz This isn't just protecting against crazy user code -- I've seen users hit this with Spark SQL (because of spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala, line 214 at commit 278fa1e).
I attempted to write a larger integration test, which reproduced the issue in a "local-cluster" setup, but got stuck. ShuffleBlockFetcherIterator does some fetches on construction, before it's used as an iterator wrapped in user code. So if the failures happen during that initialization, everything was fine even before this change. The failure has to happen inside the call to next().
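For illustration, a hypothetical user-code pattern that hides the failure (MyAppException and the RDD element types here are made up for the example):

```scala
import org.apache.spark.rdd.RDD

// Hypothetical application exception that wraps whatever was thrown.
class MyAppException(msg: String, cause: Throwable) extends RuntimeException(msg, cause)

def consumeWithUserErrorHandling(shuffled: RDD[(Int, Int)]): RDD[(Int, Int)] =
  shuffled.mapPartitions { iter =>
    val results = try {
      // Consuming the iterator calls next(), which is where a late
      // FetchFailedException surfaces -- inside the user's try block.
      iter.toArray
    } catch {
      // The fetch failure gets wrapped, so the Executor (before this patch)
      // could no longer recognize it by inspecting the thrown exception.
      case t: Throwable => throw new MyAppException("user wrapper", t)
    }
    results.iterator
  }
```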
Test build #71637 has finished for PR 16639 at commit
Test build #71638 has finished for PR 16639 at commit
Test build #71640 has finished for PR 16639 at commit
mridulm
left a comment
Thanks for the patch @squito, fixing this should definitely make Spark more robust.
    // which intercepts this exception (possibly wrapping it), the Executor can still tell there was
    // a fetch failure, and send the correct error msg back to the driver. The TaskContext won't be
    // defined if this is run on the driver (just in test cases) -- we can safely ignore then.
    Option(TaskContext.get()).map(_.setFetchFailed(this))
Since creating an Exception does not necessarily mean it will be thrown, we must explicitly add this expectation to the documentation/contract of the FetchFailedException constructor -- indicating that we expect it to be created only when it will be thrown immediately.
This should be fine since FetchFailedException is private[spark] right now.
yes, good point. I added to the docs, does it look OK?
I also considered making the call to TaskContext.setFetchFailed live outside of the constructor, so that it would have to be called at each site where the exception was created -- but that seemed more dangerous.
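For reference, the added constructor note presumably reads along these lines (a paraphrased sketch, not the verbatim scaladoc from the patch):

```scala
/**
 * Failed to fetch a shuffle block. The constructor registers this exception with the
 * running TaskContext, so an instance must only be created at the point where it will
 * be thrown immediately -- never speculatively or for later use. This is acceptable
 * to require while the class remains private[spark].
 */
```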
    // Whether the task has failed.
    @volatile private var failed: Boolean = false
    ...
    var fetchFailed: Option[FetchFailedException] = None
@volatile private ?
    // and threw something else. Regardless, we treat it as a fetch failure.
    val reason = task.context.fetchFailed.get.toTaskFailedReason
    setTaskFinishedAndClearInterruptStatus()
    execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
nit: Probably log a similar message as above ?
do you mean the msg I added about "TID ${taskId} completed successfully though internally it encountered unrecoverable fetch failures!"? I wouldn't think we'd want to log anything special here. I'm trying to make this a "normal" code path. The user is allowed to do this. (sparksql already does.)
we could log a warning, but then this change should be accompanied by auditing the code and making sure we never do this ourselves.
Yes, something along those lines ...
And I agree, we should not be doing this ourselves as well.
    setTaskFinishedAndClearInterruptStatus()
    execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
    ...
    case t: Throwable if task.context.fetchFailed.isDefined =>
task and task.context can be null in case the exception is thrown before/while deserializing the task, or before the task is run (or if initialization of the context in task.run fails).
In any of these cases, the if condition here will result in an NPE, and needs to be fixed.
oh, great point! sorry I missed that. I've added a test case for this as well.
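A plausible shape for that guard (the later diffs reference a hasFetchFailure helper; its exact body is assumed here):

```scala
// True only if the task, its context, and a recorded fetch failure all exist; task
// and task.context can be null when the failure happened before or during task
// deserialization, which is exactly the NPE pointed out above.
private def hasFetchFailure: Boolean =
  task != null && task.context != null && task.context.fetchFailed.isDefined
```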
thanks for the feedback @mridulm, all good points. I pushed an update to address some of the points, and also have some follow-up discussion.
Test build #71669 has finished for PR 16639 at commit
Test build #71668 has finished for PR 16639 at commit
Test build #71673 has finished for PR 16639 at commit
    }
    ...
    private[spark] override def setFetchFailed(fetchFailed: FetchFailedException): Unit = {
      this._fetchFailed = Some(fetchFailed)
minor: Option(fetchFailed)
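i.e. prefer Option(fetchFailed), which collapses a null argument to None, where Some(fetchFailed) would happily wrap the null:

```scala
private[spark] override def setFetchFailed(fetchFailed: FetchFailedException): Unit = {
  // Option(...) yields None for a null argument, unlike Some(...)
  this._fetchFailed = Option(fetchFailed)
}
```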
    val serTask = serializer.serialize(task)
    val taskDescription = fakeTaskDescription(serTask)


nit: too many empty lines
    executor.launchTask(mockBackend, taskDescription)
    val startTime = System.currentTimeMillis()
    val maxTime = startTime + 5000
    while (executor.numRunningTasks > 0 && System.currentTimeMillis() < maxTime) {
I'd use eventually here, or at least System.nanoTime instead.
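For instance, with ScalaTest's Eventually trait (a sketch; the timeout and interval values are illustrative):

```scala
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// Poll until the launched task finishes, failing the test after 5 seconds. Unlike a
// hand-rolled currentTimeMillis loop, this is not affected by wall-clock adjustments.
eventually(timeout(5.seconds), interval(10.milliseconds)) {
  assert(executor.numRunningTasks == 0)
}
```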
kayousterhout
left a comment
I did a super quick review of this and have some high level thoughts (will do a more thorough review pending your thoughts on the things below):
(1) Instead of this approach, did you consider walking through the exceptions (with getCause()) to see if there's a nested FetchFailure in there? That seems simpler, with the con of missing scenarios where the user discards the initial exception entirely. Not sure how likely that is? The current approach is definitely more defensive towards bad user code, but I'm hesitant about the amount of added complexity.
(2) For testing, does it help to pass in a much smaller maxBytesToFetch (spark.reducer.maxSizeInFlight) to ShuffleBlockFetcherIterator to limit the size of the initial fetches, to make it easier to wrap the FetchFailed when you want to?
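A sketch of the cause-walking alternative from (1) (hypothetical helper, not in the patch; see the finally discussion below for why it was ultimately rejected):

```scala
import scala.annotation.tailrec
import org.apache.spark.shuffle.FetchFailedException

// Walk the getCause chain looking for a buried FetchFailedException. This misses any
// case where user code drops the original exception entirely (e.g. a throw from a
// finally block), which is why the PR stores the failure in the TaskContext instead.
@tailrec
def findFetchFailure(t: Throwable): Option[FetchFailedException] = t match {
  case null => None
  case ff: FetchFailedException => Some(ff)
  case other => findFetchFailure(other.getCause)
}
```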
    // If there was a fetch failure in the task, we store it here, to make sure user-code doesn't
    // hide the exception. See SPARK-19276
    @volatile private var _fetchFailed: Option[FetchFailedException] = None
minor: can you more verbosely call this _fetchFailedException, so it's more obvious that it's not a boolean variable (like the above failed variable)
    setTaskFinishedAndClearInterruptStatus()
    execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
    ...
    case t: Throwable if hasFetchFailure =>
Can the above case be eliminated with the addition of this one?
yeah, as this is now, you could eliminate this -- I left it separate for now just to highlight that we can differentiate two special cases, which we could handle in a few different ways.
(1) FetchFailed is thrown, and the task fails, but it's not the outer-most exception. It seems clear in this case that we should fail the task with a FetchFailure. But do we also want to log an error or something indicating bad user code? Kinda minor, but might be a good idea. (Suggested by @mridulm above as well, I think.)
(1a) Or the FetchFailed isn't part of the thrown exception at all. As I mentioned in my response to your other question, I'd like to consider this exactly the same as (1).
(2) FetchFailed is thrown, but totally swallowed, so the task succeeds. Should we succeed the task, or fail it? I don't really know how this would happen. It seems really unlikely the user meant to do this. But then again, maybe the user did? I chose to just log an error but still succeed the task. (@markhamstra commented about this on the jira as well.)
It's pretty easy to change the code for whatever the desired behavior is, just waiting for a clear decision.
I agree with Mridul's comment on (1) (that it would be nice to log a warning in this case) and your assessment of (2). To handle (1), you could have just this one case, and then log a warning if !t.isInstanceOf[FetchFailedException]
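In other words, collapse to a single case, roughly like this (a fragment of the Executor's failure-handling match, sketched from the diff hunks above rather than the verbatim patch):

```scala
case t: Throwable if hasFetchFailure =>
  // Covers both a directly thrown FetchFailedException and one hidden by user code.
  if (!t.isInstanceOf[FetchFailedException]) {
    logWarning(s"TID $taskId: fetch failure was hidden by another exception", t)
  }
  val reason = task.context.fetchFailed.get.toTaskFailedReason
  setTaskFinishedAndClearInterruptStatus()
  execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
```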
    memoryManager.synchronized { memoryManager.notifyAll() }
    }
    } finally {
    // though we unset the ThreadLocal here, the context itself is still queried directly
nit: "the context member variable" instead of just "the context" (took me a min to parse this)
yeah, I considered it but chose this for exactly the reason you suggest, to be more defensive. It just seems way too easy for a user to get this wrong. Even assuming the user always wraps the exception, what about bad error handling in a

    try {
      ...
    } catch {
      case t: Throwable => throw new MySpecialAppException(t)
    } finally {
      someResource.close() // oops, this can throw an exception
    }

Given the special importance of this exception, it really seems like we should be handling it in some way that bad user code can't cover it up.
Do you mean for writing a larger integration test? That would probably help make the failure more likely, but off the top of my head, I don't think that will be enough to reliably trigger the real failure (which means if it's ever broken, we'll just think it's a flaky test, not broken functionality). IIRC the problem isn't just having more fetches than fit in the initial requests -- you also need the fetch failure to occur at a specific point in the processing. I'd need to futz around with it a while again to say for sure, though. I'd certainly like an integration test, but eventually decided the unit test I added was sufficient.
Walking up a getCause tree is not reliable -- finally is one of the cases where it will fail (others being catch blocks ignoring it, catch-rethrow idioms resulting in other exceptions being thrown, etc).
Ok, I'm convinced re: not walking up the cause tree. I didn't think about that finally case. I'll do another review now. Re: the larger integration test, I didn't have a particular thing in mind -- I was mentioning that in response to your comment that it was hard to trigger the failure because of that first set of blocks that's fetched before any next() calls (so I was hoping you had some integration test in mind that would leverage that). But fine not to have one if it's still too complex to do.
kayousterhout
left a comment
The approach here looks good. I was hoping there might be a way to simplify but I couldn't think of anything (and the commenting is helpful for noting why stuff needed to be added)
    // unlikely). So we will log an error and keep going.
    logError(s"TID ${taskId} completed successfully though internally it encountered " +
      s"unrecoverable fetch failures! Most likely this means user code is incorrectly " +
      s"swallowing Spark's internal exceptions", fetchFailure)
Can you be explicit about what exception is getting swallowed here? (i.e., "incorrectly swallowing Spark's internal FetchFailedException") -- to possibly simplify debugging/fixing this issue for a user who runs into it.
    case t: Throwable if hasFetchFailure =>
      // there was a fetch failure in the task, but some user code wrapped that exception
      // and threw something else. Regardless, we treat it as a fetch failure.
      val reason = task.context.fetchFailed.get.toTaskFailedReason
tiny nit: but does it make sense to store the taskFailedReason (rather than the actual exception) in the task context?
    }
    }
    ...
    test("SPARK-19276: Handle Fetch Failed for all intervening user code") {
how about "Handle FetchFailedExceptions that are hidden by user exceptions"?
    val taskDescription = fakeTaskDescription(serTask)
    ...
    val failReason = runTaskAndGetFailReason(taskDescription)
can you add a comment about what's going on here? I think the FFE gets thrown because the shuffle map data was never generated? And then you're checking that it's correctly accounted for, even though the user RDD code wrapped the exception in something else?
    assert(failReason.isInstanceOf[FetchFailed])
    }
    ...
    test("Gracefully handle error in task deserialization") {
is this test related to this PR? (seems useful but like it should be in its own PR?)
@mridulm pointed out this bug in an earlier version of this pr, so I fixed the bug and added a test case. But in any case, I've separated this out into #16930 / https://issues.apache.org/jira/browse/SPARK-19597
    }
    }
    ...
    class FakeShuffleRDD(sc: SparkContext) extends RDD[Int](sc, Nil) {
how about FetchFailureThrowingShuffleRDD? (to make it obvious what the point of this is?)
    }
    }
    ...
    private def mockEnv(conf: SparkConf, serializer: JavaSerializer): SparkEnv = {
can this and the below method have a verb in the name, since they're doing something rather than just getters? createMockEnv?
| logWarning(s"TID ${taskId} encountered a ${classOf[FetchFailedException]} and " + | ||
| s"failed, but did not directly throw the ${classOf[FetchFailedException]}. " + | ||
| s"Spark is still handling the fetch failure, but these exceptions should not be " + | ||
| s"intercepted by user code.") |
@mridulm @kayousterhout how is this msg? open to other suggestions. I'm not sure exactly what to recommend to the user instead.
I worry that this is slightly misleading because there's not necessarily anything bad happening here (e.g., in the SQL case), and the user-thrown exception is getting permanently lost. What about something more like

    logWarning(s"TID ${taskId} encountered a ${classOf[FetchFailedException]} and " +
      s"failed, but the ${classOf[FetchFailedException]} was hidden by another " +
      s"exception: $t. Spark is handling this like a fetch failure and ignoring the " +
      s"other exception.")
@kayousterhout While I like the message, Spark SQL should not be catching that exception to begin with anyway.
Btw, the impact of ignoring the exception here needs to be considered as well ... the "catch Throwable" block does some interesting things for accumulator updates and isFatalError.
At least the latter must be handled here (an OOM being raised, for example) -- not sure about the accumulator updates ...
I agree with @mridulm that it looks like these lines (473-475 below) need to be added here:

    if (Utils.isFatalError(t)) {
      SparkUncaughtExceptionHandler.uncaughtException(t)
    }

I'm less sure about the accumulator updates. It looks like the old code doesn't report accumulators for fetch failed exceptions, but it's not clear to me why we'd report them for some kinds of exceptions but not others. The simplest thing to do seems to be the current approach (since it roughly maintains the old behavior of not updating accumulators for fetch failures) but I don't have a good sense for why this is or is not correct.
Thanks, I like that msg better. I changed it slightly so the original exception is at the end, otherwise it's hard to tell where the original exception ends and you are back to the error msg. Here's what the new msg looks like from the test case now:

    17/02/27 16:33:43.953 Executor task launch worker for task 0 WARN Executor: TID 0 encountered a org.apache.spark.shuffle.FetchFailedException and failed, but the org.apache.spark.shuffle.FetchFailedException was hidden by another exception. Spark is handling this like a fetch failure and ignoring the other exception: java.lang.RuntimeException: User Exception that hides the original exception

You have a good point about the uncaught exception handler, I have added that back. I wondered whether I should add those lines inside the case t: Throwable if hasFetchFailure block, or make it a condition for the case itself: case t: Throwable if hasFetchFailure && !Utils.isFatalError(t). I decided to make it part of the condition, since that is more like the old behavior, and a fetch failure that happens during an OOM may not be real.
I also looked into adding a unit test for this handling -- it requires some refactoring, potentially more work than it's worth, so I put it in a separate commit.
I'd rather avoid changing the behavior for accumulators here. Accumulators have such weird semantics it's not clear what they should do; we can fix that separately if we really want to.
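With the fatal-error concern folded in, the guard described above becomes (a sketch of the final shape, mirroring the condition quoted in the comment):

```scala
// A fetch failure recorded while a fatal error (e.g. an OOM) is propagating may not
// be real, so fatal errors skip this case and fall through to the existing handler
// that invokes SparkUncaughtExceptionHandler, matching the old behavior.
case t: Throwable if hasFetchFailure && !Utils.isFatalError(t) =>
```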
Test build #72883 has finished for PR 16639 at commit
Test build #72882 has finished for PR 16639 at commit
Jenkins, retest this please
Test build #72900 has finished for PR 16639 at commit
kayousterhout
left a comment
A few last things. Also looks like this duplicates some of the functionality in #16930, which can be fixed when that's merged
| logWarning(s"TID ${taskId} encountered a ${classOf[FetchFailedException]} and " + | ||
| s"failed, but did not directly throw the ${classOf[FetchFailedException]}. " + | ||
| s"Spark is still handling the fetch failure, but these exceptions should not be " + | ||
| s"intercepted by user code.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry that this is slightly misleading because there's not necessarily anything bad happening here (e.g., in the SQL case), and the user-thrown exception is getting permanently lost. What about something more like
logWarning(s"TID ${taskId} encountered a ${classOf[FetchFailedException]} and " +
s"failed, but the ${classOf[FetchFailedException]} was hidden by another " +
s"exception: $t. Spark is handling this like a fetch failure and ignoring the " +
s"other exception.")
    // SPARK-19276. We set the fetch failure in the task context, so that even if there is user-code
    // which intercepts this exception (possibly wrapping it), the Executor can still tell there was
    // a fetch failure, and send the correct error msg back to the driver. The TaskContext won't be
    // defined if this is run on the driver (just in test cases) -- we can safely ignore then.
This last sentence is confusing. A task that runs locally on the driver can still hit fetch failures right? Or are you saying the TaskContext will only be not defined in test cases?
sorry, I've reworded this. The issue is that we have test cases where the TaskContext isn't defined, and so we'd hit an NPE without the Option wrapper. But in general, the TaskContext should always be defined anytime we'd create a FetchFailure.
The alternative would be to track down the test cases w/out a TaskContext, and add one back.
Test build #73598 has finished for PR 16639 at commit
Jenkins retest this please (filed https://issues.apache.org/jira/browse/SPARK-19772)
Test build #73597 has finished for PR 16639 at commit
kayousterhout
left a comment
LGTM -- thanks for adding the test for fatal errors
Test build #73607 has finished for PR 16639 at commit
@mridulm look ok to you too? I plan on merging soon. I just made a small change to the comments (I copied and pasted incorrect comments in the last test case I added)
@squito It looks good to me, thanks for the changes!
Test build #73784 has finished for PR 16639 at commit
I merged this into master. Thanks @squito!
What changes were proposed in this pull request?
Fault-tolerance in Spark requires special handling of shuffle fetch
failures. The Executor would catch FetchFailedException and send a
special msg back to the driver.
However, intervening user code could intercept that exception, and wrap
it with something else. This even happens in SparkSQL. So rather than
checking the thrown exception only, we'll store the fetch failure directly
in the TaskContext, where users can't touch it.
How was this patch tested?
Added a test case which failed before the fix. Full test suite via jenkins.