Conversation

@devaraj-kavali

@devaraj-kavali devaraj-kavali commented Oct 3, 2018

What changes were proposed in this pull request?

The cause of the error is wrapped in a SparkException; this change finds the cause inside the wrapped exception and throws the cause instead of the wrapped exception.

How was this patch tested?

Verified manually by checking the cause of the error; it now reports the error as shown below.

Without the PR change

[apache-spark]$ ./bin/spark-submit --verbose --master spark://******
....
Error: Exception thrown in awaitResult:
Run with --help for usage help or --verbose for debug output

With the PR change

[apache-spark]$ ./bin/spark-submit --verbose --master spark://******
....
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        ....
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Failed to connect to devaraj-pc1/10.3.66.65:7077
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
       ....
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: devaraj-pc1/10.3.66.65:7077
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        ....
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        ... 1 more
Caused by: java.net.ConnectException: Connection refused
        ... 11 more
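In spirit, the fix walks the wrapped exception's cause chain to surface the root failure. A minimal sketch, assuming a hypothetical `rootCause` helper (not the actual patch code):

```scala
// Illustrative sketch: walk a Throwable's cause chain to reach the deepest
// cause, instead of stopping at the SparkException wrapper's message.
def rootCause(t: Throwable): Throwable = {
  var cur = t
  while (cur.getCause != null) cur = cur.getCause
  cur
}

// Simulated nesting, mirroring the stack trace above:
val root = new java.net.ConnectException("Connection refused")
val wrapped = new RuntimeException("Exception thrown in awaitResult: ",
  new java.io.IOException("Failed to connect", root))

println(rootCause(wrapped).getMessage)  // the ConnectException's message
```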

@srowen
Member

srowen commented Oct 4, 2018

The only thing I'm confused about is: right now, when the SparkException is thrown instead of its cause, wouldn't you get a big stack trace starting with the SparkException? That SparkException is useful too. I don't see why it doesn't print in this case, do you? And can we fix that part?

@tgravescs
Contributor

Please update the title as it seems cut off.

@devaraj-kavali devaraj-kavali changed the title [SPARK-25636][CORE] spark-submit swallows the failure reason when there [SPARK-25636][CORE] spark-submit cuts off the failure reason when there is an error connecting to master Oct 4, 2018
@devaraj-kavali
Author

Thanks @srowen for looking into this.

ThreadUtils.scala

      case NonFatal(t) if !t.isInstanceOf[TimeoutException] =>
        throw new SparkException("Exception thrown in awaitResult: ", t)

Here the cause gets wrapped in a SparkException along with the message. And in SparkSubmit.scala, it just prints this message and discards the underlying exception.

SparkSubmit.scala

      override def doSubmit(args: Array[String]): Unit = {
        try {
          super.doSubmit(args)
        } catch {
          case e: SparkUserAppException =>
            exitFn(e.exitCode)
          case e: SparkException =>
            printErrorAndExit(e.getMessage())
        }

The other option is to print the whole stack trace instead of just the message here. Please let me know your thoughts; I can make the change accordingly.
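For the "print the whole stack trace" option, one illustrative way (the helper name is assumed, not part of the patch) would be to render a Throwable's full trace to a String before handing it to printErrorAndExit:

```scala
// Illustrative helper: render the full stack trace (including "Caused by:"
// sections) to a String, so it could be printed instead of just e.getMessage.
import java.io.{PrintWriter, StringWriter}

def stackTraceString(t: Throwable): String = {
  val sw = new StringWriter()
  t.printStackTrace(new PrintWriter(sw, true))
  sw.toString
}

val e = new RuntimeException("Exception thrown in awaitResult: ",
  new java.io.IOException("Failed to connect"))
println(stackTraceString(e))
```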

@srowen
Member

srowen commented Oct 5, 2018

Yeah, that's the issue. Maybe change printErrorAndExit to logError the exception in this case. It's noisier to be sure but I think losing the stack trace here is significant. @vanzin I think you touched this part last; WDYT?

Otherwise this change doesn't address the fact that you'd still only print a little info about one cause, not all of them.

@vanzin
Contributor

vanzin commented Oct 5, 2018

Looking at the 2.3 code, it might be the right thing to just remove the handler for SparkException in SparkSubmit.main, and let it propagate.

    try {
      mainMethod.invoke(null, childArgs.toArray)
    } catch {
      case t: Throwable =>
        findCause(t) match {
          case SparkUserAppException(exitCode) =>
            System.exit(exitCode)

          case t: Throwable =>
            throw t
        }
    }
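For context, findCause here strips reflection wrappers so the match sees the user code's real exception; mainMethod.invoke wraps anything thrown by the invoked main in an InvocationTargetException. A hedged sketch of such a helper (the real Spark implementation may differ):

```scala
import java.lang.reflect.{InvocationTargetException, UndeclaredThrowableException}

// Sketch of a findCause-style helper: unwrap reflection wrappers until the
// underlying exception is reached. Illustrative, not the exact Spark code.
@annotation.tailrec
def findCause(t: Throwable): Throwable = t match {
  case e: UndeclaredThrowableException if e.getCause != null => findCause(e.getCause)
  case e: InvocationTargetException if e.getCause != null => findCause(e.getCause)
  case e => e
}

val user = new IllegalStateException("boom")
val wrapped = new InvocationTargetException(user)
println(findCause(wrapped).getMessage)  // "boom"
```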

@vanzin
Contributor

vanzin commented Oct 5, 2018

ok to test

@SparkQA

SparkQA commented Oct 5, 2018

Test build #97003 has finished for PR 22623 at commit 81e1730.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 6, 2018

Test build #97021 has finished for PR 22623 at commit a82e75f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 6, 2018

Test build #4358 has finished for PR 22623 at commit a82e75f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case e: Exception => if (!exitedCleanly) throw e
case e: Exception =>
  message = e.getMessage
  if (!(exitedCleanly || message.contains(searchString))) {
Member

If it didn't exit cleanly, couldn't the exception still be the correct, expected one, with a message containing the search string? I'm probably missing the reason why this has to be checked here.

Author

With this PR change, the SparkException is not caught; it is thrown directly, so nothing is written to System.err and exit() is not called (exitedCleanly = false). In this case we need to check whether the thrown exception's message contains the expected searchString.

Contributor

I actually wonder if throwing this exception below is actually doing anything? thread.join() will not throw it in the test thread, which is what would actually matter.

So it seems that to achieve what the comment above says, you should do just this instead:

if (!exitedCleanly) {
  message = e.getMessage
}

Author

I actually wonder if throwing this exception below is actually doing anything?

It would give the stack trace in the test log if there is any unexpected exception. With the below, we'd only get the exception message, with no stack trace in the log on failure.

if (!exitedCleanly) {
  message = e.getMessage
}

Contributor

Then just print the exception, or fail the test explicitly with that exception somehow. At least that's explicitly doing what is expected of it.

Contributor

(e.g., by stashing the exception in a variable and doing these checks on the test thread after the thread.join() call.)
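The suggested pattern, sketched minimally (names are illustrative): stash any exception from the worker thread in a @volatile field and check it on the test thread after join():

```scala
// Stash the worker thread's exception instead of throwing it there; the test
// thread then inspects it after join(), where failing actually matters.
@volatile var exception: Exception = null

val thread = new Thread(new Runnable {
  override def run(): Unit = {
    try {
      throw new RuntimeException("Connection refused")  // simulated failure
    } catch {
      case e: Exception => exception = e                // stash, don't rethrow
    }
  }
})
thread.start()
thread.join()

// Now safe to check on the test thread:
if (exception != null && !exception.getMessage.contains("Connection refused")) {
  throw exception
}
```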

@srowen
Member

srowen commented Oct 8, 2018

That's checked below. Why this change?

@devaraj-kavali
Author

This change avoids the expected exception being thrown from the thread and printed in the test log.

@srowen
Member

srowen commented Oct 8, 2018

Just to check my understanding, exitedCleanly is false even when the expected exception is thrown? OK that makes sense.

@devaraj-kavali
Author

Yes, you are right, exitedCleanly is false even when the expected exception is thrown.

@SparkQA

SparkQA commented Oct 8, 2018

Test build #97127 has finished for PR 22623 at commit 20c4e01.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (!joined.contains(searchString)) {
fail(s"Search string '$searchString' not found in $joined")
val searchStrContainsInEx = exception != null && exception.getMessage.contains(searchString)
if(!searchStrContainsInEx){
Member

You need spaces around parens here. But I think there's a logic issue; if exception is null but exitedCleanly is false, you try to throw null. I get that it won't be null if so, but that highlights that I think they're redundant. And I don't think it's helping to pull out the condition here in a variable.

I think this is just simpler to delete exitedCleanly and ...

@volatile var exception: Exception = null
...
if (exception != null) {
  if (!exception.getMessage.contains(searchString)) {
    throw exception
  }
} else if (!joined.contains(searchString)) {
  fail(...)
}

I think that's equivalent?

Author

I've updated as per this; thanks for the suggestion.

if (!joined.contains(searchString)) {
fail(s"Search string '$searchString' not found in $joined")
val searchStrContainsInEx = exception != null && exception.getMessage.contains(searchString)
if (!searchStrContainsInEx) {
Contributor

I think this is a bit hard to follow. Isn't this equivalent to:

if (exitedCleanly) {
  // check the output captured in printStream
} else {
  assert(exception != null)
  if (message does not contain search string) {
    throw exception
  }
}

Also, now that you've moved the logic here, the comment at L83 is somewhat orphaned; a fresher comment explaining this block would be better.

Author

I've updated as per Sean's comment; please check whether it is OK.

if (!exception.getMessage.contains(searchString)) {
throw exception
}
} else if (!joined.contains(searchString)) {
Contributor

This could be just assert(joined.contains(blah)). You could also move the assignment of joined inside the else here.

Otherwise looks good pending tests.

@SparkQA

SparkQA commented Oct 8, 2018

Test build #97132 has finished for PR 22623 at commit 96104bd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 9, 2018

Test build #97133 has finished for PR 22623 at commit 9e60602.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 9, 2018

Test build #97134 has finished for PR 22623 at commit 71e3e30.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

and exception is being thrown from the thread, making the change to
support this.
@SparkQA

SparkQA commented Oct 9, 2018

Test build #97143 has finished for PR 22623 at commit 131b3af.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 9, 2018

Test build #4362 has finished for PR 22623 at commit 131b3af.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 9, 2018

Test build #4365 has started for PR 22623 at commit 131b3af.

Contributor

@vanzin vanzin left a comment

Looks good pending tests.

fail(s"Search string '$searchString' not found in $joined")
if (exitedCleanly) {
val joined = printStream.lineBuffer.mkString("\n")
assert(joined.contains(searchString), s"Search string '$searchString' not found in $joined")
Contributor

The message is redundant, assert already gives you a nice message, e.g.

[info]   "foo" did not contain "bar" (blah.scala)

@SparkQA

SparkQA commented Oct 10, 2018

Test build #97179 has finished for PR 22623 at commit 1002e44.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Oct 10, 2018

Merging to master / 2.4.

@asfgit asfgit closed this in 8a7872d Oct 10, 2018
asfgit pushed a commit that referenced this pull request Oct 10, 2018
…re is an error connecting to master

## What changes were proposed in this pull request?
Cause of the error is wrapped with SparkException, now finding the cause from the wrapped exception and throwing the cause instead of the wrapped exception.

## How was this patch tested?
Verified it manually by checking the cause of the error, it gives the error as shown below.

### Without the PR change

```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://******
....
Error: Exception thrown in awaitResult:
Run with --help for usage help or --verbose for debug output

```

### With the PR change

```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://******
....
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        ....
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Failed to connect to devaraj-pc1/10.3.66.65:7077
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
       ....
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: devaraj-pc1/10.3.66.65:7077
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        ....
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        ... 1 more
Caused by: java.net.ConnectException: Connection refused
        ... 11 more

```

Closes #22623 from devaraj-kavali/SPARK-25636.

Authored-by: Devaraj K <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
(cherry picked from commit 8a7872d)
Signed-off-by: Marcelo Vanzin <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019