-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25636][CORE] spark-submit cuts off the failure reason when there is an error connecting to master #22623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
is an error connecting to master
|
The only thing I'm confused about is, right now when the |
|
Please update the title as it seems cut off. |
|
Thanks @srowen for looking into this.
Here the cause is getting wrapped as SparkException with the message. And in SparkSubmit.scala, it is just printing this message and discarding the caused exception.
The other option is to print the whole stack trace instead of just message here. Please let me know your thought, I can make change with this. |
|
Yeah, that's the issue. Maybe change Otherwise this change doesn't address the fact that you'd still only print a little info about one cause, not all of them. |
|
Looking at the 2.3 code, it might be the right thing to just remove the handler for |
|
ok to test |
|
Test build #97003 has finished for PR 22623 at commit
|
|
Test build #97021 has finished for PR 22623 at commit
|
|
Test build #4358 has finished for PR 22623 at commit
|
| case e: Exception => if (!exitedCleanly) throw e | ||
| case e: Exception => | ||
| message = e.getMessage | ||
| if (!(exitedCleanly || message.contains(searchString))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it didn't exit cleanly, can it be possible that the exception is the correct expected one, and that its message contains the search string? I'm probably missing the reason why this has to be checked here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this PR change, SparkException will not be caught and thrown directly, and nothing writing to System.err and also no exit(exitedCleanly = false) in this case, here we need to check the thrown exception message whether it has the expected searchString or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually wonder if throwing this exception below is actually doing anything? thread.join() will not throw it in the test thread, which is what would actually matter.
So it seems that to achieve what the comment above says, you should do just this instead:
if (!exitedCleanly) {
message = e.getMessage
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually wonder if throwing this exception below is actually doing anything?
It would give the stack trace in the test log if there is any unexpected exception. With the below, we may give exception message but no stack trace in the log during fail.
if (!exitedCleanly) {
message = e.getMessage
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then just print the exception, or fail the test explicitly with that exception somehow. At least that's explicitly doing what is expected of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(e.g., by stashing the exception in a variable and doing these checks on the test thread after the thread.join() call.)
|
That's checked below. Why this change? |
|
This change is to avoid the expected exception is being thrown from the thread and getting printed in the test log. |
|
Just to check my understanding, |
|
Yes, you are right, |
|
Test build #97127 has finished for PR 22623 at commit
|
| if (!joined.contains(searchString)) { | ||
| fail(s"Search string '$searchString' not found in $joined") | ||
| val searchStrContainsInEx = exception != null && exception.getMessage.contains(searchString) | ||
| if(!searchStrContainsInEx){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need spaces around parens here. But I think there's a logic issue; if exception is null but exitedCleanly is false, you try to throw null. I get that it won't be null if so, but that highlights that I think they're redundant. And I don't think it's helping to pull out the condition here in a variable.
I think this is just simpler to delete exitedCleanly and ...
@volatile var: Exception = null
...
if (exception != null) {
if (!exception.getMessage.contains(searchString)) {
throw exception
}
} else if (!joined.contains(searchString) {
fail(...)
}
I think that's equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've update as per this, thanks for the suggestion.
| if (!joined.contains(searchString)) { | ||
| fail(s"Search string '$searchString' not found in $joined") | ||
| val searchStrContainsInEx = exception != null && exception.getMessage.contains(searchString) | ||
| if (!searchStrContainsInEx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thinks this is a bit hard to follow. Isn't this equivalent to:
if (exitedCleanly) {
// check the output captured in printStream
} else {
assert(exception != null)
if (message does not contain search string) {
throw exception
}
}
Also, now that you moved the logic here, the comment in L83 is kinda orphaned, and a fresher comment explaining this block would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated as per Sean's comment, plz check whether it is ok.
| if (!exception.getMessage.contains(searchString)) { | ||
| throw exception | ||
| } | ||
| } else if (!joined.contains(searchString)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be just assert(joined.contains(blah)). You could also move the assignment of joined inside the else here.
Otherwise looks good pending tests.
|
Test build #97132 has finished for PR 22623 at commit
|
|
Test build #97133 has finished for PR 22623 at commit
|
|
Test build #97134 has finished for PR 22623 at commit
|
and exception is being thrown from the thread, making the change to support this.
|
Test build #97143 has finished for PR 22623 at commit
|
|
Test build #4362 has finished for PR 22623 at commit
|
|
Test build #4365 has started for PR 22623 at commit |
vanzin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good pending tests.
| fail(s"Search string '$searchString' not found in $joined") | ||
| if (exitedCleanly) { | ||
| val joined = printStream.lineBuffer.mkString("\n") | ||
| assert(joined.contains(searchString), s"Search string '$searchString' not found in $joined") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message is redundant, assert already gives you a nice message, e.g.
[info] "foo" did not contain "bar" (blah.scala)
|
Test build #97179 has finished for PR 22623 at commit
|
|
Merging to master / 2.4. |
…re is an error connecting to master
## What changes were proposed in this pull request?
Cause of the error is wrapped with SparkException, now finding the cause from the wrapped exception and throwing the cause instead of the wrapped exception.
## How was this patch tested?
Verified it manually by checking the cause of the error, it gives the error as shown below.
### Without the PR change
```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://******
....
Error: Exception thrown in awaitResult:
Run with --help for usage help or --verbose for debug output
```
### With the PR change
```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://******
....
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
....
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Failed to connect to devaraj-pc1/10.3.66.65:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
....
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: devaraj-pc1/10.3.66.65:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
....
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
Caused by: java.net.ConnectException: Connection refused
... 11 more
```
Closes #22623 from devaraj-kavali/SPARK-25636.
Authored-by: Devaraj K <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
(cherry picked from commit 8a7872d)
Signed-off-by: Marcelo Vanzin <[email protected]>
…re is an error connecting to master
## What changes were proposed in this pull request?
Cause of the error is wrapped with SparkException, now finding the cause from the wrapped exception and throwing the cause instead of the wrapped exception.
## How was this patch tested?
Verified it manually by checking the cause of the error, it gives the error as shown below.
### Without the PR change
```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://******
....
Error: Exception thrown in awaitResult:
Run with --help for usage help or --verbose for debug output
```
### With the PR change
```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://******
....
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
....
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Failed to connect to devaraj-pc1/10.3.66.65:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
....
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: devaraj-pc1/10.3.66.65:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
....
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
Caused by: java.net.ConnectException: Connection refused
... 11 more
```
Closes apache#22623 from devaraj-kavali/SPARK-25636.
Authored-by: Devaraj K <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
What changes were proposed in this pull request?
Cause of the error is wrapped with SparkException, now finding the cause from the wrapped exception and throwing the cause instead of the wrapped exception.
How was this patch tested?
Verified it manually by checking the cause of the error, it gives the error as shown below.
Without the PR change
With the PR change