[SPARK-28839][CORE] Avoids NPE in context cleaner when dynamic allocation and shuffle service are on #25551
@vanzin do you mind if I ask you to take a look and see if it makes sense?
  def removeShuffle(id: Int): Unit = {
-   if (shuffleIds.remove(id) && shuffleIds.isEmpty) {
+   if (shuffleIds != null && shuffleIds.remove(id) && shuffleIds.isEmpty) {
So, is this only for 3.0?
Yup, I think so.
Test build #109568 has finished for PR 25551 at commit
retest this please
Test build #109570 has finished for PR 25551 at commit
retest this please
Test build #109577 has finished for PR 25551 at commit
viirya
left a comment
good catch!
Test build #109599 has finished for PR 25551 at commit
Test build #109637 has finished for PR 25551 at commit
Test build #4839 has finished for PR 25551 at commit
retest this please
Test build #109646 has finished for PR 25551 at commit
vanzin
left a comment
Merging to master.
val bus = mockListenerBus()
conf.set(DYN_ALLOCATION_SHUFFLE_TRACKING, true).set(SHUFFLE_SERVICE_ENABLED, true)
monitor = new ExecutorMonitor(conf, client, bus, clock) {
  override def onOtherEvent(event: SparkListenerEvent): Unit = {
You could have used mockito's verify instead of this, but this is ok.
Thanks all!
What changes were proposed in this pull request?
This PR proposes to avoid throwing an NPE in the context cleaner when the shuffle service is on. It is a small follow-up of #24817.
It seems that shuffleIds is set to null when the shuffle service is on, in which case shuffles are not tracked. Later, removeShuffle tries to remove an element from shuffleIds, which leads to an NPE. This PR fixes it by explicitly not sending the event (ShuffleCleanedEvent) in this case.
See the code path below:
spark/core/src/main/scala/org/apache/spark/SparkContext.scala
Line 584 in cbad616
spark/core/src/main/scala/org/apache/spark/ContextCleaner.scala
Line 125 in cbad616
spark/core/src/main/scala/org/apache/spark/ContextCleaner.scala
Line 190 in cbad616
spark/core/src/main/scala/org/apache/spark/ContextCleaner.scala
Lines 220 to 230 in cbad616
spark/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
Lines 353 to 357 in cbad616
spark/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
Line 347 in cbad616
spark/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
Lines 400 to 406 in cbad616
spark/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
Line 475 in cbad616
spark/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
Line 427 in cbad616
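The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not Spark's actual code: TrackerSketch and its method names are hypothetical stand-ins, and it assumes (per the description) that the ID set is null whenever the shuffle service is enabled. It contrasts the unguarded removal, the null-guarded variant discussed in review, and a call-site check that corresponds to not delivering the event at all.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the per-executor tracker; not Spark's real class.
final class TrackerSketch(shuffleServiceEnabled: Boolean) {
  // Assumption: the set is left null when the shuffle service is on (shuffles are not tracked).
  private val shuffleIds: mutable.HashSet[Int] =
    if (shuffleServiceEnabled) null else new mutable.HashSet[Int]()

  // True only when shuffle IDs are actually being tracked.
  def tracksShuffles: Boolean = shuffleIds != null

  def addShuffle(id: Int): Unit = if (shuffleIds != null) shuffleIds += id

  // Unguarded removal: dereferences shuffleIds and throws an NPE when it is null.
  def removeShuffleUnsafe(id: Int): Boolean =
    shuffleIds.remove(id) && shuffleIds.isEmpty

  // Guarded removal: short-circuits to false when nothing is tracked.
  def removeShuffleGuarded(id: Int): Boolean =
    shuffleIds != null && shuffleIds.remove(id) && shuffleIds.isEmpty
}

object NpeSketch {
  def main(args: Array[String]): Unit = {
    val untracked = new TrackerSketch(shuffleServiceEnabled = true)

    // The unguarded path fails with a NullPointerException.
    val threw =
      try { untracked.removeShuffleUnsafe(1); false }
      catch { case _: NullPointerException => true }
    println(s"unguarded removal threw NPE: $threw")

    // The guarded path is a harmless no-op.
    println(s"guarded removal result: ${untracked.removeShuffleGuarded(1)}")

    // Call-site alternative: skip the cleanup call entirely when nothing is tracked.
    if (untracked.tracksShuffles) untracked.removeShuffleUnsafe(1)

    // With tracking on, removing the last shuffle reports that the set is now empty.
    val tracked = new TrackerSketch(shuffleServiceEnabled = false)
    tracked.addShuffle(7)
    println(s"tracked removal emptied the set: ${tracked.removeShuffleGuarded(7)}")
  }
}
```

Skipping the call at the call site keeps the tracker itself free of null checks, while the in-method guard protects every caller; which one is preferable depends on how many code paths can reach the removal.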
Why are the changes needed?
This is a bug fix.
Does this PR introduce any user-facing change?
It prevents the exception:
How was this patch tested?
A unit test was added.