[SPARK-24558][Core] Wrong idle timeout value is used in case of cached blocks #21565
nit: could you please replace
blockManagerMaster.hasCachedBlocks(removedExecutorId)?
Yes, blockManagerMaster refers to the same object as SparkEnv.get.blockManager.master. Will update the code.
I have updated the code as per the comments.
// If it is a cached block, it uses cachedExecutorIdleTimeoutS for timeout?
Yes, if the executor has cached blocks, the idle timeout is taken from spark.dynamicAllocation.cachedExecutorIdleTimeout.
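As a minimal sketch of the rule confirmed above, assuming the caller already knows whether the executor holds cached blocks, the removal deadline could be computed like this. `IdleDeadline` and its parameters are hypothetical names for illustration, not Spark's actual `ExecutorAllocationManager` code.

```scala
// Hypothetical helper illustrating the rule discussed in this thread;
// the object and parameter names are assumptions, not Spark's API.
object IdleDeadline {
  /** Returns the time (in ms) at which an idle executor becomes eligible for removal. */
  def removalDeadlineMs(
      nowMs: Long,
      hasCachedBlocks: Boolean,
      executorIdleTimeoutS: Long,
      cachedExecutorIdleTimeoutS: Long): Long = {
    // Executors holding cached blocks use spark.dynamicAllocation.cachedExecutorIdleTimeout;
    // all other executors use spark.dynamicAllocation.executorIdleTimeout.
    val timeoutS =
      if (hasCachedBlocks) cachedExecutorIdleTimeoutS else executorIdleTimeoutS
    // Saturate at Long.MaxValue instead of wrapping around for very large timeouts.
    try java.lang.Math.addExact(nowMs, java.lang.Math.multiplyExact(timeoutS, 1000L))
    catch { case _: ArithmeticException => Long.MaxValue }
  }
}
```

For example, with executorIdleTimeout at 60 s and cachedExecutorIdleTimeout at 120 s, an executor that goes idle at t = 1000 ms becomes removable at 121000 ms if it has cached blocks and at 61000 ms otherwise.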
@maropu, can you review and merge the PR if the changes look fine?
can you refine the wording?
Updated from commit 30fcef6 to a5708cc.
How about changing removeTimes to HashMap[String, (Long, Boolean)], where the Boolean field indicates whether the cached-executor idle timeout applies? Then we would not need to query blockManagerMaster again.
blockManagerMaster already provides an API to check whether an executor has cached blocks, so I feel it would be an overhead to maintain another HashMap.
We would not maintain another HashMap; we would only change the structure of the existing one. That way, we do not need to issue an extra RPC call to BlockManagerMaster here. The 'API' you mentioned only returns after an RPC call has been made.
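A rough sketch of the suggested structure, assuming a simplified tracker class: `IdleTracker`, `onExecutorIdle` and `loggedTimeoutS` are hypothetical names, not Spark APIs. Recording the cached flag at the moment the deadline is set lets the removal log pick its timeout from local state, without another round trip to BlockManagerMaster.

```scala
import scala.collection.mutable

// Hypothetical tracker sketching the reviewer's suggestion; not Spark code.
final class IdleTracker(executorIdleTimeoutS: Long, cachedExecutorIdleTimeoutS: Long) {
  // executorId -> (removal deadline in ms, whether the cached-executor timeout applied)
  private val removeTimes = mutable.HashMap.empty[String, (Long, Boolean)]

  // Records the deadline together with the cached flag when the executor goes idle.
  def onExecutorIdle(executorId: String, deadlineMs: Long, hasCachedBlocks: Boolean): Unit =
    removeTimes(executorId) = (deadlineMs, hasCachedBlocks)

  // Picks the timeout to report from local state only; no call to BlockManagerMaster.
  def loggedTimeoutS(executorId: String): Option[Long] =
    removeTimes.get(executorId).map { case (_, usedCachedTimeout) =>
      if (usedCachedTimeout) cachedExecutorIdleTimeoutS else executorIdleTimeoutS
    }
}
```

The trade-off debated in the thread shows up directly: each idle executor costs one extra Boolean in the map, in exchange for dropping the extra RPC when the executor is removed.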
@cloud-fan, can you please review this small piece of code and merge this PR?
ok to test
Test build #92178 has finished for PR 21565 at commit
Test build #92179 has finished for PR 21565 at commit
@cloud-fan, the build passed. Can you merge this PR?
Have you addressed all the comments?
Yes, all the review comments are addressed.
Have you addressed #21565 (comment)?
It is corrected as per the configuration.
Updated from commit a5708cc to a67107a.
LGTM
Test build #92932 has finished for PR 21565 at commit
Thanks, merging to master!
@cloud-fan, did this merge?
It seems something was wrong with my merge script... It's merged now. @srowen, thanks for the reminder!
What changes were proposed in this pull request?
The idle timeout value printed in the logs is now chosen based on whether the executor has cached blocks: if it does, cachedExecutorIdleTimeoutS is used; otherwise, executorIdleTimeoutS is used.
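A minimal sketch of the described change, assuming the cached-blocks flag has already been determined (for example via blockManagerMaster.hasCachedBlocks). `IdleRemovalLog.removalMessage` is a hypothetical helper, not the code merged in this PR; its message format simply mirrors the ExecutorAllocationManager log line in the manual test.

```scala
// Hypothetical helper; names and signature are illustrative assumptions.
object IdleRemovalLog {
  def removalMessage(
      executorId: String,
      hasCachedBlocks: Boolean,
      executorIdleTimeoutS: Long,
      cachedExecutorIdleTimeoutS: Long,
      newDesiredTotal: Int): String = {
    // Report the timeout that actually governed this executor's removal,
    // instead of always reporting executorIdleTimeoutS.
    val idleS =
      if (hasCachedBlocks) cachedExecutorIdleTimeoutS else executorIdleTimeoutS
    s"Removing executor $executorId because it has been idle for $idleS seconds " +
      s"(new desired total will be $newDesiredTotal)"
  }
}
```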
How was this patch tested?
Manual Test
spark-sql> cache table sample;
2018-05-15 14:44:02 INFO DAGScheduler:54 - Submitting 3 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[8] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0, 1, 2))
2018-05-15 14:44:02 INFO YarnScheduler:54 - Adding task set 0.0 with 3 tasks
2018-05-15 14:44:03 INFO ExecutorAllocationManager:54 - Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
...
...
2018-05-15 14:46:10 INFO YarnClientSchedulerBackend:54 - Actual list of executor(s) to be killed is 1
2018-05-15 14:46:10 INFO ExecutorAllocationManager:54 - Removing executor 1 because it has been idle for 120 seconds (new desired total will be 0)
2018-05-15 14:46:11 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Disabling executor 1.
2018-05-15 14:46:11 INFO DAGScheduler:54 - Executor lost: 1 (epoch 1)