Skip to content

Conversation

@sandeep-katta
Copy link
Contributor

It is corrected as per the configuration.

What changes were proposed in this pull request?

IdleTimeout info used to print in the logs is taken based on the cacheBlock. If it is cacheBlock then cachedExecutorIdleTimeoutS is considered else executorIdleTimeoutS

How was this patch tested?

Manual Test
spark-sql> cache table sample;
2018-05-15 14:44:02 INFO DAGScheduler:54 - Submitting 3 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[8] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0, 1, 2))
2018-05-15 14:44:02 INFO YarnScheduler:54 - Adding task set 0.0 with 3 tasks
2018-05-15 14:44:03 INFO ExecutorAllocationManager:54 - Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
...
...
2018-05-15 14:46:10 INFO YarnClientSchedulerBackend:54 - Actual list of executor(s) to be killed is 1
2018-05-15 14:46:10 INFO ExecutorAllocationManager:54 - Removing executor 1 because it has been idle for 120 seconds (new desired total will be 0)
2018-05-15 14:46:11 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Disabling executor 1.
2018-05-15 14:46:11 INFO DAGScheduler:54 - Executor lost: 1 (epoch 1)

@sandeep-katta sandeep-katta changed the title wrong Idle Timeout value is used in case of the cacheBlock. [SPARK-24558][SPARK-Core]wrong Idle Timeout value is used in case of the cacheBlock. Jun 14, 2018
@kiszk
Copy link
Member

kiszk commented Jun 15, 2018

nit: could you please replace [Spark-Core] with [Core] in the title?

@sandeep-katta sandeep-katta changed the title [SPARK-24558][SPARK-Core]wrong Idle Timeout value is used in case of the cacheBlock. [SPARK-24558][Core]wrong Idle Timeout value is used in case of the cacheBlock. Jun 16, 2018
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blockManagerMaster.hasCachedBlocks(removedExecutorId)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes blockManagerMaster is referring to same SparkEnv.get.blockManager.master. Will update the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code as per comments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// If it is a cached block, it uses cachedExecutorIdleTimeoutS for timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, if it is cached block then IDLE time out taken from the spark.dynamicAllocation.cachedExecutorIdleTimeout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu can you review and merge the PR if the changes are fine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you refine the wording?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing removeTimes to HashMap[String, (Long, Boolean)] (and the Boolean field indicates whether it is for cachedExecutor idle timeout or not) ? Thus, we do not need to ask blockManagerMaster again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blockManagerMaster already provides API to check it is cached block or not,so I feel it will be overhead to maintain another HashMap

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not maintain another HashMap, but alter its original structure. In this way, we do not need to issue extra rpc calls to BlockManagerMaster here. As you mentioned 'API', this thing happens after a rpc call happened.

@sandeep-katta
Copy link
Contributor Author

@cloud-fan can you please review this small piece of code and merge this PR

@cloud-fan
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Jun 21, 2018

Test build #92178 has finished for PR 21565 at commit a5708cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 21, 2018

Test build #92179 has finished for PR 21565 at commit a5708cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sandeep-katta
Copy link
Contributor Author

@cloud-fan Build is passed.Can you push this PR

@cloud-fan
Copy link
Contributor

have you addressed all the comments?

@sandeep-katta
Copy link
Contributor Author

yes all the review comments are addressed

@cloud-fan
Copy link
Contributor

have you addressed #21565 (comment) ?

It is corrected as per the configuration.
@cloud-fan
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Jul 12, 2018

Test build #92932 has finished for PR 21565 at commit a67107a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@srowen
Copy link
Member

srowen commented Jul 15, 2018

@cloud-fan did this merge?

@cloud-fan
Copy link
Contributor

seems something was wrong with my merge script... It's merged now, @srowen thanks for reminding!

@asfgit asfgit closed this in 2603ae3 Jul 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants