Conversation

@viirya (Member) commented Sep 5, 2018

What changes were proposed in this pull request?

We update block info coming from executors at certain points, for example when caching an RDD. However, when an RDD is removed by unpersisting, we don't ask executors to update the block info, so it becomes stale.

There are a few options to fix this:

  1. Ask executors to update block info when unpersisting

This is the simplest, but it changes driver-executor communication a bit.

  2. Update block info when processing the unpersist event

We already send a SparkListenerUnpersistRDD event when unpersisting an RDD. When processing this event, we can update the RDD's block info. This only changes event-processing code, so the risk seems lower.

This patch currently takes option 2 for the lower risk. If we agree the first option carries no risk, we can switch to it.
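As a rough sketch, the option-2 handler looks something like the following in the status listener (simplified pseudocode; the field and helper names below are illustrative, not the exact patch):

```scala
// Sketch only: on the unpersist event, drop the RDD's live state and fold
// its storage usage back out of the per-executor summaries.
override def onUnpersistRDD(event: SparkListenerUnpersistRDD): Unit = {
  liveRDDs.remove(event.rddId).foreach { liveRDD =>
    liveRDD.getDistributions().foreach { case (executorId, rddDist) =>
      liveExecutors.get(executorId).foreach { exec =>
        // Subtract this RDD's contribution, clamping at zero in case the
        // accounting has drifted.
        exec.memoryUsed = math.max(0, exec.memoryUsed - rddDist.memoryUsed)
        exec.diskUsed = math.max(0, exec.diskUsed - rddDist.diskUsed)
        maybeUpdate(exec, System.nanoTime())
      }
    }
  }
}
```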

How was this patch tested?

Unit tests.

@SparkQA commented Sep 5, 2018

Test build #95708 has finished for PR 22341 at commit dd5f766.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya (Member Author) commented Sep 5, 2018

retest this please.


private[spark] object RDDInfo {
-  private val callsiteForm = SparkEnv.get.conf.get(EVENT_LOG_CALLSITE_FORM)
+  private lazy val callsiteForm = SparkEnv.get.conf.get(EVENT_LOG_CALLSITE_FORM)
Member

Is this related to the problem?

Member Author

I ran the tests locally, and this causes an error when initializing RDDInfo. I actually think this should be lazy anyway, since it is not always needed.
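A minimal, self-contained illustration of the failure mode (all names here are hypothetical, not Spark's):

```scala
object Env {
  // Stands in for SparkEnv.get: may still be null while other classes load.
  var current: Map[String, String] = null
}

object EagerHolder {
  // Evaluated as soon as EagerHolder is first touched: throws an NPE if
  // Env.current is unset, even when callsiteForm is never actually used.
  val callsiteForm: String = Env.current("callsite.form")
}

object LazyHolder {
  // Evaluated only on the first read of callsiteForm, by which time the
  // environment should exist; never evaluated if nothing reads it.
  lazy val callsiteForm: String = Env.current("callsite.form")
}
```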

Member

Thanks for explaining.

@SparkQA commented Sep 5, 2018

Test build #95709 has finished for PR 22341 at commit dd5f766.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val partitions = liveRDD.getPartitions()

partitions.foreach { case (_, part) =>
  val executors = part.executors


In fact, each partition here contains only one executor, right? The Seq of executors here would only ever be a single-element sequence.

Member Author

No, a partition can exist on more than one executor.

@vanzin (Contributor) left a comment

Just some minor things.


import scala.collection.JavaConverters._
import scala.collection.mutable.HashMap
import scala.collection.mutable.HashSet
Contributor

nit: merge with above import
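i.e. presumably:

```scala
import scala.collection.JavaConverters._
import scala.collection.mutable.{HashMap, HashSet}
```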

override def onUnpersistRDD(event: SparkListenerUnpersistRDD): Unit = {
-  liveRDDs.remove(event.rddId)
+  liveRDDs.remove(event.rddId).foreach { liveRDD =>
+    val executorsToUpdate = new HashSet[LiveExecutor]()
Contributor

I'd just call maybeUpdate directly when updating the executor. It's cheaper than inserting into a set, since a duplicate call will just compare the timestamp of the last update and then do nothing.

Member Author

An executor can be updated first when processing distributions and then again when processing partitions. We actually want that duplicate call to perform an update. If we just call maybeUpdate directly, the duplicate call will do nothing, and we'll miss the updates from processing partitions.

Contributor

Right. But it would be nice to avoid the hash set if possible. The less stuff listeners have to do, the better.

Member Author

OK, good point. I've updated the change to remove the hash set.
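For reference, the shape of that change is roughly the following (a sketch with illustrative names, not the exact patch):

```scala
// Before (sketch): remember every touched executor, then flush them once.
val executorsToUpdate = new HashSet[LiveExecutor]()
distributions.foreach { case (executorId, _) =>
  liveExecutors.get(executorId).foreach(executorsToUpdate += _)
}
// ... collect more executors while walking partitions ...
executorsToUpdate.foreach(exec => maybeUpdate(exec, now))

// After (sketch): update at each touch point and drop the set entirely, so
// the listener allocates nothing extra per unpersist event.
distributions.foreach { case (executorId, rddDist) =>
  liveExecutors.get(executorId).foreach { exec =>
    // ... adjust exec's usage from rddDist ...
    maybeUpdate(exec, now)
  }
}
```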

maybeExec.foreach { exec =>
  if (exec.hasMemoryInfo) {
    if (storageLevel.useOffHeap) {
      exec.usedOffHeap = math.max(0, exec.usedOffHeap - rddDist.offHeapUsed)
Contributor

I'd take the newValue method inside updateRDDBlock and make it a proper private method (with a better name), since then it becomes clearer why this logic is needed.
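Something along these lines, presumably (the helper name below is a placeholder, not the one the patch chose):

```scala
// Remove an RDD's contribution from a usage counter, clamping at zero so
// drifted or duplicated accounting can never drive the value negative.
private def decrementUsage(current: Long, removed: Long): Long =
  math.max(0L, current - removed)

// e.g. exec.usedOffHeap = decrementUsage(exec.usedOffHeap, rddDist.offHeapUsed)
```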


// Use RDD distribution to update executor memory and disk usage info.
distributions.foreach { case (executorId, rddDist) =>
  val maybeExec = liveExecutors.get(executorId)
Contributor

You don't need the variable, right? liveExecutors.get(executorId).foreach is enough?

(Same with distributions and partitions and executors.)
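i.e.:

```scala
distributions.foreach { case (executorId, rddDist) =>
  liveExecutors.get(executorId).foreach { exec =>
    // ... update exec from rddDist directly; no intermediate val needed ...
  }
}
```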

Member Author

Yeah, right.

}

def getPartitions(): Map[String, LiveRDDPartition] = {
  partitions.toMap
Contributor

This makes a copy, no? Is there a need to make that copy?

Member Author

I think this just creates an immutable map and inserts all the elements into it, so the elements themselves are not copied.

Contributor

But it makes a copy of the map, and that seems unnecessary. Of course it does not copy the elements; it doesn't even know how to do that.

@viirya (Member Author) commented Sep 11, 2018

Do we prefer to just return the private mutable HashMap directly?

Contributor

Sure, it's an internal API. Listener code needs to avoid doing unnecessary things like copying stuff to avoid issues with dropping events.
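For reference, the difference being discussed, in a generic example rather than the PR's code:

```scala
import scala.collection.mutable

val partitions = mutable.HashMap("part-0" -> 1, "part-1" -> 2)

// toMap builds a brand-new immutable Map from the entries: an O(n) copy of
// the map structure, though not of the values it contains.
val copied: Map[String, Int] = partitions.toMap

// Widening the static type instead shares the same underlying map, no copy.
val shared: scala.collection.Map[String, Int] = partitions
```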

@viirya (Member Author) commented Sep 11, 2018

@vanzin Thanks for the review! I've updated the PR to address all the comments.

@SparkQA commented Sep 11, 2018

Test build #95926 has finished for PR 22341 at commit 7c76790.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya (Member Author) commented Sep 11, 2018

retest this please.

@SparkQA commented Sep 11, 2018

Test build #95937 has finished for PR 22341 at commit 7c76790.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

distributions.get(exec.executorId)
}

def getPartitions(): scala.collection.Map[String, LiveRDDPartition] = partitions
Contributor

I'd have just exposed the fields but this is fine too.

@vanzin (Contributor) commented Sep 11, 2018

Merging to master, 2.4 and 2.3.

asfgit pushed a commit that referenced this pull request Sep 11, 2018

Closes #22341 from viirya/SPARK-24889.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
(cherry picked from commit 14f3ad2)
Signed-off-by: Marcelo Vanzin <[email protected]>
asfgit closed this in 14f3ad2 Sep 11, 2018
asfgit pushed a commit that referenced this pull request Sep 11, 2018

Closes #22341 from viirya/SPARK-24889.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
(cherry picked from commit 14f3ad2)
Signed-off-by: Marcelo Vanzin <[email protected]>
@vanzin (Contributor) commented Sep 11, 2018

There was a trivial conflict in 2.3; I fixed it manually.

@viirya (Member Author) commented Sep 11, 2018

Thanks @vanzin

fjh100456 pushed a commit to fjh100456/spark that referenced this pull request Sep 13, 2018

Closes apache#22341 from viirya/SPARK-24889.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
viirya deleted the SPARK-24889 branch December 27, 2023 18:35