Conversation

@viirya (Member) commented Sep 5, 2018

What changes were proposed in this pull request?

We update block info coming from executors at certain points, for example when caching an RDD. However, when an RDD is removed by unpersisting, we don't ask executors to update the block info, so it becomes stale.

There are a few options to fix this:

  1. Ask executors to update block info when unpersisting

This is the simplest, but it changes driver-executor communication a bit.

  2. Update block info when processing the unpersist event

We already send a SparkListenerUnpersistRDD event when unpersisting an RDD. When processing this event, we can update the RDD's block info. This only changes event-processing code, so the risk seems lower.

This patch currently takes option 2 for the lower risk. If we agree the first option carries no risk, we can switch to it.
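As a rough sketch, the option-2 handler looks something like the following in the status listener (simplified pseudocode; the field and helper names below are illustrative, not the exact patch):

```scala
// Sketch only: on the unpersist event, drop the RDD's live state and fold
// its storage usage back out of the per-executor summaries.
override def onUnpersistRDD(event: SparkListenerUnpersistRDD): Unit = {
  liveRDDs.remove(event.rddId).foreach { liveRDD =>
    liveRDD.getDistributions().foreach { case (executorId, rddDist) =>
      liveExecutors.get(executorId).foreach { exec =>
        // Subtract this RDD's contribution, clamping at zero in case the
        // accounting has drifted.
        exec.memoryUsed = math.max(0, exec.memoryUsed - rddDist.memoryUsed)
        exec.diskUsed = math.max(0, exec.diskUsed - rddDist.diskUsed)
        maybeUpdate(exec, System.nanoTime())
      }
    }
  }
}
```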

How was this patch tested?

Unit tests.

@SparkQA commented Sep 5, 2018

Test build #95708 has finished for PR 22341 at commit dd5f766.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya (Member Author) commented Sep 5, 2018

retest this please.


private[spark] object RDDInfo {
-  private val callsiteForm = SparkEnv.get.conf.get(EVENT_LOG_CALLSITE_FORM)
+  private lazy val callsiteForm = SparkEnv.get.conf.get(EVENT_LOG_CALLSITE_FORM)
Member

Is this related to the problem?

Member Author

I ran the tests locally, and this causes an error when initializing RDDInfo. I actually think this should be lazy anyway, since it is not always needed.
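A minimal, self-contained illustration of the failure mode (all names here are hypothetical, not Spark's):

```scala
object Env {
  // Stands in for SparkEnv.get: may still be null while other classes load.
  var current: Map[String, String] = null
}

object EagerHolder {
  // Evaluated as soon as EagerHolder is first touched: throws an NPE if
  // Env.current is unset, even when callsiteForm is never actually used.
  val callsiteForm: String = Env.current("callsite.form")
}

object LazyHolder {
  // Evaluated only on the first read of callsiteForm, by which time the
  // environment should exist; never evaluated if nothing reads it.
  lazy val callsiteForm: String = Env.current("callsite.form")
}
```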

Member

Thanks for explaining.

@SparkQA commented Sep 5, 2018

Test build #95709 has finished for PR 22341 at commit dd5f766.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val partitions = liveRDD.getPartitions()

partitions.foreach { case (_, part) =>
  val executors = part.executors


In fact, each partition here contains only one executor, right? The Seq of executors here would only ever be a single-element sequence.

Member Author

No, a partition can exist on more than one executor.

@vanzin (Contributor) left a comment

Just some minor things.


import scala.collection.JavaConverters._
import scala.collection.mutable.HashMap
import scala.collection.mutable.HashSet
Contributor

nit: merge with above import
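i.e. presumably:

```scala
import scala.collection.JavaConverters._
import scala.collection.mutable.{HashMap, HashSet}
```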

override def onUnpersistRDD(event: SparkListenerUnpersistRDD): Unit = {
-  liveRDDs.remove(event.rddId)
+  liveRDDs.remove(event.rddId).foreach { liveRDD =>
+    val executorsToUpdate = new HashSet[LiveExecutor]()
Contributor

I'd just call maybeUpdate directly when updating the executor. It's cheaper than inserting into a set, since a duplicate call will just compare the timestamp of the last update and then do nothing.

Member Author

An executor can be updated first when processing distributions and then again when processing partitions. We actually want that duplicate call to perform an update. If we just call maybeUpdate directly, the duplicate call will do nothing, and we'll miss the updates from processing partitions.

Contributor

Right. But it would be nice to avoid the hash set if possible. The less stuff listeners have to do, the better.

Member Author

OK, good point. I've updated the change to remove the hash set.
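For reference, the shape of that change is roughly the following (a sketch with illustrative names, not the exact patch):

```scala
// Before (sketch): remember every touched executor, then flush them once.
val executorsToUpdate = new HashSet[LiveExecutor]()
distributions.foreach { case (executorId, _) =>
  liveExecutors.get(executorId).foreach(executorsToUpdate += _)
}
// ... collect more executors while walking partitions ...
executorsToUpdate.foreach(exec => maybeUpdate(exec, now))

// After (sketch): update at each touch point and drop the set entirely, so
// the listener allocates nothing extra per unpersist event.
distributions.foreach { case (executorId, rddDist) =>
  liveExecutors.get(executorId).foreach { exec =>
    // ... adjust exec's usage from rddDist ...
    maybeUpdate(exec, now)
  }
}
```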

maybeExec.foreach { exec =>
  if (exec.hasMemoryInfo) {
    if (storageLevel.useOffHeap) {
      exec.usedOffHeap = math.max(0, exec.usedOffHeap - rddDist.offHeapUsed)
Contributor

I'd take the newValue method inside updateRDDBlock and make it a proper private method (with a better name), since then it becomes clearer why this logic is needed.
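Something along these lines, presumably (the helper name below is a placeholder, not the one the patch chose):

```scala
// Remove an RDD's contribution from a usage counter, clamping at zero so
// drifted or duplicated accounting can never drive the value negative.
private def decrementUsage(current: Long, removed: Long): Long =
  math.max(0L, current - removed)

// e.g. exec.usedOffHeap = decrementUsage(exec.usedOffHeap, rddDist.offHeapUsed)
```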


// Use RDD distribution to update executor memory and disk usage info.
distributions.foreach { case (executorId, rddDist) =>
  val maybeExec = liveExecutors.get(executorId)
Contributor

You don't need the variable, right? liveExecutors.get(executorId).foreach is enough?

(Same with distributions and partitions and executors.)
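i.e.:

```scala
distributions.foreach { case (executorId, rddDist) =>
  liveExecutors.get(executorId).foreach { exec =>
    // ... update exec from rddDist directly; no intermediate val needed ...
  }
}
```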

Member Author

Yeah, right.

}

def getPartitions(): Map[String, LiveRDDPartition] = {
  partitions.toMap
Contributor

This makes a copy, no? Is there a need to make that copy?

Member Author

I think this just creates an immutable map and inserts all the elements into it, so the elements themselves are not copied.

Contributor

But it makes a copy of the map, and that seems unnecessary. Of course it does not copy the elements; it doesn't even know how to do that.

@viirya (Member Author) commented Sep 11, 2018

Do we prefer to just return the private mutable HashMap directly?

Contributor

Sure, it's an internal API. Listener code needs to avoid doing unnecessary things like copying stuff to avoid issues with dropping events.
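For reference, the difference being discussed, in a generic example rather than the PR's code:

```scala
import scala.collection.mutable

val partitions = mutable.HashMap("part-0" -> 1, "part-1" -> 2)

// toMap builds a brand-new immutable Map from the entries: an O(n) copy of
// the map structure, though not of the values it contains.
val copied: Map[String, Int] = partitions.toMap

// Widening the static type instead shares the same underlying map, no copy.
val shared: scala.collection.Map[String, Int] = partitions
```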

@viirya (Member Author) commented Sep 11, 2018

@vanzin Thanks for the review! I've updated the PR to address all the comments.

@SparkQA commented Sep 11, 2018

Test build #95926 has finished for PR 22341 at commit 7c76790.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya (Member Author) commented Sep 11, 2018

retest this please.

@SparkQA commented Sep 11, 2018

Test build #95937 has finished for PR 22341 at commit 7c76790.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

distributions.get(exec.executorId)
}

def getPartitions(): scala.collection.Map[String, LiveRDDPartition] = partitions
Contributor

I'd have just exposed the fields but this is fine too.

@vanzin (Contributor) commented Sep 11, 2018

Merging to master, 2.4 and 2.3.

asfgit pushed a commit that referenced this pull request Sep 11, 2018

Closes #22341 from viirya/SPARK-24889.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
(cherry picked from commit 14f3ad2)
Signed-off-by: Marcelo Vanzin <[email protected]>
asfgit closed this in 14f3ad2 Sep 11, 2018
asfgit pushed a commit that referenced this pull request Sep 11, 2018

Closes #22341 from viirya/SPARK-24889.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
(cherry picked from commit 14f3ad2)
Signed-off-by: Marcelo Vanzin <[email protected]>
@vanzin (Contributor) commented Sep 11, 2018

There was a trivial conflict in 2.3; I fixed it manually.

@viirya (Member Author) commented Sep 11, 2018

Thanks @vanzin

fjh100456 pushed a commit to fjh100456/spark that referenced this pull request Sep 13, 2018

Closes apache#22341 from viirya/SPARK-24889.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
viirya deleted the SPARK-24889 branch December 27, 2023 18:35