[SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager #31763

hiboyang · 2021-03-06T07:31:45Z

This PR is copied from #30004, with extra changes addressing the review comments. My git environment was messed up and I could not update the previous PR (#30004), so I created this new PR to replace it.

What changes were proposed in this pull request?

Add a generic metadata field to the MapStatus class to support custom shuffle managers. Also add a new method to retrieve all map output statuses and their metadata. See Jira: https://issues.apache.org/jira/projects/SPARK/issues/SPARK-33114

Why are the changes needed?

The current MapStatus class is tightly bound to the local (sort-merge) shuffle, which uses BlockManagerId to store the shuffle data location. It cannot support other custom shuffle manager implementations.

For example, when we implement a Remote Shuffle Service, we want to put remote shuffle server information into MapStatus so that a reducer can read that information and figure out where to fetch its data from. The added MapStatus.metadata field can store such information.

People who implement other shuffle managers can also store their own related information in this metadata field.
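
To illustrate the use case described above, here is a minimal sketch of how a shuffle manager might attach server information to a map status and how a reducer might read it back. Every type is a simplified stand-in defined for this example, not Spark's actual class, and the names RemoteShuffleServerInfo, SimpleMapStatus, and fetchTarget are hypothetical.

```scala
object MetadataSketch {
  // Simplified stand-in for Spark's BlockManagerId (assumption: only these fields matter here).
  final case class BlockManagerId(executorId: String, host: String, port: Int)

  // Hypothetical immutable payload that a remote shuffle manager might attach.
  final case class RemoteShuffleServerInfo(host: String, port: Int)
    extends java.io.Serializable

  // Stand-in for MapStatus with the metadata accessor proposed in this PR.
  trait MapStatus {
    def location: BlockManagerId
    def metadata: Option[java.io.Serializable]
  }

  final case class SimpleMapStatus(
      location: BlockManagerId,
      metadata: Option[java.io.Serializable]) extends MapStatus

  // Reducer side: recover the server address from the metadata, if it is present
  // and of the expected type. Any other payload is ignored.
  def fetchTarget(status: MapStatus): Option[String] =
    status.metadata.collect {
      case RemoteShuffleServerInfo(host, port) => s"$host:$port"
    }
}
```

A status without metadata yields None from fetchTarget, so in this sketch the default BlockManagerId-based path would stay unaffected.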

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit test

SparkQA · 2021-03-06T10:16:38Z

Test build #135825 has finished for PR 31763 at commit 1309378.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2021-03-06T11:28:08Z

Could you close the previous PR and update the PR title correctly?

hiboyang · 2021-03-07T08:14:21Z

> Could you close the previous PR and update the PR title correctly?

Yes, updated this PR’s title. Will close previous PR after people start to review here.

attilapiros · 2021-03-07T08:51:55Z

core/src/main/scala/org/apache/spark/MapOutputTracker.scala

+      case Some(shuffleStatus) =>
+        shuffleStatus.withMapStatuses { statuses =>
+          MapOutputTracker.checkMapStatuses(statuses, shuffleId)
+          statuses.clone


So as we discussed in #30004 (comment)

Changing it to getAllMapOutputStatusMetadata and returning only the metadata could be a solution, combined with the restriction that only immutable metadata is allowed.

So to be on the safe side, please document that we require the metadata to be immutable, and introduce an updateMetadata(meta: Option[Serializable]) method in MapStatus. Then we will be safe and all the use cases are covered.

Cool, I will change it to getAllMapOutputStatusMetadata and update this PR.
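
The suggestion in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: MapStatus here is a simplified stand-in rather than Spark's class, and the body of getAllMapOutputStatusMetadata is a guess at the intent (return metadata only, never the status objects), not the code in this PR.

```scala
object ImmutableMetadataSketch {
  // Simplified stand-in for MapStatus. Contract (by documentation only): metadata
  // values must be immutable, so sharing a reference with callers is safe.
  final class MapStatus(initial: Option[java.io.Serializable]) {
    @volatile private var meta: Option[java.io.Serializable] = initial

    def metadata: Option[java.io.Serializable] = meta

    // Replaces the metadata reference wholesale; the previous value is never mutated.
    def updateMetadata(newMeta: Option[java.io.Serializable]): Unit = {
      meta = newMeta
    }
  }

  // Hypothetical tracker-side accessor: exposes only the metadata, so callers
  // cannot reach (or mutate) the tracker's own MapStatus objects.
  def getAllMapOutputStatusMetadata(
      statuses: Array[MapStatus]): Seq[Option[java.io.Serializable]] =
    statuses.map(_.metadata).toSeq
}
```

Because the metadata is immutable, a snapshot taken before a later updateMetadata call keeps its old values, which is the safety property the review comment asks for.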

attilapiros · 2021-03-07T08:53:37Z

core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala

    assert(mapWorkerTracker.getMapSizesByExecutorId(10, 0).toSeq ===
      Seq((BlockManagerId("a", "hostA", 1000),
        ArrayBuffer((ShuffleBlockId(10, 5, 0), size1000, 0)))))
-    assert(0 == masterTracker.getNumCachedSerializedBroadcast)


Please do not remove this assert:

assert(0 == masterTracker.getNumCachedSerializedBroadcast)

It seems this was caused by a merge issue; I will add it back.

Ngone51 · 2021-03-07T10:20:52Z

core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala

+   * to store information they need. For example, a Remote Shuffle Service ShuffleManager could
+   * store shuffle server information and let reducer task know where to fetch shuffle data.
+   */
+  def metadata: Option[Serializable]


Hmm... what's the relationship between SPARK-33114 and SPARK-25299? According to the JIRA description, SPARK-33114 seems to enhance the support for custom shuffle managers, while SPARK-25299 only customizes the storage with the default SortShuffleManager.

So if we are only talking about SPARK-33114, adding metadata may be a good choice for its own scenario. But if we bring in SPARK-25299 as well (IIUC, what this PR is doing would also benefit SPARK-25299), I personally think we need a more general design here. For example, I'd prefer to redesign the location of MapStatus so that it can support the different scenarios mentioned in SPARK-25299 (e.g., Spark BlockManager, Spark external shuffle service, custom remote storage, etc.). That way, different scenarios would be able to reuse the existing features, e.g., decommission (which may update the MapStatus location at runtime), and reuse the existing code paths, e.g., we would not need the extra getAllMapOutputStatuses, and everything would work the same as what we already do during shuffle reading.

WDYT?
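
One possible reading of the "more general location" idea is sketched below. It is purely illustrative: the hierarchy and every name in it (MapOutputLocation, BlockManagerLocation, ExternalShuffleServiceLocation, RemoteStorageLocation, describe, relocate) are hypothetical and are not taken from Spark, SPARK-25299, or #30763.

```scala
object LocationSketch {
  // Hypothetical location hierarchy covering the scenarios listed in the comment.
  sealed trait MapOutputLocation extends java.io.Serializable
  final case class BlockManagerLocation(executorId: String, host: String, port: Int)
    extends MapOutputLocation
  final case class ExternalShuffleServiceLocation(host: String, port: Int)
    extends MapOutputLocation
  final case class RemoteStorageLocation(uri: String) extends MapOutputLocation

  // Simplified stand-in for MapStatus carrying a general location.
  final case class MapStatus(location: MapOutputLocation)

  // A single read path can dispatch on the location type instead of needing
  // a separate accessor per shuffle implementation.
  def describe(status: MapStatus): String = status.location match {
    case BlockManagerLocation(exec, host, port) => s"executor $exec at $host:$port"
    case ExternalShuffleServiceLocation(host, port) => s"shuffle service at $host:$port"
    case RemoteStorageLocation(uri) => s"remote storage at $uri"
  }

  // Decommission-style update: produce a status pointing at a new location.
  def relocate(status: MapStatus, to: MapOutputLocation): MapStatus =
    status.copy(location = to)
}
```

With a sealed hierarchy the compiler can check that every location kind is handled, which is one way existing code paths could be shared across scenarios.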

@Ngone51 I agree with you that finishing the design laid out in SPARK-25299 would be much better.
This is why I opened #30763 as a copy of Matthew Cheah's original PR for SPARK-31801 (because he is busy with other projects) and kept it up-to-date several times with the master.

But it hasn't received enough reviews, and I wouldn't want to block @hiboyang further; see #30004 (comment).

I am sure that with your help we can complete SPARK-31801 and be on the road to SPARK-25299.

So next week I will do the conflict resolution and ping you when the PR is ready for review. Is this okay?

Cool, thanks @attilapiros for continuing to work on SPARK-25299 while unblocking this PR. @Ngone51 SPARK-33114 is a small change to support remote shuffle service/storage by adding a metadata object to MapStatus. It could be viewed as a subset of SPARK-25299's work.

> So next week I will do the conflict resolution and ping you when the PR is ready for review. Is this okay?

Sure, please. @attilapiros

@hiboyang Thanks for your explanation. I agree that #30763 is too big to review. But I think we can discuss there first to ensure we are heading in the same direction before we dive into the details. Once we're on the same page, we can split the big PR into smaller pieces and start working on them together. Does that sound good to you?

SPARK-31801 is very big and may take a very long time to finish (it has already been open for 10 months). Could we merge this PR first?

If SPARK-31801 finds a better way to support this and breaks getAllMapOutputStatusMetadata, that is actually a good thing :) We could have multiple iterations. This PR is the first iteration, with a very small change. SPARK-31801 is the iteration after that. The latter does not need to block the former.

I don't think we should develop this way... As you mentioned above, SPARK-33114 can be considered a subtask of SPARK-25299. So how can we consider this PR a first iteration when SPARK-25299 is still under discussion and development, especially when people haven't reached an agreement on the solution and there is a possible alternative solution at the same time? Also, I think custom shuffle managers aren't officially supported by Spark, because the ShuffleManager interface is private. So it doesn't make sense for Spark to add an internal API for unofficial use cases without a strong reason.

SPARK-31801 is surely big. But as I mentioned earlier, we can split it. When the solution is finalized, we can start with refactoring MapStatus first. I think that would be a much smaller task and would be enough for your case. After that, we'll start the remaining work (e.g., using the new MapStatus wherever it is referenced), which you would not need to care about.

I understand you have put a lot of effort into this work, and I'm sorry we cannot get it in fast. Unfortunately, I don't have permission to merge. You could persuade committers to merge the PR if you insist on it.

Yeah, it is also a good idea to split SPARK-31801 and start with refactoring MapStatus first. Do you or the community have ideas about how to split SPARK-31801?

We're still discussing the solution in #30763. So I can't tell you the concrete split plan. But, I think, we'd be able to start with refactoring MapStatus either way.

I see, I missed the latest discussion in #30763, will check there as well, thanks!

SparkQA · 2021-03-07T21:51:23Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40428/

SparkQA · 2021-03-07T21:56:31Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40428/

SparkQA · 2021-03-07T23:57:01Z

Test build #135846 has finished for PR 31763 at commit 798addf.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-04-16T11:10:51Z

Test build #137478 has finished for PR 31763 at commit 798addf.

This patch fails PySpark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2021-07-05T10:10:25Z

Test build #140641 has finished for PR 31763 at commit 798addf.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

github-actions · 2021-10-14T00:09:43Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

hiboyang added 2 commits March 5, 2021 23:18

Apply apache#30004

36cf66c

Update per comments

1309378

github-actions bot added the CORE label Mar 6, 2021

hiboyang mentioned this pull request Mar 6, 2021

[SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager #30004

Closed

hiboyang changed the title ~~Map status metadata2~~ [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager Mar 7, 2021

attilapiros reviewed Mar 7, 2021

Ngone51 reviewed Mar 7, 2021

hiboyang added 4 commits March 7, 2021 12:47

Rename getAllMapOutputStatuses to getAllMapOutputStatusMetadata

17bcff2

Fix test issues

525973b

Fix comment of return value for getAllMapOutputStatusMetadata

1fbc7a3

Fix import style issue

798addf

github-actions bot added the Stale label Oct 14, 2021

github-actions bot closed this Oct 15, 2021
