[SPARK-31801][API][SHUFFLE] Register map output metadata #30763
Conversation
Only invoke shuffle output tracker once per unregister shuffle attempt.
Because we need to stop the SparkContext.
Kubernetes integration test starting
Kubernetes integration test status success
Test build #132771 has finished for PR 30763 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Test build #134568 has finished for PR 30763 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Test build #135856 has finished for PR 30763 at commit
Failure is totally unrelated.
Let me move the mima excludes from 3.1.x to 3.2.x.
Regarding cutting this into smaller pieces, I can identify two potential sub-PRs.
I can do this cut if you think it is really needed and if you agree with the content of the sub-PRs.
Test build #135864 has finished for PR 30763 at commit
Ngone51 left a comment
Hi @attilapiros @hiboyang, I'd like to discuss the way we register the metadata a bit more before we consider how to split this PR.
I actually have a different idea about this. As mentioned briefly in @hiboyang's PR, I'd rather redesign the location of MapStatus:
spark/core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
Lines 36 to 38 in f340857
```scala
private[spark] sealed trait MapStatus {
  /** Location where this task output is. */
  def location: BlockManagerId
```
If we want to introduce custom storage, I think the location should be abstracted to represent different storages, e.g., we could introduce a class Location. The Location would have some common attributes, e.g., a type, and we could add metadata as an interface to Location to provide arbitrary info. Then BlockManagerId would be the native implementation for Spark, and users could implement Location to support custom storage.
Also, this way wouldn't change the existing framework of shuffle read/write. It'd allow us to reuse the features and code paths without extra effort.
WDYT?
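To make the proposal concrete, here is a minimal sketch of what such an abstraction could look like. This is purely illustrative: `Location`, `locationType`, `metadata`, and `BlockManagerLocation` are hypothetical names, not existing Spark API.

```scala
// Hypothetical sketch of the proposed Location abstraction (not Spark API).
trait Location extends Serializable {
  // Common attribute shared by every location kind, e.g. a type tag.
  def locationType: String
  // Arbitrary plugin-defined metadata attached to this location.
  def metadata: Map[String, String]
}

// A BlockManagerId-like class would be Spark's native implementation;
// custom shuffle plugins would supply their own Location subtype.
case class BlockManagerLocation(executorId: String, host: String, port: Int)
    extends Location {
  def locationType: String = "block-manager"
  def metadata: Map[String, String] =
    Map("executorId" -> executorId, "host" -> host, "port" -> port.toString)
}
```

Under this shape, `MapStatus.location` would return the abstract `Location` rather than `BlockManagerId`, and only the native code paths would narrow it back down.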
To test the idea I tried to come up with hard situations, but this does not mean I am against the idea. So if I understand correctly, in this case we should check the references of this... As the current reader uses... For example, as I see... On the other hand, the write side might be easier, as there MapStatus is filled with the id of the current block manager, so a new writer implementation would just use its own location. But for the read side my worry is having runtime checks/asserts/guards to enforce when what is allowed to be used.
I still think the location abstraction is a good idea. I just have my doubts about the amount of effort we need:
Yes
We don't. Actually, I think it also answers this question:
Actually, I only found one reference that needs a cast:
And yes, the custom reader should care more about casting. They should definitely cast the generic
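For illustration, the downcast a custom reader would perform might look like the following. All names here are hypothetical; the point is the runtime guard the read side would need when a location of the wrong kind shows up.

```scala
// Hypothetical: a custom reader narrowing the generic Location to its own type.
trait Location { def locationType: String }

case class CustomStoreLocation(endpoint: String) extends Location {
  val locationType = "custom-store"
}

def endpointOf(loc: Location): String = loc match {
  case c: CustomStoreLocation => c.endpoint
  case other =>
    // Runtime guard: this reader only understands its own location type.
    throw new IllegalArgumentException(s"unsupported location: ${other.locationType}")
}
```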
As far as I see,
BTW, I'm thinking we still need to think carefully about the
Just saw the discussion here. The location abstraction is a good idea. Different shuffle solutions could have different location implementations, e.g. Spark's default sort shuffle has BlockManagerId as the location, a remote shuffle service has shuffle servers as the location, and disaggregated shuffle storage (e.g. S3) has an S3 bucket/path as the location.
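That idea could be sketched as one `Location` implementation per storage backend. Assuming a minimal `Location` trait along the lines discussed in this thread (all names illustrative, not real Spark API):

```scala
// Illustrative only: one hypothetical Location implementation per backend.
sealed trait Location { def locationType: String }

// Spark's default sort shuffle: output lives on an executor's block manager.
case class BlockManagerLocation(host: String, port: Int) extends Location {
  val locationType = "block-manager"
}

// A remote shuffle service: output lives on dedicated shuffle servers.
case class ShuffleServerLocation(serverHost: String, serverPort: Int) extends Location {
  val locationType = "remote-shuffle-service"
}

// Disaggregated storage such as S3: output lives under a bucket/path.
case class S3Location(bucket: String, path: String) extends Location {
  val locationType = "s3"
}
```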
@Ngone51 @attilapiros do we want to proceed with the
I'm waiting for @attilapiros 's feedback.
We all agree that more abstraction here is really a good idea, and reading #30763 (comment) gives me the impression that we both worry about the impact of the change, but as I see it you have solutions for all the concerns: #30763 (comment). @Ngone51 I am fine if you proceed, and when it is ready we can see the real price of this change.
+1
Sure, I'll give it a try these days.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
This is a copy of #28618, merged with the current master and resolving all the merge conflicts.
All the credit goes to @mccheah; I just would like to help out here and avoid his progress being lost.
What changes were proposed in this pull request?
Adds a ShuffleOutputTracker API that can be used for managing shuffle metadata on the driver. It accepts map output metadata returned by the map output writers. Requires #28616.
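As a rough sketch of the shape such an API could take (the names `ShuffleOutputTracker`, `MapOutputMetadata`, and all method signatures below are illustrative guesses, not the actual SPARK-31801 interfaces):

```scala
import scala.collection.mutable

// Hypothetical marker for plugin-defined metadata returned by map output writers.
trait MapOutputMetadata extends Serializable

// Hypothetical driver-side tracker; the real proposed API may differ.
trait ShuffleOutputTracker {
  // Called when a map task commits, with the metadata its writer returned.
  def registerMapOutput(shuffleId: Int, mapIndex: Int, metadata: MapOutputMetadata): Unit
  // Called once per shuffle when it is unregistered on the driver.
  def unregisterShuffle(shuffleId: Int): Unit
}

case class DummyMetadata(sizeBytes: Long) extends MapOutputMetadata

// Trivial in-memory implementation, for illustration only.
class InMemoryTracker extends ShuffleOutputTracker {
  private val outputs =
    mutable.Map.empty[Int, mutable.Map[Int, MapOutputMetadata]]

  def registerMapOutput(shuffleId: Int, mapIndex: Int, metadata: MapOutputMetadata): Unit = {
    val perShuffle =
      outputs.getOrElseUpdate(shuffleId, mutable.Map.empty[Int, MapOutputMetadata])
    perShuffle.update(mapIndex, metadata)
  }

  def unregisterShuffle(shuffleId: Int): Unit = outputs.remove(shuffleId)

  def numOutputs(shuffleId: Int): Int = outputs.get(shuffleId).map(_.size).getOrElse(0)
}
```

Note the earlier discussion point: `unregisterShuffle` should be invoked only once per unregister attempt.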
Why are the changes needed?
Part of the design as discussed in this document, and part of the wider effort of SPARK-25299.
Does this PR introduce any user-facing change?
Enables additional APIs for the shuffle storage plugin tree. Usage will become more apparent when the read side of the shuffle plugin tree is introduced.
How was this patch tested?
We've added a mock implementation of the shuffle plugin tree here, to prove that a Spark job using a different implementation of the plugin can use all of the plugin points for an alternative shuffle data storage solution. But we don't include it here, in order to minimize the diff and the code to review in this specific patch. See #28902.