[SPARK-31798][SHUFFLE][API] Shuffle Writer API changes to return custom map output metadata #28616
Changes from all commits
98cc9b7
8c689ba
90084ea
6ecd3ad
4fd056d
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,64 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one or more | ||
| * contributor license agreements. See the NOTICE file distributed with | ||
| * this work for additional information regarding copyright ownership. | ||
| * The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| * (the "License"); you may not use this file except in compliance with | ||
| * the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.spark.shuffle.api.metadata; | ||
|
|
||
| import java.util.Optional; | ||
|
|
||
| import org.apache.spark.annotation.Private; | ||
|
|
||
| /** | ||
| * :: Private :: | ||
| * | ||
| * Represents the result of writing map outputs for a shuffle map task. | ||
| * <p> | ||
| * Partition lengths represent the length of each block written in the map task. Downstream | ||
| * readers can use them to allocate resources, such as in-memory buffers. | ||
| * <p> | ||
| * Map output writers can choose to attach arbitrary metadata tags to register with a | ||
| * shuffle output tracker (a module that has not been built yet and is planned for a future | ||
| * iteration of the shuffle storage APIs). | ||
| */ | ||
| @Private | ||
| public final class MapOutputCommitMessage { | ||
|
|
||
| private final long[] partitionLengths; | ||
| private final Optional<MapOutputMetadata> mapOutputMetadata; | ||
|
|
||
| private MapOutputCommitMessage( | ||
| long[] partitionLengths, Optional<MapOutputMetadata> mapOutputMetadata) { | ||
| this.partitionLengths = partitionLengths; | ||
| this.mapOutputMetadata = mapOutputMetadata; | ||
| } | ||
|
|
||
| public static MapOutputCommitMessage of(long[] partitionLengths) { | ||
| return new MapOutputCommitMessage(partitionLengths, Optional.empty()); | ||
| } | ||
|
|
||
| public static MapOutputCommitMessage of( | ||
| long[] partitionLengths, MapOutputMetadata mapOutputMetadata) { | ||
| return new MapOutputCommitMessage(partitionLengths, Optional.of(mapOutputMetadata)); | ||
| } | ||
|
|
||
| public long[] getPartitionLengths() { | ||
| return partitionLengths; | ||
| } | ||
|
|
||
| public Optional<MapOutputMetadata> getMapOutputMetadata() { | ||
| return mapOutputMetadata; | ||
| } | ||
| } |
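
As a rough illustration of how a map output writer might build a commit message and how a caller might read it back, the sketch below uses compact stand-ins for the two types in this diff so that it compiles without Spark on the classpath. The names `CommitMessageSketch`, `BlockLocationTag`, `totalBytes`, and `describe` are hypothetical and are not part of this PR.

```java
import java.io.Serializable;
import java.util.Optional;

final class CommitMessageSketch {
  // Stand-in mirroring the MapOutputMetadata interface added in this PR.
  interface MapOutputMetadata extends Serializable {}

  // Stand-in mirroring the MapOutputCommitMessage class added in this PR.
  static final class MapOutputCommitMessage {
    private final long[] partitionLengths;
    private final Optional<MapOutputMetadata> mapOutputMetadata;

    private MapOutputCommitMessage(
        long[] partitionLengths, Optional<MapOutputMetadata> mapOutputMetadata) {
      this.partitionLengths = partitionLengths;
      this.mapOutputMetadata = mapOutputMetadata;
    }

    static MapOutputCommitMessage of(long[] partitionLengths) {
      return new MapOutputCommitMessage(partitionLengths, Optional.empty());
    }

    static MapOutputCommitMessage of(long[] partitionLengths, MapOutputMetadata metadata) {
      return new MapOutputCommitMessage(partitionLengths, Optional.of(metadata));
    }

    long[] getPartitionLengths() {
      return partitionLengths;
    }

    Optional<MapOutputMetadata> getMapOutputMetadata() {
      return mapOutputMetadata;
    }
  }

  // Hypothetical metadata tag; a real plugin would define its own contents.
  static final class BlockLocationTag implements MapOutputMetadata {
    private static final long serialVersionUID = 1L;
    final String location;

    BlockLocationTag(String location) {
      this.location = location;
    }
  }

  // Sums the per-partition block lengths reported by the writer.
  static long totalBytes(MapOutputCommitMessage message) {
    long total = 0L;
    for (long length : message.getPartitionLengths()) {
      total += length;
    }
    return total;
  }

  // Reports whether the writer attached a metadata tag to the message.
  static String describe(MapOutputCommitMessage message) {
    return message.getMapOutputMetadata()
        .map(tag -> "metadata attached: " + tag.getClass().getSimpleName())
        .orElse("no metadata");
  }

  public static void main(String[] args) {
    MapOutputCommitMessage plain = MapOutputCommitMessage.of(new long[] {10L, 0L, 32L});
    MapOutputCommitMessage tagged =
        MapOutputCommitMessage.of(new long[] {10L, 0L, 32L}, new BlockLocationTag("host-a"));
    System.out.println(totalBytes(plain) + " bytes, " + describe(plain));
    System.out.println(totalBytes(tagged) + " bytes, " + describe(tagged));
  }
}
```

Because the metadata is wrapped in an `Optional`, a caller that does not care about tags can handle both factory methods through the same code path.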
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one or more | ||
| * contributor license agreements. See the NOTICE file distributed with | ||
| * this work for additional information regarding copyright ownership. | ||
| * The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| * (the "License"); you may not use this file except in compliance with | ||
| * the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.spark.shuffle.api.metadata; | ||
|
|
||
| import java.io.Serializable; | ||
|
|
||
| /** | ||
| * :: Private :: | ||
| * | ||
| * An opaque metadata tag for registering the result of committing the output of a | ||
| * shuffle map task. | ||
| * <p> | ||
| * All implementations must be serializable since this is sent from the executors to | ||
| * the driver. | ||
| */ | ||
| public interface MapOutputMetadata extends Serializable {} | ||
|
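
Since the Javadoc requires implementations to be serializable, an implementer may want to check that a tag survives a serialization round trip. The sketch below is one way to do that under stated assumptions: it uses plain `java.io` object streams, which may not be the serializer Spark uses on this path, a stand-in copy of the interface, and the hypothetical names `MetadataRoundTripSketch`, `ReplicaCountTag`, and `roundTrip`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class MetadataRoundTripSketch {
  // Stand-in mirroring the MapOutputMetadata interface added in this PR.
  interface MapOutputMetadata extends Serializable {}

  // Hypothetical tag; every field must itself be serializable.
  static final class ReplicaCountTag implements MapOutputMetadata {
    private static final long serialVersionUID = 1L;
    final int replicas;

    ReplicaCountTag(int replicas) {
      this.replicas = replicas;
    }
  }

  // Writes the tag to a byte array and reads it back, failing loudly if
  // the tag (or any field it holds) cannot be serialized.
  static MapOutputMetadata roundTrip(MapOutputMetadata tag) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(tag);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (MapOutputMetadata) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("metadata tag did not survive serialization", e);
    }
  }

  public static void main(String[] args) {
    ReplicaCountTag copy = (ReplicaCountTag) roundTrip(new ReplicaCountTag(3));
    System.out.println("replicas after round trip: " + copy.replicas);
  }
}
```

A check like this fits naturally in a unit test for a plugin's metadata class, where a non-serializable field would otherwise only surface at runtime.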
Contributor
Sorry for commenting on a closed PR; I'm looking at this to review a newer PR (https://github.com/apache/spark/pull/28618/files). These should probably be annotated with @Since. Also, should these be @Evolving or @DeveloperApi rather than @Private? This by itself doesn't do any good, and the intention is for people to be able to implement it, right?
Member
I roughly remember asking @squito the same thing before. The reason was that it's not stable yet (?), and presumably the plan is to test it internally before making it an API, I guess.
Does the comment above need to be rewritten?