Various changes to the API and state assumptions in writers. by mccheah · Pull Request #8 · bloomberg/apache-spark-on-k8s

mccheah · 2019-03-25T23:22:12Z

Proposes the following changes to the API:

- closeAndGetLength() is split into separate close() and getNumBytesWritten() operations.
- openChannel and openStream are renamed to toChannel and toStream.

Proposes the following changes to the implementation:

- close() in the default implementation now persists the length in the partitionLengths array.
- getNumBytesWritten() no longer requires the writer's resources to be closed first.
- The stream is no longer closed in BypassMergeSortShuffleWriter; it is closed only in DefaultShufflePartitionWriter#close (for consistency with how we treat channels).
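A rough sketch of the shape these changes give the writer, for readers who do not have the diff open. The interface and class names here (`SketchPartitionWriter`, `InMemoryPartitionWriter`, `SplitApiDemo`) are hypothetical stand-ins, not the classes touched by this PR, and the in-memory sink exists only so the example runs on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical, simplified stand-in for the partition writer API in this PR.
interface SketchPartitionWriter {
  OutputStream toStream() throws IOException;  // renamed from openStream
  void close() throws IOException;             // first half of closeAndGetLength()
  long getNumBytesWritten();                   // second half; callable before close()
}

// Toy in-memory implementation, present only so the sketch runs.
final class InMemoryPartitionWriter implements SketchPartitionWriter {
  private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
  private final long[] partitionLengths;
  private final int partitionId;
  private long count = 0L;

  InMemoryPartitionWriter(long[] partitionLengths, int partitionId) {
    this.partitionLengths = partitionLengths;
    this.partitionId = partitionId;
  }

  @Override
  public OutputStream toStream() {
    // Count every byte as it is handed to the sink.
    return new OutputStream() {
      @Override
      public void write(int b) {
        sink.write(b);
        count++;
      }
    };
  }

  @Override
  public void close() {
    // close() is where the length is persisted into the shared array.
    partitionLengths[partitionId] = count;
  }

  @Override
  public long getNumBytesWritten() {
    return count;
  }
}

final class SplitApiDemo {
  // Returns {bytes reported before close, length persisted by close}.
  static long[] run() {
    long[] lengths = new long[1];
    SketchPartitionWriter writer = new InMemoryPartitionWriter(lengths, 0);
    try {
      writer.toStream().write(new byte[] {1, 2, 3});
      long beforeClose = writer.getNumBytesWritten();
      writer.close();
      return new long[] {beforeClose, lengths[0]};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long[] result = run();
    System.out.println(result[0] + " bytes before close, " + result[1] + " persisted");
  }
}
```

The point of the split is visible in `run()`: the byte count can be read while the writer is still open, and `close()` separately takes care of recording the length.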

mccheah · 2019-03-25T23:24:21Z

core/src/main/java/org/apache/spark/api/shuffle/ShufflePartitionWriter.java

+   * Note that this stream itself is not closed by the caller; close the stream in
+   * the implementation of this class's {@link #close()}..
+   */
+  OutputStream toStream() throws IOException;


Renamed from openStream -> toStream in order to better indicate that this writer is still responsible for closing its own resources. Meaning, "convert this writer to a stream", in a sense, rather than, "Open a stream to write contents". But there might be a better naming convention here.

I see how the conversion logic can be advantageous. +1
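To make the ownership point concrete, here is a minimal sketch in which the caller writes through `toStream()` but never closes the stream; the writer closes it in its own `close()`. `OwningWriter` and `OwnershipDemo` are hypothetical names used for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical writer that owns the stream it hands out.
final class OwningWriter {
  private boolean streamClosed = false;
  private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
  private final OutputStream stream = new OutputStream() {
    @Override
    public void write(int b) {
      sink.write(b);
    }

    @Override
    public void close() {
      streamClosed = true;
    }
  };

  // "Convert this writer to a stream": the same writer-owned stream every time.
  OutputStream toStream() {
    return stream;
  }

  // The writer, not the caller, closes the stream.
  void close() throws IOException {
    stream.close();
  }

  boolean isStreamClosed() {
    return streamClosed;
  }
}

final class OwnershipDemo {
  // Returns {stream closed after the caller's write, stream closed after writer.close()}.
  static boolean[] run() {
    OwningWriter writer = new OwningWriter();
    try {
      OutputStream out = writer.toStream();
      out.write(42);  // caller writes but deliberately does not close `out`
      boolean closedAfterWrite = writer.isStreamClosed();
      writer.close();
      return new boolean[] {closedAfterWrite, writer.isStreamClosed()};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```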

mccheah · 2019-03-25T23:26:23Z

core/src/main/java/org/apache/spark/shuffle/sort/io/DefaultShuffleMapOutputWriter.java

+          throw new IllegalStateException("Failed to calculate position of file channel", e);
        }
+      } else if (stream != null) {
+        return stream.getCount();


Observe that flush isn't strictly necessary here. Getting the count retrieves the view of the number of bytes written by the counting output stream, which is correct.

Well, you are calling stream.close() below (before calling getNumBytesWritten) so it will call flush via that, no?

Not necessarily - see BypassMergeSortShuffleWriter. In BypassMergeSortShuffleWriter we call getNumBytesWritten after transferring the bytes from the spill files to the output writer, before the writer is closed. I think we're effectively counting on the idea that getNumBytesWritten tells the truth about how many bytes were actually written to the streams / channels it gives back, which seems reasonable enough - the method does what it says it does.
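A small sketch of why the count can be trusted without a flush, assuming the counting stream sits in front of any buffering (the wrapping order in the actual writer is not shown in this thread, so treat that as an assumption). `SketchCountingOutputStream` is a hand-rolled stand-in for whatever counting stream the implementation uses.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical counting wrapper: counts bytes as they pass through it.
final class SketchCountingOutputStream extends FilterOutputStream {
  private long count = 0L;

  SketchCountingOutputStream(OutputStream out) {
    super(out);
  }

  @Override
  public void write(int b) throws IOException {
    out.write(b);
    count++;
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len);
    count += len;
  }

  long getCount() {
    return count;
  }
}

final class CountBeforeFlushDemo {
  // Returns {count seen by the counting stream, bytes in sink before close, bytes in sink after close}.
  static long[] run() {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    SketchCountingOutputStream counting =
        new SketchCountingOutputStream(new BufferedOutputStream(sink, 8192));
    try {
      counting.write(new byte[100]);
      long counted = counting.getCount();  // already reflects all 100 bytes
      long reachedSink = sink.size();      // still sitting in the buffer
      counting.close();                    // close() flushes the buffer
      return new long[] {counted, reachedSink, sink.size()};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Under that assumption the count is the number of bytes handed to the stream, which is the quantity the shuffle writer wants, regardless of whether they have reached the underlying sink yet.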

yifeih · 2019-03-25T23:32:30Z

core/src/main/java/org/apache/spark/api/shuffle/ShufflePartitionWriter.java

+   * <p>
+   * Note that the default version of {@link #toChannel()} returns a {@link WritableByteChannel}
+   * that does not itself need to be closed up front; only the underlying output stream given by
+   * {@link #toStream()} must be closed.


nit: Do we need to mention this in the API doc? Seems like a comment we can add to the implementation class.

"default implementation" here refers to the default method we put above.

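For context, one plausible shape for such a default method is sketched below. It assumes the default simply wraps `toStream()` with `java.nio.channels.Channels.newChannel`; the thread does not show the actual body, so this is an illustration rather than the code under review. `SketchChannelWriter` and `ByteArrayChannelWriter` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Hypothetical interface with a default toChannel() layered on toStream().
interface SketchChannelWriter {
  OutputStream toStream() throws IOException;

  // Assumed default: the channel is only a view over the writer's stream.
  default WritableByteChannel toChannel() throws IOException {
    return Channels.newChannel(toStream());
  }

  void close() throws IOException;
}

final class ByteArrayChannelWriter implements SketchChannelWriter {
  private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

  @Override
  public OutputStream toStream() {
    return sink;
  }

  @Override
  public void close() throws IOException {
    // Only the underlying stream is closed; the channel wrapper is left alone.
    sink.close();
  }

  int size() {
    return sink.size();
  }
}

final class DefaultChannelDemo {
  // Writes four bytes through the default channel and returns how many reached the sink.
  static int run() {
    ByteArrayChannelWriter writer = new ByteArrayChannelWriter();
    try {
      WritableByteChannel channel = writer.toChannel();
      channel.write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}));
      writer.close();
      return writer.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```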
yifeih · 2019-03-25T23:34:12Z

core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java

-        lengths[i] = writer.closeAndGetLength();
-        if (file.exists() && !file.delete()) {
-          logger.error("Unable to delete file for partition {}", i);
+          lengths[i] = writer.getNumBytesWritten();


uhh i think this is still in the try block? should move this outside the try block since we want to get the num-bytes after closing the writer
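A sketch of the ordering being asked for: write inside the try, close in finally, and read the length only afterwards. The writer below is hypothetical and deliberately reports bytes that have reached its sink, so a read before close would under-report; that behaviour is an assumption made to show why the ordering can matter, not a claim about the default writer.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical writer whose reported length only catches up once buffers are flushed.
final class BufferedSketchWriter {
  private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
  private final OutputStream stream = new BufferedOutputStream(sink, 1024);

  OutputStream toStream() {
    return stream;
  }

  void close() throws IOException {
    stream.close();  // flushes the buffer into the sink
  }

  long getNumBytesWritten() {
    return sink.size();
  }
}

final class LengthAfterCloseDemo {
  static long writePartition(BufferedSketchWriter writer, byte[] data) {
    try {
      try {
        writer.toStream().write(data);
      } finally {
        writer.close();
      }
      // Read the length outside the inner try block, after the writer is closed.
      return writer.getNumBytesWritten();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```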

yifeih · 2019-03-25T23:37:56Z

core/src/main/java/org/apache/spark/shuffle/sort/io/DefaultShuffleMapOutputWriter.java

-      initChannel();
-      currChannelPosition = outputFileChannel.position();
-      return outputFileChannel;
+    public void close() throws IOException {


I think it might be good to keep track of the outputFileChannel close as well. Basically, we should ensure that this method is called before getNumBytesWritten() is called. Otherwise, I might actually be in favor of having close() return the long to ensure that getNumBytesWritten() can never be called before closing.
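A minimal sketch of the first option in this comment, tracking the closed state and rejecting early reads. `GuardedSketchWriter` is a hypothetical class; note that the PR description takes the opposite position (getNumBytesWritten() usable before close), so this shows the suggested alternative, not the merged behaviour.

```java
// Hypothetical writer that refuses to report a length until it has been closed.
final class GuardedSketchWriter {
  private boolean closed = false;
  private long count = 0L;

  void write(byte[] bytes) {
    count += bytes.length;
  }

  void close() {
    closed = true;
  }

  long getNumBytesWritten() {
    if (!closed) {
      throw new IllegalStateException("close() must be called before getNumBytesWritten()");
    }
    return count;
  }
}

final class GuardDemo {
  // True if reading the length before close() is rejected.
  static boolean rejectsEarlyRead() {
    GuardedSketchWriter writer = new GuardedSketchWriter();
    writer.write(new byte[5]);
    try {
      writer.getNumBytesWritten();
      return false;
    } catch (IllegalStateException expected) {
      return true;
    }
  }

  // Length as reported once the writer has been closed.
  static long readAfterClose() {
    GuardedSketchWriter writer = new GuardedSketchWriter();
    writer.write(new byte[5]);
    writer.close();
    return writer.getNumBytesWritten();
  }
}
```

The second option, having close() return the long, is the shape the old closeAndGetLength() already had.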

mccheah · 2019-03-26T00:01:20Z

core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala

 *   respecting the directory's hierarchy.
 *
- * Only streaming (openStream) is supported.
+ * Only streaming (toStream) is supported.


Unintentional

ifilonenko

/lgtm

## What changes were proposed in this pull request?

This PR aims at improving the way physical plans are explained in spark.

Currently, the explain output for physical plan may look very cluttered and each operator's string representation can be very wide and wraps around in the display making it little hard to follow. This especially happens when explaining a query
1) Operating on wide tables
2) Has complex expressions etc.

This PR attempts to split the output into two sections. In the header section, we display the basic operator tree with a number associated with each operator. In this section, we strictly control what we output for each operator. In the footer section, each operator is verbosely displayed. Based on the feedback from Maryann, the uncorrelated subqueries (SubqueryExecs) are not included in the main plan. They are printed separately after the main plan and can be correlated by the originating expression id from its parent plan.

To illustrate, here is a simple plan displayed in old vs new way.

Example query1 :
```
EXPLAIN SELECT key, Max(val) FROM explain_temp1 WHERE key > 0 GROUP BY key HAVING max(val) > 0
```

Old :
```
*(2) Project [key#2, max(val)#15]
+- *(2) Filter (isnotnull(max(val#3)#18) AND (max(val#3)#18 > 0))
   +- *(2) HashAggregate(keys=[key#2], functions=[max(val#3)], output=[key#2, max(val)#15, max(val#3)#18])
      +- Exchange hashpartitioning(key#2, 200)
         +- *(1) HashAggregate(keys=[key#2], functions=[partial_max(val#3)], output=[key#2, max#21])
            +- *(1) Project [key#2, val#3]
               +- *(1) Filter (isnotnull(key#2) AND (key#2 > 0))
                  +- *(1) FileScan parquet default.explain_temp1[key#2,val#3] Batched: true, DataFilters: [isnotnull(key#2), (key#2 > 0)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [IsNotNull(key), GreaterThan(key,0)], ReadSchema: struct<key:int,val:int>
```

New :
```
Project (8)
+- Filter (7)
   +- HashAggregate (6)
      +- Exchange (5)
         +- HashAggregate (4)
            +- Project (3)
               +- Filter (2)
                  +- Scan parquet default.explain_temp1 (1)

(1) Scan parquet default.explain_temp1 [codegen id : 1]
Output: [key#2, val#3]

(2) Filter [codegen id : 1]
Input : [key#2, val#3]
Condition : (isnotnull(key#2) AND (key#2 > 0))

(3) Project [codegen id : 1]
Output : [key#2, val#3]
Input : [key#2, val#3]

(4) HashAggregate [codegen id : 1]
Input: [key#2, val#3]

(5) Exchange
Input: [key#2, max#11]

(6) HashAggregate [codegen id : 2]
Input: [key#2, max#11]

(7) Filter [codegen id : 2]
Input : [key#2, max(val)#5, max(val#3)#8]
Condition : (isnotnull(max(val#3)#8) AND (max(val#3)#8 > 0))

(8) Project [codegen id : 2]
Output : [key#2, max(val)#5]
Input : [key#2, max(val)#5, max(val#3)#8]
```

Example Query2 (subquery):
```
SELECT * FROM explain_temp1 WHERE KEY = (SELECT Max(KEY) FROM explain_temp2 WHERE KEY = (SELECT Max(KEY) FROM explain_temp3 WHERE val > 0) AND val = 2) AND val > 3
```

Old:
```
*(1) Project [key#2, val#3]
+- *(1) Filter (((isnotnull(KEY#2) AND isnotnull(val#3)) AND (KEY#2 = Subquery scalar-subquery#39)) AND (val#3 > 3))
   :  +- Subquery scalar-subquery#39
   :     +- *(2) HashAggregate(keys=[], functions=[max(KEY#26)], output=[max(KEY)#45])
   :        +- Exchange SinglePartition
   :           +- *(1) HashAggregate(keys=[], functions=[partial_max(KEY#26)], output=[max#47])
   :              +- *(1) Project [key#26]
   :                 +- *(1) Filter (((isnotnull(KEY#26) AND isnotnull(val#27)) AND (KEY#26 = Subquery scalar-subquery#38)) AND (val#27 = 2))
   :                    :  +- Subquery scalar-subquery#38
   :                    :     +- *(2) HashAggregate(keys=[], functions=[max(KEY#28)], output=[max(KEY)#43])
   :                    :        +- Exchange SinglePartition
   :                    :           +- *(1) HashAggregate(keys=[], functions=[partial_max(KEY#28)], output=[max#49])
   :                    :              +- *(1) Project [key#28]
   :                    :                 +- *(1) Filter (isnotnull(val#29) AND (val#29 > 0))
   :                    :                    +- *(1) FileScan parquet default.explain_temp3[key#28,val#29] Batched: true, DataFilters: [isnotnull(val#29), (val#29 > 0)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp3], PartitionFilters: [], PushedFilters: [IsNotNull(val), GreaterThan(val,0)], ReadSchema: struct<key:int,val:int>
   :                    +- *(1) FileScan parquet default.explain_temp2[key#26,val#27] Batched: true, DataFilters: [isnotnull(key#26), isnotnull(val#27), (val#27 = 2)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp2], PartitionFilters: [], PushedFilters: [IsNotNull(key), IsNotNull(val), EqualTo(val,2)], ReadSchema: struct<key:int,val:int>
   +- *(1) FileScan parquet default.explain_temp1[key#2,val#3] Batched: true, DataFilters: [isnotnull(key#2), isnotnull(val#3), (val#3 > 3)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [IsNotNull(key), IsNotNull(val), GreaterThan(val,3)], ReadSchema: struct<key:int,val:int>
```

New:
```
Project (3)
+- Filter (2)
   +- Scan parquet default.explain_temp1 (1)

(1) Scan parquet default.explain_temp1 [codegen id : 1]
Output: [key#2, val#3]

(2) Filter [codegen id : 1]
Input : [key#2, val#3]
Condition : (((isnotnull(KEY#2) AND isnotnull(val#3)) AND (KEY#2 = Subquery scalar-subquery#23)) AND (val#3 > 3))

(3) Project [codegen id : 1]
Output : [key#2, val#3]
Input : [key#2, val#3]

===== Subqueries =====

Subquery:1 Hosting operator id = 2 Hosting Expression = Subquery scalar-subquery#23
HashAggregate (9)
+- Exchange (8)
   +- HashAggregate (7)
      +- Project (6)
         +- Filter (5)
            +- Scan parquet default.explain_temp2 (4)

(4) Scan parquet default.explain_temp2 [codegen id : 1]
Output: [key#26, val#27]

(5) Filter [codegen id : 1]
Input : [key#26, val#27]
Condition : (((isnotnull(KEY#26) AND isnotnull(val#27)) AND (KEY#26 = Subquery scalar-subquery#22)) AND (val#27 = 2))

(6) Project [codegen id : 1]
Output : [key#26]
Input : [key#26, val#27]

(7) HashAggregate [codegen id : 1]
Input: [key#26]

(8) Exchange
Input: [max#35]

(9) HashAggregate [codegen id : 2]
Input: [max#35]

Subquery:2 Hosting operator id = 5 Hosting Expression = Subquery scalar-subquery#22
HashAggregate (15)
+- Exchange (14)
   +- HashAggregate (13)
      +- Project (12)
         +- Filter (11)
            +- Scan parquet default.explain_temp3 (10)

(10) Scan parquet default.explain_temp3 [codegen id : 1]
Output: [key#28, val#29]

(11) Filter [codegen id : 1]
Input : [key#28, val#29]
Condition : (isnotnull(val#29) AND (val#29 > 0))

(12) Project [codegen id : 1]
Output : [key#28]
Input : [key#28, val#29]

(13) HashAggregate [codegen id : 1]
Input: [key#28]

(14) Exchange
Input: [max#37]

(15) HashAggregate [codegen id : 2]
Input: [max#37]
```

Note: I opened this PR as a WIP to start getting feedback. I will be on vacation starting tomorrow would not be able to immediately incorporate the feedback. I will start to work on them as soon as i can. Also, currently this PR provides a basic infrastructure for explain enhancement. The details about individual operators will be implemented in follow-up prs

## How was this patch tested?

Added a new test `explain.sql` that tests basic scenarios. Need to add more tests.

Closes apache#24759 from dilipbiswal/explain_feature.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

mccheah commented Mar 25, 2019

Hack around kernel bug

dd4656c

yifeih reviewed Mar 25, 2019

mccheah added 2 commits March 25, 2019 16:47

Set length only after closing

d032041

Flush partition writers in test

5a62c67

mccheah commented Mar 26, 2019

Revert rewording

d21feb2

ifilonenko approved these changes Mar 26, 2019

ifilonenko merged commit 9f6230b into bloomberg:spark-25299-master-2 Mar 26, 2019

ifilonenko merged 5 commits into bloomberg:spark-25299-master-2 from palantir:propose-spark-25299-writer-changes
