Various changes to the API and state assumptions in writers.#8
Conversation
Proposes the following changes to the API: - closeAndGetLength() is split into separate close() and getNumBytesWritten() operations. - openChannel and openStream renamed to toChannel and toStream Proposes the following changes to the implementation: - close() in the default implementation now persists the length in the partitionLengths array - getNumBytesWritten() doesn't necessitate the writer's resources to be closed ahead of it - Don't close the stream in BypassMergeSortShuffleWriter - only close it in DefaultShufflePartitionWriter#close (for consistency with how we treat channels)
| * Note that this stream itself is not closed by the caller; close the stream in | ||
| * the implementation of this class's {@link #close()}.. | ||
| */ | ||
| OutputStream toStream() throws IOException; |
There was a problem hiding this comment.
Renamed from openStream -> toStream in order to better indicate that this writer is still responsible for closing its own resources. Meaning, "convert this writer to a stream", in a sense, rather than, "Open a stream to write contents". But there might be a better naming convention here.
There was a problem hiding this comment.
I see how the conversion logic can be advantageous. +1
| throw new IllegalStateException("Failed to calculate position of file channel", e); | ||
| } | ||
| } else if (stream != null) { | ||
| return stream.getCount(); |
There was a problem hiding this comment.
Observe that flush isn't strictly necessary here. Getting the count retrieves the view of the number of bytes written by the counting output stream, which is correct.
There was a problem hiding this comment.
Well, you are calling stream.close() below (before calling getNumBytesWritten) so it will call flush via that, no?
There was a problem hiding this comment.
Not necessarily - see BypassMergeSortShuffleWriter. In BypassMergeSortShuffleWriter we call getNumBytesWritten after transferring the bytes from the spill files to the output writer, before the writer is closed. I think we're effectively counting on the idea that getNumBytesWritten telling the truth about how many bytes were actually written to the streams / channels it gives back, which seems reasonable enough - the method does what it says it does.
| * <p> | ||
| * Note that the default version of {@link #toChannel()} returns a {@link WritableByteChannel} | ||
| * that does not itself need to be closed up front; only the underlying output stream given by | ||
| * {@link #toStream()} must be closed. |
There was a problem hiding this comment.
nit: Do we need to mention this in the API doc? Seems like a comment we can add to the implementation class.
There was a problem hiding this comment.
"default implementation" here refers to the default method we put above.
| lengths[i] = writer.closeAndGetLength(); | ||
| if (file.exists() && !file.delete()) { | ||
| logger.error("Unable to delete file for partition {}", i); | ||
| lengths[i] = writer.getNumBytesWritten(); |
There was a problem hiding this comment.
uhh i think this is still in the try block? should move this outside the try block since we want to get the num-bytes after closing the writer
| initChannel(); | ||
| currChannelPosition = outputFileChannel.position(); | ||
| return outputFileChannel; | ||
| public void close() throws IOException { |
There was a problem hiding this comment.
I think it might be good to keep track of the outputFileChannel close as well. Basically, we should ensure that this method is called before getNumBytesWrittern() is called. Otherwise, I might actually be in favor of having close() return the long to ensure that getNumBytesWritten() can never be called before closing.
| * respecting the directory's hierarchy. | ||
| * | ||
| * Only streaming (openStream) is supported. | ||
| * Only streaming (toStream) is supported. |
## What changes were proposed in this pull request? This PR aims at improving the way physical plans are explained in spark. Currently, the explain output for physical plan may look very cluttered and each operator's string representation can be very wide and wraps around in the display making it little hard to follow. This especially happens when explaining a query 1) Operating on wide tables 2) Has complex expressions etc. This PR attempts to split the output into two sections. In the header section, we display the basic operator tree with a number associated with each operator. In this section, we strictly control what we output for each operator. In the footer section, each operator is verbosely displayed. Based on the feedback from Maryann, the uncorrelated subqueries (SubqueryExecs) are not included in the main plan. They are printed separately after the main plan and can be correlated by the originating expression id from its parent plan. To illustrate, here is a simple plan displayed in old vs new way. Example query1 : ``` EXPLAIN SELECT key, Max(val) FROM explain_temp1 WHERE key > 0 GROUP BY key HAVING max(val) > 0 ``` Old : ``` *(2) Project [key#2, max(val)#15] +- *(2) Filter (isnotnull(max(val#3)#18) AND (max(val#3)#18 > 0)) +- *(2) HashAggregate(keys=[key#2], functions=[max(val#3)], output=[key#2, max(val)#15, max(val#3)#18]) +- Exchange hashpartitioning(key#2, 200) +- *(1) HashAggregate(keys=[key#2], functions=[partial_max(val#3)], output=[key#2, max#21]) +- *(1) Project [key#2, val#3] +- *(1) Filter (isnotnull(key#2) AND (key#2 > 0)) +- *(1) FileScan parquet default.explain_temp1[key#2,val#3] Batched: true, DataFilters: [isnotnull(key#2), (key#2 > 0)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [IsNotNull(key), GreaterThan(key,0)], ReadSchema: struct<key:int,val:int> ``` New : ``` Project (8) +- Filter (7) +- HashAggregate (6) +- Exchange (5) +- HashAggregate (4) +- Project (3) +- Filter (2) +- Scan parquet default.explain_temp1 (1) (1) Scan parquet default.explain_temp1 [codegen id : 1] Output: [key#2, val#3] (2) Filter [codegen id : 1] Input : [key#2, val#3] Condition : (isnotnull(key#2) AND (key#2 > 0)) (3) Project [codegen id : 1] Output : [key#2, val#3] Input : [key#2, val#3] (4) HashAggregate [codegen id : 1] Input: [key#2, val#3] (5) Exchange Input: [key#2, max#11] (6) HashAggregate [codegen id : 2] Input: [key#2, max#11] (7) Filter [codegen id : 2] Input : [key#2, max(val)#5, max(val#3)#8] Condition : (isnotnull(max(val#3)#8) AND (max(val#3)#8 > 0)) (8) Project [codegen id : 2] Output : [key#2, max(val)#5] Input : [key#2, max(val)#5, max(val#3)#8] ``` Example Query2 (subquery): ``` SELECT * FROM explain_temp1 WHERE KEY = (SELECT Max(KEY) FROM explain_temp2 WHERE KEY = (SELECT Max(KEY) FROM explain_temp3 WHERE val > 0) AND val = 2) AND val > 3 ``` Old: ``` *(1) Project [key#2, val#3] +- *(1) Filter (((isnotnull(KEY#2) AND isnotnull(val#3)) AND (KEY#2 = Subquery scalar-subquery#39)) AND (val#3 > 3)) : +- Subquery scalar-subquery#39 : +- *(2) HashAggregate(keys=[], functions=[max(KEY#26)], output=[max(KEY)apache-spark-on-k8s#45]) : +- Exchange SinglePartition : +- *(1) HashAggregate(keys=[], functions=[partial_max(KEY#26)], output=[max#47]) : +- *(1) Project [key#26] : +- *(1) Filter (((isnotnull(KEY#26) AND isnotnull(val#27)) AND (KEY#26 = Subquery scalar-subquery#38)) AND (val#27 = 2)) : : +- Subquery scalar-subquery#38 : : +- *(2) HashAggregate(keys=[], functions=[max(KEY#28)], output=[max(KEY)apache-spark-on-k8s#43]) : : +- Exchange SinglePartition : : +- *(1) HashAggregate(keys=[], functions=[partial_max(KEY#28)], output=[max#49]) : : +- *(1) Project [key#28] : : +- *(1) Filter (isnotnull(val#29) AND (val#29 > 0)) : : +- *(1) FileScan parquet default.explain_temp3[key#28,val#29] Batched: true, DataFilters: [isnotnull(val#29), (val#29 > 0)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp3], PartitionFilters: [], PushedFilters: [IsNotNull(val), GreaterThan(val,0)], ReadSchema: struct<key:int,val:int> : +- *(1) FileScan parquet default.explain_temp2[key#26,val#27] Batched: true, DataFilters: [isnotnull(key#26), isnotnull(val#27), (val#27 = 2)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp2], PartitionFilters: [], PushedFilters: [IsNotNull(key), IsNotNull(val), EqualTo(val,2)], ReadSchema: struct<key:int,val:int> +- *(1) FileScan parquet default.explain_temp1[key#2,val#3] Batched: true, DataFilters: [isnotnull(key#2), isnotnull(val#3), (val#3 > 3)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [IsNotNull(key), IsNotNull(val), GreaterThan(val,3)], ReadSchema: struct<key:int,val:int> ``` New: ``` Project (3) +- Filter (2) +- Scan parquet default.explain_temp1 (1) (1) Scan parquet default.explain_temp1 [codegen id : 1] Output: [key#2, val#3] (2) Filter [codegen id : 1] Input : [key#2, val#3] Condition : (((isnotnull(KEY#2) AND isnotnull(val#3)) AND (KEY#2 = Subquery scalar-subquery#23)) AND (val#3 > 3)) (3) Project [codegen id : 1] Output : [key#2, val#3] Input : [key#2, val#3] ===== Subqueries ===== Subquery:1 Hosting operator id = 2 Hosting Expression = Subquery scalar-subquery#23 HashAggregate (9) +- Exchange (8) +- HashAggregate (7) +- Project (6) +- Filter (5) +- Scan parquet default.explain_temp2 (4) (4) Scan parquet default.explain_temp2 [codegen id : 1] Output: [key#26, val#27] (5) Filter [codegen id : 1] Input : [key#26, val#27] Condition : (((isnotnull(KEY#26) AND isnotnull(val#27)) AND (KEY#26 = Subquery scalar-subquery#22)) AND (val#27 = 2)) (6) Project [codegen id : 1] Output : [key#26] Input : [key#26, val#27] (7) HashAggregate [codegen id : 1] Input: [key#26] (8) Exchange Input: [max#35] (9) HashAggregate [codegen id : 2] Input: [max#35] Subquery:2 Hosting operator id = 5 Hosting Expression = Subquery scalar-subquery#22 HashAggregate (15) +- Exchange (14) +- HashAggregate (13) +- Project (12) +- Filter (11) +- Scan parquet default.explain_temp3 (10) (10) Scan parquet default.explain_temp3 [codegen id : 1] Output: [key#28, val#29] (11) Filter [codegen id : 1] Input : [key#28, val#29] Condition : (isnotnull(val#29) AND (val#29 > 0)) (12) Project [codegen id : 1] Output : [key#28] Input : [key#28, val#29] (13) HashAggregate [codegen id : 1] Input: [key#28] (14) Exchange Input: [max#37] (15) HashAggregate [codegen id : 2] Input: [max#37] ``` Note: I opened this PR as a WIP to start getting feedback. I will be on vacation starting tomorrow would not be able to immediately incorporate the feedback. I will start to work on them as soon as i can. Also, currently this PR provides a basic infrastructure for explain enhancement. The details about individual operators will be implemented in follow-up prs ## How was this patch tested? Added a new test `explain.sql` that tests basic scenarios. Need to add more tests. Closes apache#24759 from dilipbiswal/explain_feature. Authored-by: Dilip Biswal <dbiswal@us.ibm.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Proposes the following changes to the API:
closeAndGetLength()is split into separateclose()andgetNumBytesWritten()operations.openChannelandopenStreamrenamed totoChannelandtoStreamProposes the following changes to the implementation:
close()in the default implementation now persists the length in thepartitionLengthsarraygetNumBytesWritten()doesn't necessitate the writer's resources to be closed ahead of itBypassMergeSortShuffleWriter- only close it inDefaultShufflePartitionWriter#close(for consistency with how we treat channels)