
Commit 58776d6

gengliangwang authored and rdblue committed
[SPARK-24275][SQL] Revise doc comments in InputPartition
## What changes were proposed in this pull request?

In apache#21145, DataReaderFactory is renamed to InputPartition. This PR is to revise wording in the comments to make it more clear.

## How was this patch tested?

None

Author: Gengliang Wang <[email protected]>

Closes apache#21326 from gengliangwang/revise_reader_comments.
1 parent cae6048 commit 58776d6


7 files changed, +24 -23 lines changed


sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupport.java

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ public interface ReadSupport extends DataSourceV2 {
   /**
    * Creates a {@link DataSourceReader} to scan the data from this data source.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    *
    * @param options the options for the returned data source reader, which is an immutable
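
For context, a rough sketch of the implementer side of this contract, i.e. a data source that mixes in ReadSupport; the names SimpleDataSource and SimpleReader are hypothetical, not part of this change:

import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
import org.apache.spark.sql.sources.v2.ReadSupport;
import org.apache.spark.sql.sources.v2.reader.DataSourceReader;

// Hypothetical data source: mixes in ReadSupport and hands back a DataSourceReader.
public class SimpleDataSource implements DataSourceV2, ReadSupport {
  @Override
  public DataSourceReader createReader(DataSourceOptions options) {
    // Throwing here fails the action before any Spark job is submitted.
    String path = options.get("path").orElseThrow(
        () -> new IllegalArgumentException("'path' option is required"));
    return new SimpleReader(path);  // SimpleReader: a hypothetical DataSourceReader
  }
}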

sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupportWithSchema.java

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ public interface ReadSupportWithSchema extends DataSourceV2 {
   /**
    * Create a {@link DataSourceReader} to scan the data from this data source.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    *
    * @param schema the full schema of this data source reader. Full schema usually maps to the
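
A similar sketch for the schema-aware variant, where the user-specified schema is passed straight through to the reader; SchemaAwareDataSource and SchemaAwareReader are hypothetical names:

import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
import org.apache.spark.sql.sources.v2.ReadSupportWithSchema;
import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
import org.apache.spark.sql.types.StructType;

// Hypothetical data source that cannot infer a schema and requires the user to supply one.
public class SchemaAwareDataSource implements DataSourceV2, ReadSupportWithSchema {
  @Override
  public DataSourceReader createReader(StructType schema, DataSourceOptions options) {
    // The full schema comes from the caller, e.g. spark.read.schema(...).format(...).load().
    return new SchemaAwareReader(schema, options);  // hypothetical DataSourceReader
  }
}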

sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ public interface WriteSupport extends DataSourceV2 {
    * Creates an optional {@link DataSourceWriter} to save the data to this data source. Data
    * sources can return None if there is no writing needed to be done according to the save mode.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    *
    * @param jobId A unique string for the writing job. It's possible that there are many writing
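
On the write side, a hedged sketch of a WriteSupport implementation, assuming the createWriter signature of this API version; the Optional return lets the source skip the write entirely. SimpleWritableDataSource and SimpleWriter are hypothetical, and the existence check is a placeholder:

import java.util.Optional;

import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
import org.apache.spark.sql.sources.v2.WriteSupport;
import org.apache.spark.sql.sources.v2.writer.DataSourceWriter;
import org.apache.spark.sql.types.StructType;

// Hypothetical data source: returns Optional.empty() when the save mode needs no writing.
public class SimpleWritableDataSource implements DataSourceV2, WriteSupport {
  @Override
  public Optional<DataSourceWriter> createWriter(
      String jobId, StructType schema, SaveMode mode, DataSourceOptions options) {
    if (mode == SaveMode.Ignore && dataAlreadyExists(options)) {
      return Optional.empty();  // nothing to write, so no Spark job will be submitted
    }
    return Optional.of(new SimpleWriter(jobId));  // SimpleWriter: a hypothetical DataSourceWriter
  }

  private boolean dataAlreadyExists(DataSourceOptions options) {
    return false;  // placeholder existence check
  }
}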

sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceReader.java

Lines changed: 8 additions & 8 deletions
@@ -31,7 +31,7 @@
  * {@link ReadSupport#createReader(DataSourceOptions)} or
  * {@link ReadSupportWithSchema#createReader(StructType, DataSourceOptions)}.
  * It can mix in various query optimization interfaces to speed up the data scan. The actual scan
- * logic is delegated to {@link InputPartition}s that are returned by
+ * logic is delegated to {@link InputPartition}s, which are returned by
  * {@link #planInputPartitions()}.
  *
  * There are mainly 3 kinds of query optimizations:
@@ -42,8 +42,8 @@
  * 3. Special scans. E.g, columnar scan, unsafe row scan, etc.
  * Names of these interfaces start with `SupportsScan`.
  *
- * If an exception was throw when applying any of these query optimizations, the action would fail
- * and no Spark job was submitted.
+ * If an exception was throw when applying any of these query optimizations, the action will fail
+ * and no Spark job will be submitted.
  *
  * Spark first applies all operator push-down optimizations that this data source supports. Then
  * Spark collects information this data source reported for further optimizations. Finally Spark
@@ -56,21 +56,21 @@ public interface DataSourceReader {
    * Returns the actual schema of this data source reader, which may be different from the physical
    * schema of the underlying storage, as column pruning or other optimizations may happen.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    */
   StructType readSchema();

   /**
-   * Returns a list of read tasks. Each task is responsible for creating a data reader to
-   * output data for one RDD partition. That means the number of tasks returned here is same as
-   * the number of RDD partitions this scan outputs.
+   * Returns a list of {@link InputPartition}s. Each {@link InputPartition} is responsible for
+   * creating a data reader to output data of one RDD partition. The number of input partitions
+   * returned here is the same as the number of RDD partitions this scan outputs.
    *
    * Note that, this may not be a full scan if the data source reader mixes in other optimization
    * interfaces like column pruning, filter push-down, etc. These optimizations are applied before
    * Spark issues the scan request.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    */
   List<InputPartition<Row>> planInputPartitions();
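
To make the reader-side flow concrete, a minimal hedged sketch of a DataSourceReader that plans one InputPartition per input file; SimpleReader and FileInputPartition are hypothetical names:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.types.StructType;

// Hypothetical reader: one InputPartition per file, so the scan outputs one RDD partition per file.
public class SimpleReader implements DataSourceReader {
  private final List<String> files;

  public SimpleReader(List<String> files) {
    this.files = files;
  }

  @Override
  public StructType readSchema() {
    // The actual schema of the scan; it may be narrower than the storage schema
    // if a column-pruning interface was mixed in and applied.
    return new StructType().add("value", "string");
  }

  @Override
  public List<InputPartition<Row>> planInputPartitions() {
    List<InputPartition<Row>> partitions = new ArrayList<>();
    for (String file : files) {
      partitions.add(new FileInputPartition(file));  // hypothetical InputPartition<Row>
    }
    return partitions;
  }
}

The length of the returned list is what fixes how many RDD partitions the scan produces.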

sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartition.java

Lines changed: 9 additions & 8 deletions
@@ -23,13 +23,14 @@

 /**
  * An input partition returned by {@link DataSourceReader#planInputPartitions()} and is
- * responsible for creating the actual data reader. The relationship between
- * {@link InputPartition} and {@link InputPartitionReader}
+ * responsible for creating the actual data reader of one RDD partition.
+ * The relationship between {@link InputPartition} and {@link InputPartitionReader}
  * is similar to the relationship between {@link Iterable} and {@link java.util.Iterator}.
  *
- * Note that input partitions will be serialized and sent to executors, then the partition reader
- * will be created on executors and do the actual reading. So {@link InputPartition} must be
- * serializable and {@link InputPartitionReader} doesn't need to be.
+ * Note that {@link InputPartition}s will be serialized and sent to executors, then
+ * {@link InputPartitionReader}s will be created on executors to do the actual reading. So
+ * {@link InputPartition} must be serializable while {@link InputPartitionReader} doesn't need to
+ * be.
  */
 @InterfaceStability.Evolving
 public interface InputPartition<T> extends Serializable {
@@ -41,10 +42,10 @@ public interface InputPartition<T> extends Serializable {
    * The location is a string representing the host name.
    *
    * Note that if a host name cannot be recognized by Spark, it will be ignored as it was not in
-   * the returned locations. By default this method returns empty string array, which means this
-   * task has no location preference.
+   * the returned locations. The default return value is empty string array, which means this
+   * input partition's reader has no location preference.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    */
   default String[] preferredLocations() {
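
A hedged sketch of the InputPartition/InputPartitionReader split described above: the partition is small and Serializable, while the reader is built on the executor. This assumes the post-rename InputPartition exposes createPartitionReader() for building the reader; FileInputPartition, FileRowReader, and the line-reading helper are hypothetical:

import java.util.Iterator;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.sources.v2.reader.InputPartitionReader;

// Hypothetical partition: Serializable, carries only the file path; created on the driver.
public class FileInputPartition implements InputPartition<Row> {
  private final String filePath;

  public FileInputPartition(String filePath) {
    this.filePath = filePath;
  }

  @Override
  public String[] preferredLocations() {
    // Host names holding the file's blocks; hosts Spark does not recognize are ignored.
    return new String[0];  // placeholder: no location preference
  }

  @Override
  public InputPartitionReader<Row> createPartitionReader() {
    // Runs on an executor; the reader it returns does not need to be Serializable.
    return new FileRowReader(filePath);
  }
}

// Hypothetical reader: iterates the file's lines, one Row per line, Iterator-style.
class FileRowReader implements InputPartitionReader<Row> {
  private final Iterator<String> lines;
  private String current;

  FileRowReader(String filePath) {
    this.lines = readLines(filePath);  // placeholder line source
  }

  @Override
  public boolean next() {
    if (!lines.hasNext()) {
      return false;
    }
    current = lines.next();
    return true;
  }

  @Override
  public Row get() {
    return RowFactory.create(current);
  }

  @Override
  public void close() {
    // release any file handles here
  }

  private static Iterator<String> readLines(String path) {
    return java.util.Collections.emptyIterator();  // placeholder
  }
}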

sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java

Lines changed: 3 additions & 3 deletions
@@ -30,8 +30,8 @@
  * It can mix in various writing optimization interfaces to speed up the data saving. The actual
  * writing logic is delegated to {@link DataWriter}.
  *
- * If an exception was throw when applying any of these writing optimizations, the action would fail
- * and no Spark job was submitted.
+ * If an exception was throw when applying any of these writing optimizations, the action will fail
+ * and no Spark job will be submitted.
  *
  * The writing procedure is:
  * 1. Create a writer factory by {@link #createWriterFactory()}, serialize and send it to all the
@@ -54,7 +54,7 @@ public interface DataSourceWriter {
   /**
    * Creates a writer factory which will be serialized and sent to executors.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    */
   DataWriterFactory<Row> createWriterFactory();
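
A hedged sketch of the driver-side writer described here: it creates the serializable factory and later receives the tasks' commit messages. SimpleWriter, SimpleWriterFactory, and the commit/abort bodies are hypothetical:

import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.writer.DataSourceWriter;
import org.apache.spark.sql.sources.v2.writer.DataWriterFactory;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

// Hypothetical driver-side writer: hands out a factory, then finalizes or cleans up the job.
public class SimpleWriter implements DataSourceWriter {
  private final String jobId;

  public SimpleWriter(String jobId) {
    this.jobId = jobId;
  }

  @Override
  public DataWriterFactory<Row> createWriterFactory() {
    // Created on the driver, serialized, and sent to every executor.
    return new SimpleWriterFactory(jobId);  // hypothetical DataWriterFactory<Row>
  }

  @Override
  public void commit(WriterCommitMessage[] messages) {
    // e.g. move per-task temporary output to its final location
  }

  @Override
  public void abort(WriterCommitMessage[] messages) {
    // e.g. delete whatever temporary output the tasks produced
  }
}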

sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ public interface DataWriterFactory<T> extends Serializable {
   /**
    * Returns a data writer to do the actual writing work.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    *
    * @param partitionId A unique id of the RDD partition that the returned writer will process.
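
The data writer that the factory returns does the per-partition work on an executor. A hedged sketch of one such DataWriter; BufferingDataWriter and PartitionCommitMessage are hypothetical, and a real implementation would stage data somewhere durable rather than buffering it in memory:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.writer.DataWriter;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

// Hypothetical per-partition writer: buffers rows and reports what it wrote on commit.
public class BufferingDataWriter implements DataWriter<Row> {
  private final int partitionId;
  private final List<Row> buffer = new ArrayList<>();

  public BufferingDataWriter(int partitionId) {
    this.partitionId = partitionId;
  }

  @Override
  public void write(Row record) throws IOException {
    buffer.add(record);  // in practice, append to a temporary file or staging table
  }

  @Override
  public WriterCommitMessage commit() throws IOException {
    // Sent back to DataSourceWriter.commit(...) on the driver.
    return new PartitionCommitMessage(partitionId, buffer.size());  // hypothetical message type
  }

  @Override
  public void abort() throws IOException {
    buffer.clear();  // discard anything staged for this partition
  }
}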
