
Commit 58776d6

gengliangwang authored and rdblue committed
[SPARK-24275][SQL] Revise doc comments in InputPartition
## What changes were proposed in this pull request?

In apache#21145, DataReaderFactory is renamed to InputPartition. This PR is to revise wording in the comments to make it more clear.

## How was this patch tested?

None

Author: Gengliang Wang <[email protected]>

Closes apache#21326 from gengliangwang/revise_reader_comments.
1 parent cae6048 commit 58776d6


7 files changed, +24 -23 lines changed


sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupport.java

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ public interface ReadSupport extends DataSourceV2 {
   /**
    * Creates a {@link DataSourceReader} to scan the data from this data source.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    *
    * @param options the options for the returned data source reader, which is an immutable
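
For context, a rough sketch of the implementer side of this contract, i.e. a data source that mixes in ReadSupport; the names SimpleDataSource and SimpleReader are hypothetical, not part of this change:

import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
import org.apache.spark.sql.sources.v2.ReadSupport;
import org.apache.spark.sql.sources.v2.reader.DataSourceReader;

// Hypothetical data source: mixes in ReadSupport and hands back a DataSourceReader.
public class SimpleDataSource implements DataSourceV2, ReadSupport {
  @Override
  public DataSourceReader createReader(DataSourceOptions options) {
    // Throwing here fails the action before any Spark job is submitted.
    String path = options.get("path").orElseThrow(
        () -> new IllegalArgumentException("'path' option is required"));
    return new SimpleReader(path);  // SimpleReader: a hypothetical DataSourceReader
  }
}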

sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupportWithSchema.java

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ public interface ReadSupportWithSchema extends DataSourceV2 {
   /**
    * Create a {@link DataSourceReader} to scan the data from this data source.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    *
    * @param schema the full schema of this data source reader. Full schema usually maps to the
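
A similar sketch for the schema-aware variant, where the user-specified schema is passed straight through to the reader; SchemaAwareDataSource and SchemaAwareReader are hypothetical names:

import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
import org.apache.spark.sql.sources.v2.ReadSupportWithSchema;
import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
import org.apache.spark.sql.types.StructType;

// Hypothetical data source that cannot infer a schema and requires the user to supply one.
public class SchemaAwareDataSource implements DataSourceV2, ReadSupportWithSchema {
  @Override
  public DataSourceReader createReader(StructType schema, DataSourceOptions options) {
    // The full schema comes from the caller, e.g. spark.read.schema(...).format(...).load().
    return new SchemaAwareReader(schema, options);  // hypothetical DataSourceReader
  }
}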

sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ public interface WriteSupport extends DataSourceV2 {
    * Creates an optional {@link DataSourceWriter} to save the data to this data source. Data
    * sources can return None if there is no writing needed to be done according to the save mode.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    *
    * @param jobId A unique string for the writing job. It's possible that there are many writing
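
On the write side, a hedged sketch of a WriteSupport implementation, assuming the createWriter signature of this API version; the Optional return lets the source skip the write entirely. SimpleWritableDataSource and SimpleWriter are hypothetical, and the existence check is a placeholder:

import java.util.Optional;

import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
import org.apache.spark.sql.sources.v2.WriteSupport;
import org.apache.spark.sql.sources.v2.writer.DataSourceWriter;
import org.apache.spark.sql.types.StructType;

// Hypothetical data source: returns Optional.empty() when the save mode needs no writing.
public class SimpleWritableDataSource implements DataSourceV2, WriteSupport {
  @Override
  public Optional<DataSourceWriter> createWriter(
      String jobId, StructType schema, SaveMode mode, DataSourceOptions options) {
    if (mode == SaveMode.Ignore && dataAlreadyExists(options)) {
      return Optional.empty();  // nothing to write, so no Spark job will be submitted
    }
    return Optional.of(new SimpleWriter(jobId));  // SimpleWriter: a hypothetical DataSourceWriter
  }

  private boolean dataAlreadyExists(DataSourceOptions options) {
    return false;  // placeholder existence check
  }
}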

sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceReader.java

Lines changed: 8 additions & 8 deletions
@@ -31,7 +31,7 @@
  * {@link ReadSupport#createReader(DataSourceOptions)} or
  * {@link ReadSupportWithSchema#createReader(StructType, DataSourceOptions)}.
  * It can mix in various query optimization interfaces to speed up the data scan. The actual scan
- * logic is delegated to {@link InputPartition}s that are returned by
+ * logic is delegated to {@link InputPartition}s, which are returned by
  * {@link #planInputPartitions()}.
  *
  * There are mainly 3 kinds of query optimizations:
@@ -42,8 +42,8 @@
  * 3. Special scans. E.g, columnar scan, unsafe row scan, etc.
  * Names of these interfaces start with `SupportsScan`.
  *
- * If an exception was throw when applying any of these query optimizations, the action would fail
- * and no Spark job was submitted.
+ * If an exception was throw when applying any of these query optimizations, the action will fail
+ * and no Spark job will be submitted.
  *
  * Spark first applies all operator push-down optimizations that this data source supports. Then
  * Spark collects information this data source reported for further optimizations. Finally Spark
@@ -56,21 +56,21 @@ public interface DataSourceReader {
    * Returns the actual schema of this data source reader, which may be different from the physical
    * schema of the underlying storage, as column pruning or other optimizations may happen.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    */
   StructType readSchema();

   /**
-   * Returns a list of read tasks. Each task is responsible for creating a data reader to
-   * output data for one RDD partition. That means the number of tasks returned here is same as
-   * the number of RDD partitions this scan outputs.
+   * Returns a list of {@link InputPartition}s. Each {@link InputPartition} is responsible for
+   * creating a data reader to output data of one RDD partition. The number of input partitions
+   * returned here is the same as the number of RDD partitions this scan outputs.
    *
    * Note that, this may not be a full scan if the data source reader mixes in other optimization
    * interfaces like column pruning, filter push-down, etc. These optimizations are applied before
    * Spark issues the scan request.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    */
   List<InputPartition<Row>> planInputPartitions();
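
To make the reader-side flow concrete, a minimal hedged sketch of a DataSourceReader that plans one InputPartition per input file; SimpleReader and FileInputPartition are hypothetical names:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.types.StructType;

// Hypothetical reader: one InputPartition per file, so the scan outputs one RDD partition per file.
public class SimpleReader implements DataSourceReader {
  private final List<String> files;

  public SimpleReader(List<String> files) {
    this.files = files;
  }

  @Override
  public StructType readSchema() {
    // The actual schema of the scan; it may be narrower than the storage schema
    // if a column-pruning interface was mixed in and applied.
    return new StructType().add("value", "string");
  }

  @Override
  public List<InputPartition<Row>> planInputPartitions() {
    List<InputPartition<Row>> partitions = new ArrayList<>();
    for (String file : files) {
      partitions.add(new FileInputPartition(file));  // hypothetical InputPartition<Row>
    }
    return partitions;
  }
}

The length of the returned list is what fixes how many RDD partitions the scan produces.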

sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartition.java

Lines changed: 9 additions & 8 deletions
@@ -23,13 +23,14 @@

 /**
  * An input partition returned by {@link DataSourceReader#planInputPartitions()} and is
- * responsible for creating the actual data reader. The relationship between
- * {@link InputPartition} and {@link InputPartitionReader}
+ * responsible for creating the actual data reader of one RDD partition.
+ * The relationship between {@link InputPartition} and {@link InputPartitionReader}
  * is similar to the relationship between {@link Iterable} and {@link java.util.Iterator}.
  *
- * Note that input partitions will be serialized and sent to executors, then the partition reader
- * will be created on executors and do the actual reading. So {@link InputPartition} must be
- * serializable and {@link InputPartitionReader} doesn't need to be.
+ * Note that {@link InputPartition}s will be serialized and sent to executors, then
+ * {@link InputPartitionReader}s will be created on executors to do the actual reading. So
+ * {@link InputPartition} must be serializable while {@link InputPartitionReader} doesn't need to
+ * be.
  */
 @InterfaceStability.Evolving
 public interface InputPartition<T> extends Serializable {
@@ -41,10 +42,10 @@ public interface InputPartition<T> extends Serializable {
    * The location is a string representing the host name.
    *
    * Note that if a host name cannot be recognized by Spark, it will be ignored as it was not in
-   * the returned locations. By default this method returns empty string array, which means this
-   * task has no location preference.
+   * the returned locations. The default return value is empty string array, which means this
+   * input partition's reader has no location preference.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    */
   default String[] preferredLocations() {
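
A hedged sketch of the InputPartition/InputPartitionReader split described above: the partition is small and Serializable, while the reader is built on the executor. This assumes the post-rename InputPartition exposes createPartitionReader() for building the reader; FileInputPartition, FileRowReader, and the line-reading helper are hypothetical:

import java.util.Iterator;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.sources.v2.reader.InputPartitionReader;

// Hypothetical partition: Serializable, carries only the file path; created on the driver.
public class FileInputPartition implements InputPartition<Row> {
  private final String filePath;

  public FileInputPartition(String filePath) {
    this.filePath = filePath;
  }

  @Override
  public String[] preferredLocations() {
    // Host names holding the file's blocks; hosts Spark does not recognize are ignored.
    return new String[0];  // placeholder: no location preference
  }

  @Override
  public InputPartitionReader<Row> createPartitionReader() {
    // Runs on an executor; the reader it returns does not need to be Serializable.
    return new FileRowReader(filePath);
  }
}

// Hypothetical reader: iterates the file's lines, one Row per line, Iterator-style.
class FileRowReader implements InputPartitionReader<Row> {
  private final Iterator<String> lines;
  private String current;

  FileRowReader(String filePath) {
    this.lines = readLines(filePath);  // placeholder line source
  }

  @Override
  public boolean next() {
    if (!lines.hasNext()) {
      return false;
    }
    current = lines.next();
    return true;
  }

  @Override
  public Row get() {
    return RowFactory.create(current);
  }

  @Override
  public void close() {
    // release any file handles here
  }

  private static Iterator<String> readLines(String path) {
    return java.util.Collections.emptyIterator();  // placeholder
  }
}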

sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java

Lines changed: 3 additions & 3 deletions
@@ -30,8 +30,8 @@
  * It can mix in various writing optimization interfaces to speed up the data saving. The actual
  * writing logic is delegated to {@link DataWriter}.
  *
- * If an exception was throw when applying any of these writing optimizations, the action would fail
- * and no Spark job was submitted.
+ * If an exception was throw when applying any of these writing optimizations, the action will fail
+ * and no Spark job will be submitted.
  *
  * The writing procedure is:
  * 1. Create a writer factory by {@link #createWriterFactory()}, serialize and send it to all the
@@ -54,7 +54,7 @@ public interface DataSourceWriter {
   /**
    * Creates a writer factory which will be serialized and sent to executors.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    */
   DataWriterFactory<Row> createWriterFactory();
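
A hedged sketch of the driver-side writer described here: it creates the serializable factory and later receives the tasks' commit messages. SimpleWriter, SimpleWriterFactory, and the commit/abort bodies are hypothetical:

import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.writer.DataSourceWriter;
import org.apache.spark.sql.sources.v2.writer.DataWriterFactory;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

// Hypothetical driver-side writer: hands out a factory, then finalizes or cleans up the job.
public class SimpleWriter implements DataSourceWriter {
  private final String jobId;

  public SimpleWriter(String jobId) {
    this.jobId = jobId;
  }

  @Override
  public DataWriterFactory<Row> createWriterFactory() {
    // Created on the driver, serialized, and sent to every executor.
    return new SimpleWriterFactory(jobId);  // hypothetical DataWriterFactory<Row>
  }

  @Override
  public void commit(WriterCommitMessage[] messages) {
    // e.g. move per-task temporary output to its final location
  }

  @Override
  public void abort(WriterCommitMessage[] messages) {
    // e.g. delete whatever temporary output the tasks produced
  }
}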

sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ public interface DataWriterFactory<T> extends Serializable {
   /**
    * Returns a data writer to do the actual writing work.
    *
-   * If this method fails (by throwing an exception), the action would fail and no Spark job was
+   * If this method fails (by throwing an exception), the action will fail and no Spark job will be
    * submitted.
    *
    * @param partitionId A unique id of the RDD partition that the returned writer will process.
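
The data writer that the factory returns does the per-partition work on an executor. A hedged sketch of one such DataWriter; BufferingDataWriter and PartitionCommitMessage are hypothetical, and a real implementation would stage data somewhere durable rather than buffering it in memory:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.writer.DataWriter;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

// Hypothetical per-partition writer: buffers rows and reports what it wrote on commit.
public class BufferingDataWriter implements DataWriter<Row> {
  private final int partitionId;
  private final List<Row> buffer = new ArrayList<>();

  public BufferingDataWriter(int partitionId) {
    this.partitionId = partitionId;
  }

  @Override
  public void write(Row record) throws IOException {
    buffer.add(record);  // in practice, append to a temporary file or staging table
  }

  @Override
  public WriterCommitMessage commit() throws IOException {
    // Sent back to DataSourceWriter.commit(...) on the driver.
    return new PartitionCommitMessage(partitionId, buffer.size());  // hypothetical message type
  }

  @Override
  public void abort() throws IOException {
    buffer.clear();  // discard anything staged for this partition
  }
}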
