Commit d542802
update doc and programming guide
1 parent 89cd384

File tree: 6 files changed, +28 −22 lines

R/pkg/R/SQLContext.R
5 additions & 3 deletions

@@ -332,8 +332,10 @@ setMethod("toDF", signature(x = "RDD"),
 
 #' Create a SparkDataFrame from a JSON file.
 #'
-#' Loads a JSON file (\href{http://jsonlines.org/}{JSON Lines text format or newline-delimited JSON}
-#' ), returning the result as a SparkDataFrame
+#' Loads a JSON file, returning the result as a SparkDataFrame.
+#' By default, \href{http://jsonlines.org/}{JSON Lines text format or newline-delimited JSON}
+#' is supported. For JSON (one record per file), set a named parameter \code{wholeFile} to
+#' \code{true}.
 #' It goes through the entire dataset once to determine the schema.
 #'
 #' @param path Path of file to read. A vector of multiple paths is allowed.
@@ -785,7 +787,7 @@ dropTempView <- function(viewName) {
 #' df1 <- read.df("path/to/file.json", source = "json")
 #' schema <- structType(structField("name", "string"),
 #'                      structField("info", "map<string,double>"))
-#' df2 <- read.df(mapTypeJsonPath, "json", schema)
+#' df2 <- read.df(mapTypeJsonPath, "json", schema, wholeFile = "true")
 #' df3 <- loadDF("data/test_table", "parquet", mergeSchema = "true")
 #' }
 #' @name read.df

docs/sql-programming-guide.md
15 additions & 11 deletions

@@ -386,8 +386,8 @@ For example:
 
 The [built-in DataFrames functions](api/scala/index.html#org.apache.spark.sql.functions$) provide common
 aggregations such as `count()`, `countDistinct()`, `avg()`, `max()`, `min()`, etc.
-While those functions are designed for DataFrames, Spark SQL also has type-safe versions for some of them in
-[Scala](api/scala/index.html#org.apache.spark.sql.expressions.scalalang.typed$) and
+While those functions are designed for DataFrames, Spark SQL also has type-safe versions for some of them in
+[Scala](api/scala/index.html#org.apache.spark.sql.expressions.scalalang.typed$) and
 [Java](api/java/org/apache/spark/sql/expressions/javalang/typed.html) to work with strongly typed Datasets.
 Moreover, users are not limited to the predefined aggregate functions and can create their own.
@@ -397,7 +397,7 @@ Moreover, users are not limited to the predefined aggregate functions and can cr
 
 <div data-lang="scala" markdown="1">
 
-Users have to extend the [UserDefinedAggregateFunction](api/scala/index.html#org.apache.spark.sql.expressions.UserDefinedAggregateFunction)
+Users have to extend the [UserDefinedAggregateFunction](api/scala/index.html#org.apache.spark.sql.expressions.UserDefinedAggregateFunction)
 abstract class to implement a custom untyped aggregate function. For example, a user-defined average
 can look like:
@@ -888,8 +888,9 @@ or a JSON file.
 
 Note that the file that is offered as _a json file_ is not a typical JSON file. Each
 line must contain a separate, self-contained valid JSON object. For more information, please see
-[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/). As a
-consequence, a regular multi-line JSON file will most often fail.
+[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).
+
+For a regular multi-line JSON file, set the `wholeFile` option to `true`.
 
 {% include_example json_dataset scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
 </div>
@@ -901,8 +902,9 @@ or a JSON file.
 
 Note that the file that is offered as _a json file_ is not a typical JSON file. Each
 line must contain a separate, self-contained valid JSON object. For more information, please see
-[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/). As a
-consequence, a regular multi-line JSON file will most often fail.
+[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).
+
+For a regular multi-line JSON file, set the `wholeFile` option to `true`.
 
 {% include_example json_dataset java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
 </div>
@@ -913,8 +915,9 @@ This conversion can be done using `SparkSession.read.json` on a JSON file.
 
 Note that the file that is offered as _a json file_ is not a typical JSON file. Each
 line must contain a separate, self-contained valid JSON object. For more information, please see
-[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/). As a
-consequence, a regular multi-line JSON file will most often fail.
+[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).
+
+For a regular multi-line JSON file, set the `wholeFile` parameter to `true`.
 
 {% include_example json_dataset python/sql/datasource.py %}
 </div>
@@ -926,8 +929,9 @@ files is a JSON object.
 
 Note that the file that is offered as _a json file_ is not a typical JSON file. Each
 line must contain a separate, self-contained valid JSON object. For more information, please see
-[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/). As a
-consequence, a regular multi-line JSON file will most often fail.
+[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).
+
+For a regular multi-line JSON file, set a named parameter `wholeFile` to `true`.
 
 {% include_example json_dataset r/RSparkSQLExample.R %}
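The distinction the guide draws here, that a line-based JSON Lines reader cannot handle a regular multi-line JSON document, can be illustrated without Spark at all. A minimal sketch using only Python's standard `json` module:

```python
import json

# A JSON Lines document: one complete, self-contained JSON object per line.
json_lines = '{"name": "Alice"}\n{"name": "Bob"}'

# A "regular" multi-line JSON document: one object spread over several lines.
multi_line = '{\n  "name": "Alice"\n}'

# Line-by-line parsing (what a JSON Lines reader assumes) works on the first:
records = [json.loads(line) for line in json_lines.splitlines()]
print([r["name"] for r in records])  # ['Alice', 'Bob']

# The same strategy fails on the multi-line document, because its first
# line, "{", is not a valid JSON value on its own.
try:
    [json.loads(line) for line in multi_line.splitlines()]
except json.JSONDecodeError:
    print("line-based parse failed")

# Parsing the entire content as one JSON value (the behavior the `wholeFile`
# option turns on) succeeds:
print(json.loads(multi_line)["name"])  # Alice
```

This is why the guide says each line "must contain a separate, self-contained valid JSON object" under the default mode.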

python/pyspark/sql/readwriter.py
2 additions & 2 deletions

@@ -163,8 +163,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
 """
 Loads a JSON file and returns the results as a :class:`DataFrame`.
 
-Both JSON (one record per file) and `JSON Lines <http://jsonlines.org/>`_
-(newline-delimited JSON) are supported and can be selected with the `wholeFile` parameter.
+`JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+For JSON (one record per file), set the `wholeFile` parameter to ``true``.
 
 If the ``schema`` parameter is not specified, this function goes
 through the input once to determine the input schema.

python/pyspark/sql/streaming.py
2 additions & 2 deletions

@@ -433,8 +433,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
 """
 Loads a JSON file stream and returns the results as a :class:`DataFrame`.
 
-Both JSON (one record per file) and `JSON Lines <http://jsonlines.org/>`_
-(newline-delimited JSON) are supported and can be selected with the `wholeFile` parameter.
+`JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+For JSON (one record per file), set the `wholeFile` parameter to ``true``.
 
 If the ``schema`` parameter is not specified, this function goes
 through the input once to determine the input schema.

sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
2 additions & 2 deletions

@@ -263,8 +263,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
 /**
  * Loads a JSON file and returns the results as a `DataFrame`.
  *
- * Both JSON (one record per file) and <a href="http://jsonlines.org/">JSON Lines</a>
- * (newline-delimited JSON) are supported and can be selected with the `wholeFile` option.
+ * <a href="http://jsonlines.org/">JSON Lines</a> (newline-delimited JSON) is supported by
+ * default. For JSON (one record per file), set the `wholeFile` option to true.
  *
  * This function goes through the input once to determine the input schema. If you know the
  * schema in advance, use the version that specifies the schema to avoid the extra scan.

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
2 additions & 2 deletions

@@ -143,8 +143,8 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
 /**
  * Loads a JSON file stream and returns the results as a `DataFrame`.
  *
- * Both JSON (one record per file) and <a href="http://jsonlines.org/">JSON Lines</a>
- * (newline-delimited JSON) are supported and can be selected with the `wholeFile` option.
+ * <a href="http://jsonlines.org/">JSON Lines</a> (newline-delimited JSON) is supported by
+ * default. For JSON (one record per file), set the `wholeFile` option to true.
 *
 * This function goes through the input once to determine the input schema. If you know the
 * schema in advance, use the version that specifies the schema to avoid the extra scan.
