diff --git a/docs/sql-data-sources-json.md b/docs/sql-data-sources-json.md
index 0f1ca432b704..041512918e61 100644
--- a/docs/sql-data-sources-json.md
+++ b/docs/sql-data-sources-json.md
@@ -94,3 +94,168 @@ SELECT * FROM jsonTable
+
+## Data Source Option
+
+Data source options of JSON can be set via:
+* the `.option`/`.options` methods of
+  * `DataFrameReader`
+  * `DataFrameWriter`
+  * `DataStreamReader`
+  * `DataStreamWriter`
+
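As a rough illustration of the two methods listed above, the sketch below sets JSON options on a `DataFrameReader` one at a time with `.option` and in bulk with `.options`. It is written in PySpark and assumes an existing `SparkSession` passed in as `spark`; the helper name `read_json_with_options`, the option values, and the path are hypothetical.

```python
# Hypothetical option values for illustration; the keys are JSON data source options.
BULK_READ_OPTIONS = {
    "multiLine": "true",       # one record per file, possibly spanning lines
    "allowComments": "true",   # tolerate Java/C++ style comments
}


def read_json_with_options(spark, path):
    """Read JSON from `path`, assuming `spark` is an existing SparkSession."""
    return (
        spark.read
        .option("timeZone", "UTC")      # set a single option
        .options(**BULK_READ_OPTIONS)   # set several options at once
        .json(path)
    )
```

With a live session this might be called as `read_json_with_options(spark, "path/to/people.json")`. `DataFrameWriter`, `DataStreamReader`, and `DataStreamWriter` expose the same two methods.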
| Property Name | Default | Meaning | Scope |
|---|---|---|---|
| `timeZone` | None | Sets the string that indicates a time zone ID to be used to format timestamps in the JSON data sources or partition values. Supported formats are region-based zone IDs such as `America/Los_Angeles` and zone offsets such as `-08:00`. `spark.sql.session.timeZone` is used by default. | read/write |
| `primitivesAsString` | None | Infers all primitive values as a string type. If None is set, it uses the default value, `false`. | read |
| `prefersDecimal` | None | Infers all floating-point values as a decimal type. If the values do not fit in decimal, it infers them as doubles. If None is set, it uses the default value, `false`. | read |
| `allowComments` | None | Ignores Java/C++ style comments in JSON records. If None is set, it uses the default value, `false`. | read |
| `allowUnquotedFieldNames` | None | Allows unquoted JSON field names. If None is set, it uses the default value, `false`. | read |
| `allowSingleQuotes` | None | Allows single quotes in addition to double quotes. If None is set, it uses the default value, `true`. | read |
| `allowNumericLeadingZero` | None | Allows leading zeros in numbers (e.g. 00012). If None is set, it uses the default value, `false`. | read |
| `allowBackslashEscapingAnyCharacter` | None | Allows any character to be quoted using the backslash quoting mechanism. If None is set, it uses the default value, `false`. | read |
| `mode` | None | Selects a mode for dealing with corrupt records during parsing. If None is set, it uses the default value, `PERMISSIVE`. | read |
| `columnNameOfCorruptRecord` | None | Renames the field that `PERMISSIVE` mode creates to hold the malformed string. This overrides `spark.sql.columnNameOfCorruptRecord`. If None is set, it uses the value specified in `spark.sql.columnNameOfCorruptRecord`. | read |
| `dateFormat` | None | Sets the string that indicates a date format. Custom formats follow the datetime pattern. This applies to date type. If None is set, it uses the default value, `yyyy-MM-dd`. | read/write |
| `timestampFormat` | None | Sets the string that indicates a timestamp format. Custom formats follow the datetime pattern. This applies to timestamp type. If None is set, it uses the default value, `yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]`. | read/write |
| `multiLine` | None | Parses one record, which may span multiple lines, per file. If None is set, it uses the default value, `false`. | read |
| `allowUnquotedControlChars` | None | Whether JSON strings may contain unquoted control characters (ASCII characters with a value less than 32, including tab and line feed characters). | read |
| `encoding` | None | For reading, forces one of the standard basic or extended encodings for the JSON files, for example UTF-16BE or UTF-32LE. If None is set, the encoding of the input JSON is detected automatically when the `multiLine` option is set to `true`. For writing, specifies the encoding (charset) of the saved JSON files. If None is set, the default UTF-8 charset is used. | read/write |
| `lineSep` | None | Defines the line separator that should be used for parsing. If None is set, it covers all of `\r`, `\r\n` and `\n`. | read/write |
| `samplingRatio` | None | Defines the fraction of input JSON objects used for schema inference. If None is set, it uses the default value, `1.0`. | read |
| `dropFieldIfAllNull` | None | Whether to ignore columns that contain only null values or empty arrays/structs during schema inference. If None is set, it uses the default value, `false`. | read |
| `locale` | None | Sets a locale as a language tag in IETF BCP 47 format. If None is set, it uses the default value, `en-US`. For instance, `locale` is used while parsing dates and timestamps. | read |
| `allowNonNumericNumbers` | None | Allows the JSON parser to recognize the set of "Not-a-Number" (NaN) tokens as legal floating-point number values. If None is set, it uses the default value, `true`. | read |
| `compression` | None | Compression codec to use when saving to file. This can be one of the known case-insensitive short names (`none`, `bzip2`, `gzip`, `lz4`, `snappy` and `deflate`). | write |
| `ignoreNullFields` | None | Whether to ignore null fields when generating JSON objects. If None is set, it uses the default value, `true`. | write |
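
To show how a few of the options in the table combine, here is a minimal PySpark sketch: one helper reads with `mode` and `columnNameOfCorruptRecord`, the other writes with `compression` and `ignoreNullFields`. It assumes an existing `SparkSession` (`spark`) and `DataFrame` (`df`); the helper names, the schema, the column name `_malformed`, and the paths are hypothetical. Listing the corrupt-record column in a user-supplied schema is an assumption about how to keep the malformed string visible, and is worth checking against the Spark version in use.

```python
def read_keeping_corrupt_records(spark, path, corrupt_col="_malformed"):
    """Read JSON in PERMISSIVE mode, routing malformed input to `corrupt_col`."""
    # Hypothetical schema: two data columns plus a string column for bad records.
    schema = f"id INT, name STRING, {corrupt_col} STRING"
    return (
        spark.read
        .schema(schema)
        .option("mode", "PERMISSIVE")
        .option("columnNameOfCorruptRecord", corrupt_col)
        .json(path)
    )


def write_compressed(df, path):
    """Write `df` as gzip-compressed JSON, keeping null fields in the output."""
    (
        df.write
        .option("compression", "gzip")
        .option("ignoreNullFields", "false")  # table default is true
        .json(path)
    )
```

Passing option values as strings keeps the calls uniform; the option methods also accept booleans and numbers.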
org.apache.hadoop.fs.GlobFilter.
- * It does not change the behavior of partition discovery.