diff --git a/docs/sql-data-sources-hive-tables.md b/docs/sql-data-sources-hive-tables.md
index e4ce3e938b75..f99b06494934 100644
--- a/docs/sql-data-sources-hive-tables.md
+++ b/docs/sql-data-sources-hive-tables.md
@@ -88,17 +88,17 @@ creating table, you can create a table using storage handler at Hive side, and u
inputFormat, outputFormat
These 2 options specify the name of a corresponding InputFormat and OutputFormat class as a string literal,
+ e.g. org.apache.hadoop.hive.ql.io.orc.OrcInputFormat. These 2 options must appear as a pair, and you cannot
+ specify them if you already specified the fileFormat option.
serde
This option specifies the name of a serde class. When the fileFormat option is specified, do not specify this option
+ if the given fileFormat already includes the serde information. Currently "sequencefile", "textfile" and "rcfile"
don't include the serde information and you can use this option with these 3 fileFormats.
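As an illustration of how these options combine, here is a minimal sketch of creating Hive serde tables from Spark SQL; the table names and schema are made up, and a SparkSession named `spark` with Hive support enabled is assumed:

```scala
// Sketch only: table names and columns are placeholders; `spark` is a
// SparkSession created with .enableHiveSupport().

// Let the fileFormat shortcut pick the matching input/output format and serde.
spark.sql(
  """CREATE TABLE hive_orc_example (key INT, value STRING)
    |USING hive
    |OPTIONS(fileFormat 'orc')""".stripMargin)

// Or spell out inputFormat/outputFormat (always as a pair) plus an explicit serde,
// in which case fileFormat must not be given.
spark.sql(
  """CREATE TABLE hive_custom_example (key INT, value STRING)
    |USING hive
    |OPTIONS(
    |  inputFormat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat',
    |  outputFormat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat',
    |  serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    |)""".stripMargin)
```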
FROM clause of a SQL query can be used.
For example, instead of a full table you could also use a subquery in parentheses. It is not
- allowed to specify `dbtable` and `query` options at the same time.
+ allowed to specify dbtable and query options at the same time.
SELECT <columns> FROM (<user_specified_query>) spark_gen_alias
It is not allowed to specify dbtable and query options at the same time.
It is not allowed to specify query and partitionColumn options at the same time. When specifying
+ partitionColumn option is required, the subquery can be specified using dbtable option instead and
+ partition columns can be qualified using the subquery alias provided as part of dbtable.
spark.read.format("jdbc")
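For example, a sketch contrasting the two approaches; the JDBC URL, table, and column names below are placeholders rather than anything from the original docs:

```scala
// Sketch only: URL, table and column names are hypothetical.
// Read an arbitrary query; partitionColumn cannot be combined with `query`.
val byQuery = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/sales")
  .option("query", "SELECT id, amount FROM orders WHERE amount > 100")
  .load()

// To read the same rows in parallel, move the subquery into dbtable with an alias
// and qualify the partition column with that alias.
val partitioned = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/sales")
  .option("dbtable", "(SELECT id, amount FROM orders WHERE amount > 100) AS t")
  .option("partitionColumn", "t.id")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .load()
```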
diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index b5309870f485..53a1111cd828 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -280,12 +280,12 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
spark.sql.parquet.compression.codec
snappy
- Sets the compression codec used when writing Parquet files. If either `compression` or
- `parquet.compression` is specified in the table-specific options/properties, the precedence would be
- `compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
+ Sets the compression codec used when writing Parquet files. If either compression or
+ parquet.compression is specified in the table-specific options/properties, the precedence would be
+ compression, parquet.compression, spark.sql.parquet.compression.codec. Acceptable values include:
none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
- Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires
- `BrotliCodec` to be installed.
+ Note that zstd requires ZStandardCodec to be installed before Hadoop 2.9.0, brotli requires
+ BrotliCodec to be installed.
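To make the precedence concrete, a small sketch (the output path and data are placeholders):

```scala
// Session-wide default, the lowest of the three precedence levels.
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

// Write-time options override the session default; `compression` beats
// `parquet.compression` when both are present.
spark.range(100).toDF("id")
  .write
  .option("compression", "gzip")           // used
  .option("parquet.compression", "snappy") // ignored, compression takes precedence
  .parquet("/tmp/parquet_compression_demo")
```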
hint: the
+ number of Spark tasks will be approximately minPartitions. It can be less or more depending on
rounding errors or Kafka partitions that didn't receive any new data.
Prefix of consumer group identifiers (group.id) that are generated by structured streaming
queries. If "kafka.group.id" is set, this option will be ignored.
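A sketch of these two source options on a Kafka streaming read; the broker address, topic, and option values are illustrative only:

```scala
// Sketch only: servers, topic and option values are placeholders.
val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "events")
  // Roughly this many Spark tasks; the actual count may differ due to rounding
  // or Kafka partitions with no new data.
  .option("minPartitions", "32")
  // Generated consumer group ids will start with this prefix,
  // unless kafka.group.id is set explicitly.
  .option("groupIdPrefix", "my-stream")
  .load()
```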
withWatermark() as by
the modes semantics, rows can be added to the Result Table only once after they are
finalized (i.e. after watermark is crossed). See the
Late Data section for more details.
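As a sketch of that semantics, a windowed count in append mode emits each window's row only after the watermark passes the window end; the input stream `events` and its columns are assumed here for illustration:

```scala
import org.apache.spark.sql.functions.{col, window}

// Sketch only: `events` is assumed to be a streaming DataFrame with
// `timestamp` and `word` columns; thresholds are illustrative.
val windowedCounts = events
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window(col("timestamp"), "5 minutes"), col("word"))
  .count()

// In append mode a window's count is emitted once, after the watermark
// crosses the end of that window (the row is then finalized).
val query = windowedCounts.writeStream
  .outputMode("append")
  .format("console")
  .start()
```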
@@ -2324,7 +2324,7 @@ Here are the different kinds of triggers that are supported.