10 changes: 5 additions & 5 deletions docs/sql-data-sources-hive-tables.md
@@ -88,17 +88,17 @@ creating table, you can create a table using storage handler at Hive side, and u
<tr>
<td><code>inputFormat, outputFormat</code></td>
<td>
These 2 options specify the name of a corresponding `InputFormat` and `OutputFormat` class as a string literal,
e.g. `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat`. These 2 options must be appeared in a pair, and you can not
specify them if you already specified the `fileFormat` option.
These 2 options specify the name of a corresponding <code>InputFormat</code> and <code>OutputFormat</code> class as a string literal,
e.g. <code>org.apache.hadoop.hive.ql.io.orc.OrcInputFormat</code>. These 2 options must be provided as a pair, and you cannot
specify them if you have already specified the <code>fileFormat</code> option.
</td>
</tr>

<tr>
<td><code>serde</code></td>
<td>
This option specifies the name of a serde class. When the `fileFormat` option is specified, do not specify this option
if the given `fileFormat` already include the information of serde. Currently "sequencefile", "textfile" and "rcfile"
This option specifies the name of a serde class. When the <code>fileFormat</code> option is specified, do not specify this option
if the given <code>fileFormat</code> already includes the serde information. Currently "sequencefile", "textfile" and "rcfile"
don't include the serde information, so you can use this option with these 3 fileFormats.
</td>
</tr>
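For context, here is a minimal Scala sketch of how the storage-format options described above might be passed when creating Hive tables from Spark SQL. The table names are illustrative, and the <code>OrcOutputFormat</code> and <code>LazySimpleSerDe</code> class names are assumptions chosen to match the option descriptions, not taken from this page.

```scala
import org.apache.spark.sql.SparkSession

// Assumes a Spark build with Hive support; table names are hypothetical.
val spark = SparkSession.builder()
  .appName("HiveStorageOptionsSketch")
  .enableHiveSupport()
  .getOrCreate()

// inputFormat and outputFormat must be given together and cannot be
// combined with the fileFormat option.
spark.sql("""
  CREATE TABLE hive_orc_explicit(key INT, value STRING)
  USING hive
  OPTIONS(
    inputFormat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat',
    outputFormat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
  )
""")

// 'textfile' does not imply a serde, so one may be supplied explicitly.
spark.sql("""
  CREATE TABLE hive_text_with_serde(key INT, value STRING)
  USING hive
  OPTIONS(
    fileFormat 'textfile',
    serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  )
""")
```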
10 changes: 5 additions & 5 deletions docs/sql-data-sources-jdbc.md
@@ -60,7 +60,7 @@ the following case-insensitive options:
The JDBC table that should be read from or written into. Note that when using it in the read
path anything that is valid in a <code>FROM</code> clause of a SQL query can be used.
For example, instead of a full table you could also use a subquery in parentheses. It is not
allowed to specify `dbtable` and `query` options at the same time.
allowed to specify <code>dbtable</code> and <code>query</code> options at the same time.
</td>
</tr>
<tr>
@@ -72,10 +72,10 @@ the following case-insensitive options:
<code> SELECT &lt;columns&gt; FROM (&lt;user_specified_query&gt;) spark_gen_alias</code><br><br>
Below are a couple of restrictions while using this option.<br>
<ol>
<li> It is not allowed to specify `dbtable` and `query` options at the same time. </li>
<li> It is not allowed to specify `query` and `partitionColumn` options at the same time. When specifying
`partitionColumn` option is required, the subquery can be specified using `dbtable` option instead and
partition columns can be qualified using the subquery alias provided as part of `dbtable`. <br>
<li> It is not allowed to specify <code>dbtable</code> and <code>query</code> options at the same time. </li>
<li> It is not allowed to specify <code>query</code> and <code>partitionColumn</code> options at the same time. When the
<code>partitionColumn</code> option is required, the subquery can be specified using the <code>dbtable</code> option instead, and
partition columns can be qualified using the subquery alias provided as part of <code>dbtable</code>. <br>
Example:<br>
<code>
spark.read.format("jdbc")<br>
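The example above is cut off by the collapsed diff. A minimal sketch of the same workaround, assuming an illustrative PostgreSQL URL, table, columns, and credentials: the subquery goes into <code>dbtable</code> (with an alias) so that <code>partitionColumn</code> can still be used, since <code>query</code> and <code>partitionColumn</code> cannot be combined.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical connection details and schema.
val spark = SparkSession.builder().appName("JdbcDbtableSketch").getOrCreate()

val orders = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/sales")
  // Subquery with an alias passed through dbtable instead of query.
  .option("dbtable", "(SELECT id, amount, region FROM orders) AS subq")
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .option("user", "reader")
  .option("password", "secret")
  .load()

orders.show(5)
```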
10 changes: 5 additions & 5 deletions docs/sql-data-sources-parquet.md
@@ -280,12 +280,12 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
<td><code>spark.sql.parquet.compression.codec</code></td>
<td>snappy</td>
<td>
Sets the compression codec used when writing Parquet files. If either `compression` or
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
Sets the compression codec used when writing Parquet files. If either <code>compression</code> or
<code>parquet.compression</code> is specified in the table-specific options/properties, the precedence would be
<code>compression</code>, <code>parquet.compression</code>, <code>spark.sql.parquet.compression.codec</code>. Acceptable values include:
none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires
`BrotliCodec` to be installed.
Note that <code>zstd</code> requires <code>ZStandardCodec</code> to be installed before Hadoop 2.9.0, and <code>brotli</code> requires
<code>BrotliCodec</code> to be installed.
</td>
</tr>
<tr>
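A minimal sketch (output paths are illustrative) of the precedence described above: a per-write <code>compression</code> option overrides the session-level <code>spark.sql.parquet.compression.codec</code> setting.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ParquetCodecSketch").getOrCreate()

// Session-wide default codec for Parquet writes.
spark.conf.set("spark.sql.parquet.compression.codec", "gzip")

val df = spark.range(0, 1000).toDF("id")

// Uses the session default (gzip).
df.write.mode("overwrite").parquet("/tmp/parquet_gzip")

// The write-level 'compression' option takes precedence, so this output
// is snappy-compressed regardless of the session setting.
df.write.mode("overwrite")
  .option("compression", "snappy")
  .parquet("/tmp/parquet_snappy")
```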
6 changes: 3 additions & 3 deletions docs/structured-streaming-kafka-integration.md
@@ -473,16 +473,16 @@ The following configurations are optional:
<td>Desired minimum number of partitions to read from Kafka.
By default, Spark has a 1-1 mapping of topicPartitions to Spark partitions consuming from Kafka.
If you set this option to a value greater than your topicPartitions, Spark will divvy up large
Kafka partitions to smaller pieces. Please note that this configuration is like a `hint`: the
number of Spark tasks will be **approximately** `minPartitions`. It can be less or more depending on
Contributor Author: fix format error
Kafka partitions to smaller pieces. Please note that this configuration is like a <code>hint</code>: the
number of Spark tasks will be <strong>approximately</strong> <code>minPartitions</code>. It can be less or more depending on
rounding errors or Kafka partitions that didn't receive any new data.</td>
</tr>
<tr>
<td>groupIdPrefix</td>
<td>string</td>
<td>spark-kafka-source</td>
<td>streaming and batch</td>
<td>Prefix of consumer group identifiers (`group.id`) that are generated by structured streaming
<td>Prefix of consumer group identifiers (<code>group.id</code>) that are generated by structured streaming
queries. If "kafka.group.id" is set, this option will be ignored.</td>
</tr>
<tr>
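A minimal sketch, assuming an illustrative broker address, topic, and paths, of how <code>minPartitions</code> and <code>groupIdPrefix</code> might be set on a Kafka source:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("KafkaOptionsSketch").getOrCreate()

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")   // hypothetical broker
  .option("subscribe", "events")                       // hypothetical topic
  // A hint only: Spark aims for roughly this many input partitions, splitting
  // large Kafka partitions into smaller offset ranges when needed.
  .option("minPartitions", "64")
  // Generated consumer group ids start with this prefix; ignored if
  // kafka.group.id is set explicitly.
  .option("groupIdPrefix", "analytics-app")
  .load()

val query = stream
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoints/kafka-sketch")
  .start()

query.awaitTermination()
```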
4 changes: 2 additions & 2 deletions docs/structured-streaming-programming-guide.md
@@ -1717,7 +1717,7 @@ Here is the compatibility matrix.
<td style="vertical-align: middle;">Append, Update, Complete</td>
<td>
Append mode uses watermark to drop old aggregation state. But the output of a
windowed aggregation is delayed the late threshold specified in `withWatermark()` as by
windowed aggregation is delayed by the late threshold specified in <code>withWatermark()</code> since, by
the mode's semantics, rows can be added to the Result Table only once after they are
finalized (i.e. after watermark is crossed). See the
<a href="#handling-late-data-and-watermarking">Late Data</a> section for more details.
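A minimal sketch, using the built-in <code>rate</code> test source and an illustrative window size, of a windowed aggregation whose append-mode output is held back until the watermark passes the end of each window:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder().appName("WatermarkSketch").getOrCreate()
import spark.implicits._

// The rate source emits a 'timestamp' column, renamed here to act as event time.
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "10")
  .load()
  .withColumnRenamed("timestamp", "eventTime")

val counts = events
  .withWatermark("eventTime", "10 minutes")      // late threshold
  .groupBy(window($"eventTime", "5 minutes"))
  .count()

// In append mode, a window's row is emitted only after the watermark moves
// past the end of that window, i.e. after the result is finalized.
val query = counts.writeStream
  .outputMode("append")
  .format("console")
  .start()
```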
@@ -2324,7 +2324,7 @@ Here are the different kinds of triggers that are supported.
<tr>
<td><b>One-time micro-batch</b></td>
<td>
The query will execute *only one* micro-batch to process all the available data and then
Contributor Author: fix format error
The query will execute <strong>only one</strong> micro-batch to process all the available data and then
stop on its own. This is useful in scenarios where you want to periodically spin up a cluster,
process everything that is available since the last period, and then shut down the
cluster. In some cases, this may lead to significant cost savings.
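A minimal sketch, with hypothetical source, output, and checkpoint paths, of the one-time micro-batch trigger: process whatever data is currently available, then let the query stop on its own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("TriggerOnceSketch").getOrCreate()

// Hypothetical file source with a fixed schema.
val input = spark.readStream
  .schema("id LONG, payload STRING")
  .json("/data/incoming")

val query = input.writeStream
  .format("parquet")
  .option("path", "/data/processed")
  .option("checkpointLocation", "/data/checkpoints/once")
  .trigger(Trigger.Once())   // exactly one micro-batch, then the query stops
  .start()

query.awaitTermination()
```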