8 changes: 4 additions & 4 deletions docs/_data/menu-sql.yaml
@@ -171,22 +171,22 @@
url: sql-ref-syntax-qry-select-limit.html
- text: Common Table Expression
url: sql-ref-syntax-qry-select-cte.html
- text: Hints
url: sql-ref-syntax-qry-select-hints.html
- text: Inline Table
url: sql-ref-syntax-qry-select-inline-table.html
- text: JOIN
url: sql-ref-syntax-qry-select-join.html
- text: Join Hints
url: sql-ref-syntax-qry-select-hints.html
- text: LIKE Predicate
url: sql-ref-syntax-qry-select-like.html
- text: Set Operators
url: sql-ref-syntax-qry-select-setops.html
- text: TABLESAMPLE
url: sql-ref-syntax-qry-sampling.html
url: sql-ref-syntax-qry-select-sampling.html
- text: Table-valued Function
url: sql-ref-syntax-qry-select-tvf.html
- text: Window Function
url: sql-ref-syntax-qry-window.html
url: sql-ref-syntax-qry-select-window.html
- text: EXPLAIN
url: sql-ref-syntax-qry-explain.html
- text: Auxiliary Statements
4 changes: 3 additions & 1 deletion docs/sql-performance-tuning.md
@@ -179,7 +179,7 @@ SELECT /*+ BROADCAST(r) */ * FROM records r JOIN src s ON r.key = s.key
</div>
</div>

For more details please refer to the documentation of [Join Hints](sql-ref-syntax-qry-select-hints.html).
For more details please refer to the documentation of [Join Hints](sql-ref-syntax-qry-select-hints.html#join-hints).

## Coalesce Hints for SQL Queries

@@ -196,6 +196,8 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is
SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t

For more details please refer to the documentation of [Partitioning Hints](sql-ref-syntax-qry-select-hints.html#partitioning-hints).

## Adaptive Query Execution
Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that uses runtime statistics to choose the most efficient query execution plan. AQE is disabled by default. Spark SQL uses the umbrella configuration `spark.sql.adaptive.enabled` to control whether it is turned on or off. As of Spark 3.0, AQE has three major features: coalescing post-shuffle partitions, converting sort-merge join to broadcast join, and skew join optimization.

83 changes: 77 additions & 6 deletions docs/sql-ref-syntax-qry-select-hints.md
@@ -1,7 +1,7 @@
---
layout: global
title: Join Hints
displayTitle: Join Hints
title: Hints
displayTitle: Hints
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -21,15 +21,86 @@ license: |

### Description

Join Hints allow users to suggest the join strategy that Spark should use. Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. `MERGE`, `SHUFFLE_HASH` and `SHUFFLE_REPLICATE_NL` Join Hints support was added in 3.0. When different join strategy hints are specified on both sides of a join, Spark prioritizes hints in the following order: `BROADCAST` over `MERGE` over `SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with the `BROADCAST` hint or the `SHUFFLE_HASH` hint, Spark will pick the build side based on the join type and the sizes of the relations. Since a given strategy may not support all join types, Spark is not guaranteed to use the join strategy suggested by the hint.
Hints give users a way to suggest specific approaches that Spark SQL should use to generate its execution plan.

### Syntax

```sql
/*+ join_hint [ , ... ] */
/*+ hint [ , ... ] */
```

### Join Hints Types
### Partitioning Hints

Partitioning hints allow users to suggest a partitioning strategy that Spark should follow. `COALESCE`, `REPARTITION`,
and `REPARTITION_BY_RANGE` hints are supported and are equivalent to `coalesce`, `repartition`, and
`repartitionByRange` [Dataset APIs](api/scala/org/apache/spark/sql/Dataset.html), respectively. These hints give users
a way to tune performance and control the number of output files in Spark SQL. When multiple partitioning hints are
specified, multiple nodes are inserted into the logical plan, but the leftmost hint is picked by the optimizer.
Review comment (Member): Nit: The description of multiple hints is duplicated in https://github.com/apache/spark/pull/28672/files#diff-84ec3ee2cc31db6fd14e15058e35435cR69, maybe we just keep the one with the example.

Reply (Contributor Author): Thanks for your comment. I will keep the one in the description.


#### Partitioning Hints Types

* **COALESCE**

The `COALESCE` hint can be used to reduce the number of partitions to the specified number of partitions. It takes a partition number as a parameter.

* **REPARTITION**

The `REPARTITION` hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. It takes a partition number, column names, or both as parameters.

* **REPARTITION_BY_RANGE**

The `REPARTITION_BY_RANGE` hint can be used to range-partition the data into the specified number of partitions using the specified partitioning expressions. It takes column names and an optional partition number as parameters.
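The argument rules listed above can be captured in a small checker. This is a hypothetical model written for illustration, not Spark's hint resolver: integers stand in for partition numbers, strings for column names, and it assumes that a partition number, when present, is written first, as in `REPARTITION(3, c)`.

```python
# Hypothetical model of the documented argument rules for partitioning hints.
# This is NOT Spark's parser: ints represent partition numbers, strs represent
# column names, and a partition number is assumed to come first when present.
from typing import List, Optional, Sequence, Tuple, Union

Arg = Union[int, str]


def split_args(args: Sequence[Arg]) -> Tuple[Optional[int], List[str]]:
    """Split hint arguments into an optional leading partition number and columns."""
    if args and isinstance(args[0], int):
        num, cols = args[0], list(args[1:])
    else:
        num, cols = None, list(args)
    if not all(isinstance(c, str) for c in cols):
        raise ValueError("only a leading partition number is allowed; the rest must be column names")
    return num, cols


def is_valid_partitioning_hint(name: str, args: Sequence[Arg]) -> bool:
    """Return True if the arguments fit the documented shape for the given hint."""
    try:
        num, cols = split_args(args)
    except ValueError:
        return False
    if name == "COALESCE":
        # A partition number and nothing else.
        return num is not None and not cols
    if name == "REPARTITION":
        # A partition number, column names, or both.
        return num is not None or bool(cols)
    if name == "REPARTITION_BY_RANGE":
        # Column names are required; the partition number is optional.
        return bool(cols)
    raise ValueError(f"not a partitioning hint: {name}")
```

Under this model, every hint in the examples for this section (`COALESCE(3)`, `REPARTITION(3, c)`, `REPARTITION_BY_RANGE(c)`, and so on) is accepted, while something like `COALESCE(c)` or `REPARTITION_BY_RANGE(3)` is rejected.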

#### Examples

Review comment (Member): nit: remove the unnecessary blank line.

```sql
SELECT /*+ COALESCE(3) */ * FROM t;
Review comment (Member): How about showing a Spark plan via explain?


SELECT /*+ REPARTITION(3) */ * FROM t;
Review comment (Member): Do we still need these statements, which have no output, as examples?

Review comment (Member): One more comment: the join hint section should probably use the same format for its examples.

Reply (Contributor Author): I think I will only have one example for explain; otherwise the example section will be too long.
I will leave the join hint example section as is for now. I don't want this section to be too long.


SELECT /*+ REPARTITION(c) */ * FROM t;

SELECT /*+ REPARTITION(3, c) */ * FROM t;

SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t;

SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t;

-- multiple partitioning hints
EXPLAIN EXTENDED SELECT /*+ REPARTITION(100), COALESCE(500), REPARTITION_BY_RANGE(3, c) */ * FROM t;
== Parsed Logical Plan ==
'UnresolvedHint REPARTITION, [100]
+- 'UnresolvedHint COALESCE, [500]
   +- 'UnresolvedHint REPARTITION_BY_RANGE, [3, 'c]
      +- 'Project [*]
         +- 'UnresolvedRelation [t]

== Analyzed Logical Plan ==
name: string, c: int
Repartition 100, true
+- Repartition 500, false
   +- RepartitionByExpression [c#30 ASC NULLS FIRST], 3
      +- Project [name#29, c#30]
         +- SubqueryAlias spark_catalog.default.t
            +- Relation[name#29,c#30] parquet

== Optimized Logical Plan ==
Repartition 100, true
+- Relation[name#29,c#30] parquet

== Physical Plan ==
Exchange RoundRobinPartitioning(100), false, [id=#121]
+- *(1) ColumnarToRow
   +- FileScan parquet default.t[name#29,c#30] Batched: true, DataFilters: [], Format: Parquet,
      Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [],
      PushedFilters: [], ReadSchema: struct<name:string>
```

### Join Hints

Join hints allow users to suggest the join strategy that Spark should use. Prior to Spark 3.0, only the `BROADCAST` join hint was supported. Support for the `MERGE`, `SHUFFLE_HASH`, and `SHUFFLE_REPLICATE_NL` join hints was added in 3.0. When different join strategy hints are specified on the two sides of a join, Spark prioritizes them in the following order: `BROADCAST` over `MERGE` over `SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with the `BROADCAST` hint or the `SHUFFLE_HASH` hint, Spark picks the build side based on the join type and the sizes of the relations. Since a given strategy may not support all join types, Spark is not guaranteed to use the join strategy suggested by the hint.
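The precedence order above can be expressed as a lookup over a ranked list. This is a hypothetical model for illustration only, not Spark's planner code: the function name `preferred_join_hint` is made up, and the model deliberately ignores whether the winning strategy actually supports the join type.

```python
# Hypothetical model of the documented join-hint precedence; NOT Spark's planner.
# It only ranks hints and ignores whether a strategy supports the join type.
from typing import Optional

# Highest priority first, in the documented order.
JOIN_HINT_PRIORITY = ("BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL")


def preferred_join_hint(left: Optional[str], right: Optional[str]) -> Optional[str]:
    """Return the hint that takes precedence when the two join sides carry hints."""
    given = [h for h in (left, right) if h is not None]
    if not given:
        return None
    for hint in given:
        if hint not in JOIN_HINT_PRIORITY:
            raise ValueError(f"unknown join hint: {hint}")
    # A lower index in JOIN_HINT_PRIORITY means a higher priority.
    return min(given, key=JOIN_HINT_PRIORITY.index)


print(preferred_join_hint("SHUFFLE_HASH", "MERGE"))  # MERGE
```

With `MERGE` on one side and `SHUFFLE_HASH` on the other, the model picks `MERGE`; as stated above, Spark may still fall back to another strategy if the preferred one does not support the join type.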

#### Join Hints Types

* **BROADCAST**

@@ -47,7 +118,7 @@ Join Hints allow users to suggest the join strategy that Spark should use. Prior

Suggests that Spark use shuffle-and-replicate nested loop join.

### Examples
#### Examples

```sql
-- Join Hints for broadcast join
2 changes: 1 addition & 1 deletion docs/sql-ref-syntax-qry-select-join.md
@@ -235,4 +235,4 @@ SELECT * FROM employee ANTI JOIN department ON employee.deptno = department.dept
### Related Statements

* [SELECT](sql-ref-syntax-qry-select.html)
* [Join Hints](sql-ref-syntax-qry-select-hints.html)
* [Hints](sql-ref-syntax-qry-select-hints.html)
6 changes: 3 additions & 3 deletions docs/sql-ref-syntax-qry-select.md
@@ -151,11 +151,11 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [Common Table Expression](sql-ref-syntax-qry-select-cte.html)
* [Hints](sql-ref-syntax-qry-select-hints.html)
* [Inline Table](sql-ref-syntax-qry-select-inline-table.html)
* [JOIN](sql-ref-syntax-qry-select-join.html)
* [Join Hints](sql-ref-syntax-qry-select-hints.html)
* [LIKE Predicate](sql-ref-syntax-qry-select-like.html)
* [Set Operators](sql-ref-syntax-qry-select-setops.html)
* [TABLESAMPLE](sql-ref-syntax-qry-sampling.html)
* [TABLESAMPLE](sql-ref-syntax-qry-select-sampling.html)
* [Table-valued Function](sql-ref-syntax-qry-select-tvf.html)
* [Window Function](sql-ref-syntax-qry-window.html)
* [Window Function](sql-ref-syntax-qry-select-window.html)
6 changes: 3 additions & 3 deletions docs/sql-ref-syntax-qry.md
@@ -37,12 +37,12 @@ ability to generate logical and physical plan for a given query using
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [Common Table Expression](sql-ref-syntax-qry-select-cte.html)
* [Hints](sql-ref-syntax-qry-select-hints.html)
* [Inline Table](sql-ref-syntax-qry-select-inline-table.html)
* [JOIN](sql-ref-syntax-qry-select-join.html)
* [Join Hints](sql-ref-syntax-qry-select-hints.html)
* [LIKE Predicate](sql-ref-syntax-qry-select-like.html)
* [Set Operators](sql-ref-syntax-qry-select-setops.html)
* [TABLESAMPLE](sql-ref-syntax-qry-sampling.html)
* [TABLESAMPLE](sql-ref-syntax-qry-select-sampling.html)
* [Table-valued Function](sql-ref-syntax-qry-select-tvf.html)
* [Window Function](sql-ref-syntax-qry-window.html)
* [Window Function](sql-ref-syntax-qry-select-window.html)
* [EXPLAIN Statement](sql-ref-syntax-qry-explain.html)
6 changes: 3 additions & 3 deletions docs/sql-ref-syntax.md
@@ -54,18 +54,18 @@ Spark SQL is Apache Spark's module for working with structured data. The SQL Syn
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
* [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
* [HAVING Clause](sql-ref-syntax-qry-select-having.html)
* [Hints](sql-ref-syntax-qry-select-hints.html)
* [Inline Table](sql-ref-syntax-qry-select-inline-table.html)
* [JOIN](sql-ref-syntax-qry-select-join.html)
* [Join Hints](sql-ref-syntax-qry-select-hints.html)
* [LIKE Predicate](sql-ref-syntax-qry-select-like.html)
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
* [Set Operators](sql-ref-syntax-qry-select-setops.html)
* [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
* [TABLESAMPLE](sql-ref-syntax-qry-sampling.html)
* [TABLESAMPLE](sql-ref-syntax-qry-select-sampling.html)
* [Table-valued Function](sql-ref-syntax-qry-select-tvf.html)
* [WHERE Clause](sql-ref-syntax-qry-select-where.html)
* [Window Function](sql-ref-syntax-qry-window.html)
* [Window Function](sql-ref-syntax-qry-select-window.html)
* [EXPLAIN](sql-ref-syntax-qry-explain.html)

### Auxiliary Statements