[SPARK-31866][SQL][DOCS] Add COALESCE/REPARTITION/REPARTITION_BY_RANGE Hints to SQL Reference #28672

huaxingao · 2020-05-29T05:13:41Z

What changes were proposed in this pull request?

Add Coalesce/Repartition/Repartition_By_Range Hints to SQL Reference

Why are the changes needed?

To make the SQL reference complete

Does this PR introduce any user-facing change?

Only the above pages are changed. The following two pages are the same as before.

How was this patch tested?

Manually build and check


huaxingao · 2020-05-29T05:17:30Z

Add Coalesce/Repartition/Repartition_By_Range Hints to SQL Reference per @gatorsmile's request.
cc @maropu @dilipbiswal @ulysses-you @jzhuge @xuanyuanking

SparkQA · 2020-05-29T05:35:48Z

Test build #123263 has finished for PR 28672 at commit 0b3e765.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-05-29T06:19:33Z

This is for 3.0? Btw, could you assign a new jira ID to this PR?

maropu · 2020-05-29T06:20:41Z

docs/sql-ref-syntax-qry-select-hints.md

+/*+ hint [ , ... ] */
+```
+
+### Coalesce/Repartition/Repartition_By_Range Hints


How about simply saying ### Partitioning Hints here?

maropu · 2020-05-29T06:22:16Z

docs/sql-ref-syntax-qry-select-hints.md

+
+### Examples
+```sql
+SELECT /*+ COALESCE(3) */ * FROM t;


How about showing a Spark plan via `EXPLAIN`?

maropu · 2020-05-29T06:39:23Z

docs/sql-ref-syntax-qry-select-hints.md

+
+### Coalesce/Repartition/Repartition_By_Range Hints
+
+Coalesce/Repartition/Repartition_By_Range hints have functionalities equivalent to those of the


Could you follow the same format with the Join hints? e.g., Coalesce -> `COALESCE`

huaxingao · 2020-05-29T07:30:49Z

Yes, it's for 3.0. I created jira SPARK-31866. @maropu

SparkQA · 2020-05-29T07:45:39Z

Test build #123271 has finished for PR 28672 at commit 60fdb93.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-05-29T07:37:59Z

docs/sql-ref-syntax-qry-select-hints.md

+      Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [],
+      PushedFilters: [], ReadSchema: struct<name:string>
+
+SELECT /*+ REPARTITION(3) */ * FROM t;


Do we still need these statements, which have no output, in the examples?

One more comment; probably, the join hint section should have the same format for the examples.

I think I will only have one example for explain. Otherwise, the example section will be too long.
I will leave the join hint example section as is for now. I don't want this section to be too long.

maropu · 2020-05-29T07:39:14Z

docs/sql-ref-syntax-qry-select-hints.md

+
+### Join Hints
+
+Join Hints allow users to suggest the join strategy that Spark should use. Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. `MERGE`, `SHUFFLE_HASH` and `SHUFFLE_REPLICATE_NL` Join Hints support was added in 3.0. When different join strategy hints are specified on both sides of a join, Spark prioritizes hints in the following order: `BROADCAST` over `MERGE` over `SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with the `BROADCAST` hint or the `SHUFFLE_HASH` hint, Spark will pick the build side based on the join type and the sizes of the relations. Since a given strategy may not support all join types, Spark is not guaranteed to use the join strategy suggested by the hint.


Hints -> hints?
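The priority order in the quoted Join Hints paragraph can be illustrated with a toy Python model. This is not Spark's planner: `pick_join_hint` is a hypothetical helper that only resolves which of two different side hints wins, following the order the paragraph states.

```python
# Toy model of the join-hint priority described in the Join Hints paragraph.
# Not Spark code: pick_join_hint is a hypothetical helper for illustration.
from typing import Optional

# Highest priority first, in the order the docs text lists them.
JOIN_HINT_PRIORITY = ["BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL"]


def pick_join_hint(left: Optional[str], right: Optional[str]) -> Optional[str]:
    """Return the hint that wins when each side of a join may carry one."""
    hints = [h.upper() for h in (left, right) if h is not None]
    for h in hints:
        if h not in JOIN_HINT_PRIORITY:
            raise ValueError(f"unknown join hint: {h}")
    if not hints:
        return None
    # A lower index in the priority list means a higher priority.
    return min(hints, key=JOIN_HINT_PRIORITY.index)


print(pick_join_hint("SHUFFLE_HASH", "MERGE"))  # MERGE
print(pick_join_hint(None, "shuffle_replicate_nl"))  # SHUFFLE_REPLICATE_NL
```

Even the winning hint stays a suggestion: as the paragraph notes, Spark may still choose another strategy if the hinted one does not support the join type.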

maropu · 2020-05-29T07:46:10Z

docs/sql-ref-syntax-qry-select-hints.md


+### Partitioning Hints
+
+`COALESCE`/`REPARTITION`/`REPARTITION_BY_RANGE` hints have functionalities equivalent to those of the


How about rephrasing it like this?

Partitioning hints allow users to suggest a partitioning strategy that Spark should follow. `COALESCE`, `REPARTITION`, and `REPARTITION_BY_RANGE` hints are supported and they are equivalent to the `coalesce`, `repartition`, and `repartitionByRange` Dataset APIs, respectively.

Also, could you add links to the Dataset APIs if we describe them here? https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset
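The equivalence in the suggested wording can be sketched as a lookup from hint name to Dataset method name. `parse_partitioning_hints` is a hypothetical toy parser, not Spark's SQL parser: it handles only simple, unnested parameters and skips non-partitioning hints such as join hints.

```python
# Toy parser for a hint comment such as "/*+ REPARTITION(3, c), COALESCE(2) */".
# It splits the text into (hint name, Dataset method, parameters) triples.
# Hypothetical helper for illustration only; not Spark's parser.
import re
from typing import List, Tuple

# Hint name -> Dataset API it is described as equivalent to.
HINT_TO_DATASET_API = {
    "COALESCE": "coalesce",
    "REPARTITION": "repartition",
    "REPARTITION_BY_RANGE": "repartitionByRange",
}

# A hint name followed by a parenthesized, unnested parameter list.
_HINT_RE = re.compile(r"([A-Za-z_]+)\s*\(([^)]*)\)")


def parse_partitioning_hints(comment: str) -> List[Tuple[str, str, List[str]]]:
    body = comment.strip()
    if not (body.startswith("/*+") and body.endswith("*/")):
        raise ValueError("not a hint comment")
    result = []
    for name, args in _HINT_RE.findall(body[3:-2]):
        name = name.upper()
        if name not in HINT_TO_DATASET_API:
            continue  # e.g. BROADCAST; ignored by this sketch
        params = [a.strip() for a in args.split(",") if a.strip()]
        result.append((name, HINT_TO_DATASET_API[name], params))
    return result


print(parse_partitioning_hints("/*+ REPARTITION(3, c), COALESCE(2) */"))
# [('REPARTITION', 'repartition', ['3', 'c']), ('COALESCE', 'coalesce', ['2'])]
```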

maropu · 2020-05-29T07:47:55Z

docs/sql-ref-syntax-qry-select-hints.md

+### Partitioning Hints
+
+`COALESCE`/`REPARTITION`/`REPARTITION_BY_RANGE` hints have functionalities equivalent to those of the
+`Dataset` `coalesce`/`repartition`/`repartitionByRange` APIs. The `COALESCE` hint can be used to reduce


How about moving the explanations for each hint (e.g., The COALESCE hint can be used to reduce...) into a new section like ### Partitioning Hints Types?
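The `COALESCE` behaviour quoted above (reducing the number of partitions) can be sketched with a toy Python model. `toy_coalesce` is hypothetical: it merges neighbouring partitions without reshuffling rows, which is the idea behind `Dataset.coalesce`, but real Spark also weighs data locality when grouping partitions.

```python
# Toy model of COALESCE / Dataset.coalesce: partitions are lists of rows, and
# coalescing merges neighbouring partitions instead of reshuffling rows.
# Simplified: real Spark also considers data locality when grouping partitions.
from typing import List, TypeVar

T = TypeVar("T")


def toy_coalesce(partitions: List[List[T]], n: int) -> List[List[T]]:
    if n <= 0:
        raise ValueError("partition number must be positive")
    current = len(partitions)
    if n >= current:
        # Asking for more partitions than exist leaves them unchanged.
        return [list(p) for p in partitions]
    merged: List[List[T]] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map input partition i to one of n contiguous groups.
        merged[i * n // current].extend(part)
    return merged


print(toy_coalesce([[1], [2], [3], [4]], 2))  # [[1, 2], [3, 4]]
```

Requesting more partitions than currently exist returns the input unchanged, which matches the documented behaviour of `Dataset.coalesce`; growing the partition count requires a shuffle, i.e. `REPARTITION`.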

SparkQA · 2020-05-29T16:55:21Z

Test build #123294 has finished for PR 28672 at commit 8a7fa09.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-29T19:43:12Z

Test build #123299 has finished for PR 28672 at commit 7f97fe3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

xuanyuanking · 2020-05-30T00:26:13Z

docs/sql-ref-syntax-qry-select-hints.md

+and `REPARTITION_BY_RANGE` hints are supported and are equivalent to `coalesce`, `repartition`, and
+`repartitionByRange` [Dataset APIs](api/scala/org/apache/spark/sql/Dataset.html), respectively. These hints give users
+a way to tune performance and control the number of output files in Spark SQL. When multiple partitioning hints are
+specified, multiple nodes are inserted into the logical plan, but the leftmost hint is picked by the optimizer.


Nit: The description of multiple hints is duplicated in https://github.com/apache/spark/pull/28672/files#diff-84ec3ee2cc31db6fd14e15058e35435cR69, maybe we just keep the one with the example.

Thanks for your comment. I will keep the one in the description.
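The multiple-hints rule in the quoted text (each partitioning hint adds a node to the logical plan, and the leftmost one is picked) can be sketched as a toy Python model. `build_plan` and `collapse` are hypothetical helpers, not Spark APIs, and the collapse step is a simplification: Spark's `CollapseRepartition` optimizer rule may treat some combinations differently (for example, a `COALESCE` above a shuffling hint), which this sketch ignores.

```python
# Toy model: one plan node per partitioning hint, leftmost hint outermost.
# build_plan and collapse are hypothetical helpers, not Spark APIs.
from typing import Any, Dict, List, Tuple

Hint = Tuple[str, Tuple[str, ...]]
Plan = Dict[str, Any]


def build_plan(hints: List[Hint], relation: str) -> Plan:
    """Wrap the relation in one node per hint; the leftmost hint ends up on top."""
    plan: Plan = {"node": "Relation", "name": relation}
    for name, params in reversed(hints):
        plan = {"node": name, "params": params, "child": plan}
    return plan


def collapse(plan: Plan) -> Plan:
    """Keep only the outermost partitioning node (simplified collapse step)."""
    if plan["node"] == "Relation":
        return plan
    leaf = plan["child"]
    while leaf["node"] != "Relation":
        leaf = leaf["child"]
    return {"node": plan["node"], "params": plan["params"], "child": leaf}


hints = [("REPARTITION", ("100",)), ("COALESCE", ("500",)), ("REPARTITION_BY_RANGE", ("3", "c"))]
optimized = collapse(build_plan(hints, "t"))
print(optimized["node"], optimized["params"])  # REPARTITION ('100',)
```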

xuanyuanking · 2020-05-30T00:27:44Z

docs/sql-ref-syntax-qry-select-hints.md

+a way to tune performance and control the number of output files in Spark SQL. When multiple partitioning hints are
+specified, multiple nodes are inserted into the logical plan, but the leftmost hint is picked by the optimizer.
+
+### Partitioning Hints Types


#### Partitioning Hints Types?

xuanyuanking · 2020-05-30T00:28:20Z

docs/sql-ref-syntax-qry-select-hints.md

+  The `REPARTITION_BY_RANGE` hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. It takes column names and an optional partition number as parameters.
+
+
+### Examples


Ditto, #### Examples

maropu · 2020-05-30T00:44:28Z

docs/sql-ref-syntax-qry-select-hints.md

+
+  The `REPARTITION_BY_RANGE` hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. It takes column names and an optional partition number as parameters.
+
+


nit: remove the unnecessary blank line.
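The `REPARTITION_BY_RANGE` description quoted in the two comments above can be contrasted with hash-based `REPARTITION` using a toy Python model. Both helpers are hypothetical: Python's `hash()` stands in for Spark's hash function, and the range bounds are taken from all sorted keys, whereas Spark estimates them from a sample.

```python
# Toy contrast between REPARTITION (hash) and REPARTITION_BY_RANGE (range)
# on one column. Simplified: Python's hash() stands in for Spark's hash function,
# and range bounds come from all sorted keys rather than from a sample.
import bisect
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def toy_hash_partition(rows: Sequence[Row], col: str, n: int) -> List[List[Row]]:
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        # Rows with equal keys land together, but key order is not preserved.
        parts[hash(row[col]) % n].append(row)
    return parts


def toy_range_partition(rows: Sequence[Row], col: str, n: int) -> List[List[Row]]:
    keys = sorted(row[col] for row in rows)
    # Inclusive upper bounds for the first n - 1 partitions, evenly spaced.
    bounds = [keys[max((i + 1) * len(keys) // n - 1, 0)] for i in range(n - 1)] if keys else []
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        # Partition index = number of bounds strictly below the key.
        parts[bisect.bisect_left(bounds, row[col])].append(row)
    return parts


rows = [{"c": k} for k in [5, 3, 8, 1, 9, 2, 7, 4, 6]]
print([[r["c"] for r in p] for p in toy_range_partition(rows, "c", 3)])
# [[3, 1, 2], [5, 4, 6], [8, 9, 7]]
```

Range partitioning orders key ranges across partitions (all keys in partition 0 are below those in partition 1, and so on) while rows inside a partition keep their input order; hash partitioning only guarantees that equal keys share a partition.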

SparkQA · 2020-05-30T01:51:16Z

Test build #123305 has finished for PR 28672 at commit bc4fdfc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu

Looks okay

xuanyuanking

LGTM

huaxingao · 2020-05-30T15:51:11Z

cc @srowen This is for 3.0. Thank you!

…E Hints to SQL Reference

Add Coalesce/Repartition/Repartition_By_Range Hints to SQL Reference

To make the SQL reference complete

[Screenshots of the two changed pages]

Only the above pages are changed. The following two pages are the same as before.

[Screenshots of the two unchanged pages]

Manually build and check

Closes #28672 from huaxingao/coalesce_hint.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 1b780f3)
Signed-off-by: Sean Owen <[email protected]>

srowen · 2020-05-30T19:53:53Z

Merged to master/3.0. 3.0 had a very minor-looking merge conflict which I resolved directly.

huaxingao · 2020-05-30T20:22:14Z

Thanks! @srowen @maropu @xuanyuanking

[SPARK-31333][SQL][DOCS][FOLLOW-UP] Add Coalesce/Repartition/Repartition_By_Range Hints to SQL REF

0b3e765

probot-autolabeler bot added the DOCS label May 29, 2020

maropu reviewed May 29, 2020


huaxingao changed the title ~~[SPARK-31333][SQL][DOCS][FOLLOW-UP] Add Coalesce/Repartition/Repartition_By_Range Hints to SQL Reference~~ [SPARK-31866][SQL][DOCS] Add Coalesce/Repartition/Repartition_By_Range Hints to SQL Reference May 29, 2020

address comments

60fdb93

maropu changed the title ~~[SPARK-31866][SQL][DOCS] Add Coalesce/Repartition/Repartition_By_Range Hints to SQL Reference~~ [SPARK-31866][SQL][DOCS] Add COALESCE/REPARTITION/REPARTITION_BY_RANGE Hints to SQL Reference May 29, 2020

maropu reviewed May 29, 2020


address comments

8a7fa09

rename sampling and window function file names

7f97fe3

xuanyuanking reviewed May 30, 2020


maropu reviewed May 30, 2020


address comments

bc4fdfc

maropu approved these changes May 30, 2020


xuanyuanking approved these changes May 30, 2020


srowen closed this in 1b780f3 May 30, 2020

huaxingao deleted the coalesce_hint branch May 30, 2020 20:22

maropu mentioned this pull request Jul 21, 2020

[SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs #29056

Closed


