[SPARK-30881][SQL][DOCS]Revise the doc of spark.sql.sources.parallelPartitionDiscovery.threshold by gengliangwang · Pull Request #27639 · apache/spark

gengliangwang · 2020-02-19T21:43:48Z

What changes were proposed in this pull request?

Revise the doc of SQL configuration spark.sql.sources.parallelPartitionDiscovery.threshold.

Why are the changes needed?

The doc of configuration "spark.sql.sources.parallelPartitionDiscovery.threshold" is inaccurate where it says "This applies to Parquet, ORC, CSV, JSON and LibSVM data sources".

The configuration is effective for all file-based data sources, so the doc should say so.

Does this PR introduce any user-facing change?

No

How was this patch tested?

None. It's just doc.

gengliangwang · 2020-02-19T21:46:03Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

-        "files with another Spark distributed job. This applies to Parquet, ORC, CSV, JSON and " +
-        "LibSVM data sources.")
+        "files with another Spark distributed job. This configuration is effective only when " +
+        "using file-based sources such as Parquet, JSON and ORC.")


This follows the doc of spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes
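
For readers unfamiliar with the setting, here is a minimal sketch of the decision the doc string describes: listing stays on the driver until the number of paths exceeds the threshold, after which a separate Spark job lists the files. This is a standalone model, not Spark's code; `ListingSketch`, `ListingStrategy`, and `chooseListing` are hypothetical names, and the default of 32 is assumed from Spark's documented default for this configuration.

```scala
// Hypothetical model of the threshold check; none of these names are Spark APIs.
object ListingSketch {
  sealed trait ListingStrategy
  case object ListOnDriver extends ListingStrategy      // files listed serially on the driver
  case object ListWithSparkJob extends ListingStrategy  // files listed by a distributed job

  // Stand-in for spark.sql.sources.parallelPartitionDiscovery.threshold
  // (assumed default: 32).
  val defaultThreshold: Int = 32

  // A distributed listing is chosen only when the path count exceeds the threshold.
  def chooseListing(numPaths: Int, threshold: Int = defaultThreshold): ListingStrategy =
    if (numPaths <= threshold) ListOnDriver else ListWithSparkJob

  def main(args: Array[String]): Unit = {
    println(chooseListing(10))                    // few paths: stays on the driver
    println(chooseListing(100))                   // many paths: distributed listing
    println(chooseListing(100, threshold = 200))  // raised threshold: driver again
  }
}
```

In a real session the value would typically be changed with something like `spark.conf.set("spark.sql.sources.parallelPartitionDiscovery.threshold", "64")` before reading from a file-based source.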

SparkQA · 2020-02-19T21:55:00Z

Test build #118685 has finished for PR 27639 at commit 4adae15.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

gengliangwang · 2020-02-19T21:56:01Z

retest this please.

SparkQA · 2020-02-19T22:10:25Z

Test build #118686 has finished for PR 27639 at commit 4adae15.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-02-19T23:26:23Z

retest this please

SparkQA · 2020-02-19T23:42:18Z

Test build #118687 has finished for PR 27639 at commit 4adae15.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

gengliangwang · 2020-02-19T23:48:35Z

@maropu I think the MiMa test failure is from #27395

maropu · 2020-02-20T00:17:49Z

Ah, I see...

gengliangwang · 2020-02-20T04:40:47Z

retest this please.

SparkQA · 2020-02-20T08:05:02Z

Test build #118696 has finished for PR 27639 at commit 4adae15.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-02-20T08:16:16Z

retest this please

gengliangwang · 2020-02-20T08:58:37Z

Thanks for the review.
Merging to master/3.0

…PartitionDiscovery.threshold

### What changes were proposed in this pull request?

Revise the doc of SQL configuration `spark.sql.sources.parallelPartitionDiscovery.threshold`.

### Why are the changes needed?

The doc of configuration "spark.sql.sources.parallelPartitionDiscovery.threshold" is not accurate on the part "This applies to Parquet, ORC, CSV, JSON and LibSVM data sources".

We should revise it as effective on all the file-based data sources.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

None. It's just doc.

Closes #27639 from gengliangwang/reviseParallelPartitionDiscovery.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
(cherry picked from commit 92d5d40)
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>

SparkQA · 2020-02-20T13:13:41Z

Test build #118705 has finished for PR 27639 at commit 4adae15.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

revise the doc of spark.sql.sources.parallelPartitionDiscovery.threshold

4adae15

gengliangwang requested a review from cloud-fan February 19, 2020 21:44

maropu changed the title ~~[SPARK-30881][SQL][Doc]Revise the doc of spark.sql.sources.parallelPartitionDiscovery.threshold~~ [SPARK-30881][SQL][DOCS]Revise the doc of spark.sql.sources.parallelPartitionDiscovery.threshold Feb 19, 2020

cloud-fan approved these changes Feb 20, 2020

maropu approved these changes Feb 20, 2020

gengliangwang closed this in 92d5d40 Feb 20, 2020

4 participants