[SPARK-30881][SQL][DOCS]Revise the doc of spark.sql.sources.parallelPartitionDiscovery.threshold#27639

Closed
gengliangwang wants to merge 1 commit into apache:master from gengliangwang:reviseParallelPartitionDiscovery

@gengliangwang
Member

What changes were proposed in this pull request?

Revise the doc of SQL configuration spark.sql.sources.parallelPartitionDiscovery.threshold.

Why are the changes needed?

The doc of configuration "spark.sql.sources.parallelPartitionDiscovery.threshold" is inaccurate where it says "This applies to Parquet, ORC, CSV, JSON and LibSVM data sources".

It should instead say that the configuration is effective for all file-based data sources.

Does this PR introduce any user-facing change?

No

How was this patch tested?

None. It's just doc.

- "files with another Spark distributed job. This applies to Parquet, ORC, CSV, JSON and " +
- "LibSVM data sources.")
+ "files with another Spark distributed job. This configuration is effective only when " +
+ "using file-based sources such as Parquet, JSON and ORC.")
Member Author

This follows the doc of spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes
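
For readers unfamiliar with the setting under discussion, here is a minimal sketch of how it might be set and read back through the public `SparkSession` API. The object name `ThresholdExample`, the local master, and the values `64` and `16` are illustrative assumptions, not part of this PR. The sketch only sets the value; per the revised doc, it matters when reading file-based sources.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name and values are assumptions for illustration.
object ThresholdExample {
  def main(args: Array[String]): Unit = {
    val key = "spark.sql.sources.parallelPartitionDiscovery.threshold"

    // Set the threshold when building the session (local mode, for illustration only).
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("threshold-example")
      .config(key, "64")
      .getOrCreate()

    // Read the effective value back from the runtime configuration.
    println(s"$key = ${spark.conf.get(key)}")

    // The setting can also be changed at runtime on an existing session.
    spark.conf.set(key, "16")
    println(s"$key = ${spark.conf.get(key)}")

    spark.stop()
  }
}
```

Running this requires the Spark SQL artifact on the classpath.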

@SparkQA

SparkQA commented Feb 19, 2020

Test build #118685 has finished for PR 27639 at commit 4adae15.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

retest this please.

@SparkQA

SparkQA commented Feb 19, 2020

Test build #118686 has finished for PR 27639 at commit 4adae15.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu changed the title from "[SPARK-30881][SQL][Doc]Revise the doc of spark.sql.sources.parallelPartitionDiscovery.threshold" to "[SPARK-30881][SQL][DOCS]Revise the doc of spark.sql.sources.parallelPartitionDiscovery.threshold" Feb 19, 2020
@maropu
Member

maropu commented Feb 19, 2020

retest this please

@SparkQA

SparkQA commented Feb 19, 2020

Test build #118687 has finished for PR 27639 at commit 4adae15.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

@maropu I think the MiMa test failure is from #27395

@maropu
Member

maropu commented Feb 20, 2020

Ah, I see...

@gengliangwang
Member Author

retest this please.

@SparkQA

SparkQA commented Feb 20, 2020

Test build #118696 has finished for PR 27639 at commit 4adae15.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@gengliangwang
Member Author

Thanks for the review.
Merging to master/3.0

gengliangwang added a commit that referenced this pull request Feb 20, 2020
…PartitionDiscovery.threshold

### What changes were proposed in this pull request?

Revise the doc of SQL configuration `spark.sql.sources.parallelPartitionDiscovery.threshold`.
### Why are the changes needed?

The doc of configuration "spark.sql.sources.parallelPartitionDiscovery.threshold" is not accurate on the part "This applies to Parquet, ORC, CSV, JSON and LibSVM data sources".

We should revise it as effective on all the file-based data sources.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

None. It's just doc.

Closes #27639 from gengliangwang/reviseParallelPartitionDiscovery.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
(cherry picked from commit 92d5d40)
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
@SparkQA

SparkQA commented Feb 20, 2020

Test build #118705 has finished for PR 27639 at commit 4adae15.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…PartitionDiscovery.threshold

### What changes were proposed in this pull request?

Revise the doc of SQL configuration `spark.sql.sources.parallelPartitionDiscovery.threshold`.
### Why are the changes needed?

The doc of configuration "spark.sql.sources.parallelPartitionDiscovery.threshold" is not accurate on the part "This applies to Parquet, ORC, CSV, JSON and LibSVM data sources".

We should revise it as effective on all the file-based data sources.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

None. It's just doc.

Closes apache#27639 from gengliangwang/reviseParallelPartitionDiscovery.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
4 participants