[SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page. #32161

itholic · 2021-04-14T06:40:22Z

What changes were proposed in this pull request?

This PR proposes moving the Parquet data source options from the Python, Scala, and Java API documentation into a single page.

Why are the changes needed?

So far, the documentation for the Parquet data source options has been split across separate pages, one per language API. That makes a long list of options hard to maintain, so it is more efficient to manage all options on a single page and link to that page from each language's API documentation.

Does this PR introduce any user-facing change?

Yes, the documentation pages will look as follows after this change:

"Parquet Files" page
Python
Scala
Java

How was this patch tested?

Manually built the docs and checked the resulting pages.


itholic · 2021-04-14T06:42:24Z

Once this PR is confirmed and finished, I'll also move the other data source options into single pages in the same way.

See SPARK-34494.

SparkQA · 2021-04-14T06:47:10Z

Test build #137333 has finished for PR 32161 at commit 432d8dd.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-04-14T07:48:16Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41913/

SparkQA · 2021-04-14T07:48:17Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41913/


github-actions · 2021-04-15T00:28:42Z

Test build #750227134 for PR 32161 at commit fc2e064.

SparkQA · 2021-04-15T00:44:14Z

Test build #137376 has finished for PR 32161 at commit fc2e064.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
public class VectorizedBLAS extends F2jBLAS
trait AnalysisOnlyCommand extends Command
case class RuleId(id: Int)
abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with TreePatternBits
trait TreePatternBits
implicit class MetadataColumnHelper(attr: Attribute)
case class WriteToDataSourceV2(
case class WriteToMicroBatchDataSource(

SparkQA · 2021-04-15T01:35:13Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41954/

SparkQA · 2021-04-15T01:35:14Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41954/

itholic · 2021-04-15T05:41:37Z

cc @HyukjinKwon

Could you please review this when you find some time?

python/pyspark/sql/readwriter.py

docs/sql-data-sources-parquet.md

sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala

HyukjinKwon · 2021-04-15T09:17:00Z

Python linter fails. Please take a look:

pycodestyle checks failed:
./python/pyspark/sql/readwriter.py:421:101: E501 line too long (106 > 100 characters)
./python/pyspark/sql/readwriter.py:1229:101: E501 line too long (107 > 100 characters)
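For readers unfamiliar with E501, here is a minimal sketch of the kind of check that produces the report above. It is not pycodestyle's actual implementation; the function name `find_long_lines` and its parameters are hypothetical, and the 100-character limit is taken from the linter output in this comment.

```python
# Limit taken from the linter output above ("> 100 characters").
MAX_LINE_LENGTH = 100


def find_long_lines(source, filename="<string>", max_length=MAX_LINE_LENGTH):
    """Return E501-style reports for every line longer than max_length.

    Hypothetical helper for illustration only; it mimics the report format
    shown above and is not part of pycodestyle.
    """
    reports = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        length = len(line)
        if length > max_length:
            # The reported column is the first character past the limit.
            reports.append(
                f"{filename}:{lineno}:{max_length + 1}: "
                f"E501 line too long ({length} > {max_length} characters)"
            )
    return reports
```

A line flagged this way is usually fixed by wrapping it. Where wrapping is impractical, such as a long URL in a docstring, the check can be suppressed for that line with a `# noqa` comment, which appears to be what the later "Add noqa" commit in this PR does.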

python/pyspark/sql/readwriter.py

github-actions · 2021-04-16T07:02:42Z

Test build #754715131 for PR 32161 at commit b7aa8c7.

SparkQA · 2021-04-16T07:48:36Z

Test build #137472 has finished for PR 32161 at commit b7aa8c7.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-04-16T07:51:40Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42047/

HyukjinKwon · 2021-04-16T08:42:40Z

@itholic can you take a look at the style failure and sync your branch with the latest master branch?

python/pyspark/sql/readwriter.py

HyukjinKwon · 2021-04-16T08:48:48Z

Can you fix all instances like #32161 (comment)? Otherwise looks pretty good. @MaxGekk FYI


SparkQA · 2021-05-18T15:03:12Z

Test build #138679 has finished for PR 32161 at commit ad9f8a0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

docs/sql-data-sources-parquet.md

sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala

python/pyspark/sql/readwriter.py

SparkQA · 2021-05-20T09:47:37Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43262/

SparkQA · 2021-05-20T10:20:53Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43262/

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala

sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

SparkQA · 2021-05-20T12:58:02Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43276/

SparkQA · 2021-05-20T13:11:54Z

Test build #138739 has finished for PR 32161 at commit ffc124c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-05-20T13:31:41Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43276/

SparkQA · 2021-05-20T15:03:03Z

Test build #138754 has finished for PR 32161 at commit 2272717.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

python/pyspark/sql/streaming.py

python/pyspark/sql/readwriter.py

sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

HyukjinKwon · 2021-05-21T01:15:18Z

Looks pretty good otherwise. @MaxGekk it would be great if you could have a chance to take a quick look.

SparkQA · 2021-05-21T02:01:27Z

Test build #138779 has finished for PR 32161 at commit ead523d.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-05-21T05:48:29Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43310/

SparkQA · 2021-05-21T08:18:33Z

Test build #138787 has finished for PR 32161 at commit d6417a8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2021-05-25T11:10:03Z

docs/sql-data-sources-parquet.md

+  *  `DataFrameReader`
+  *  `DataFrameWriter`
+  *  `DataStreamReader`
+  *  `DataStreamWriter`


also mention:

* `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
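To make the entry points listed in this suggestion concrete, here is a minimal sketch of passing the same Parquet option through `DataFrameReader` and through an `OPTIONS` clause. The helper names (`parquet_options_clause`, `create_parquet_table_sql`, `read_parquet`) are hypothetical and not part of Spark, `mergeSchema` is used only as an example option, and the SQL rendering does no quoting or escaping.

```python
def parquet_options_clause(options):
    """Render a dict of data source options as a Spark SQL OPTIONS clause.

    Hypothetical helper; values are single-quoted without any escaping.
    """
    rendered = ", ".join(f"{key} = '{value}'" for key, value in options.items())
    return f"OPTIONS ({rendered})"


def create_parquet_table_sql(table, path, options):
    """Build a CREATE TABLE ... USING parquet statement carrying the options."""
    return (
        f"CREATE TABLE {table} USING parquet "
        f"{parquet_options_clause(options)} LOCATION '{path}'"
    )


def read_parquet(spark, path, options):
    """Read Parquet files with the given options.

    `spark` is assumed to be an existing SparkSession; this uses
    DataFrameReader.options(**options) followed by .parquet(path).
    """
    return spark.read.options(**options).parquet(path)
```

With a running session, `read_parquet(spark, "/tmp/people", {"mergeSchema": "true"})` and `spark.sql(create_parquet_table_sql("people", "/tmp/people", {"mergeSchema": "true"}))` would set the same option through the two routes; the table name and path here are placeholders.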

[SPARK-35025] Move Parquet data source options from Python and Scala into a single page.

432d8dd

github-actions bot added CORE DOCS PYTHON SQL labels Apr 14, 2021

Merge branch 'master' of https://github.com/apache/spark into SPARK-34491

fc2e064

HyukjinKwon reviewed Apr 15, 2021

python/pyspark/sql/readwriter.py (outdated)

HyukjinKwon reviewed Apr 15, 2021

docs/sql-data-sources-parquet.md (outdated)

HyukjinKwon reviewed Apr 15, 2021

sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (outdated)

HyukjinKwon reviewed Apr 15, 2021

python/pyspark/sql/readwriter.py (outdated)

Resolve comments

b7aa8c7

github-actions bot added the STRUCTURED STREAMING label Apr 16, 2021

HyukjinKwon reviewed Apr 16, 2021

python/pyspark/sql/readwriter.py (outdated)

itholic added 2 commits April 19, 2021 11:57

Merge branch 'master' of https://github.com/apache/spark into SPARK-34491

a538c41

Addressed comments

082d86d

HyukjinKwon reviewed May 20, 2021

docs/sql-data-sources-parquet.md (outdated)

HyukjinKwon reviewed May 20, 2021

sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala

HyukjinKwon reviewed May 20, 2021

python/pyspark/sql/readwriter.py (outdated)

itholic added 2 commits May 20, 2021 17:12

Resolved comments

45e0f8f

One more fix

ffc124c

HyukjinKwon reviewed May 20, 2021

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala (outdated)

HyukjinKwon reviewed May 20, 2021

sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (outdated)

itholic added 2 commits May 20, 2021 20:47

itemize the options

41ad66e

Resolved comments

2272717

HyukjinKwon reviewed May 21, 2021

python/pyspark/sql/streaming.py (outdated)

HyukjinKwon reviewed May 21, 2021

python/pyspark/sql/readwriter.py (outdated)

HyukjinKwon reviewed May 21, 2021

sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (outdated)

Resolved comments

ead523d

Add noqa

d6417a8

HyukjinKwon approved these changes May 21, 2021


itholic mentioned this pull request May 21, 2021

[SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page #32546

Closed

HyukjinKwon closed this in d2bdd65 May 21, 2021

HyukjinKwon reviewed May 25, 2021
