@itholic
Contributor

@itholic itholic commented Apr 14, 2021

What changes were proposed in this pull request?

This PR proposes moving the Parquet data source options from the Python, Scala and Java API documentation into a single page.

Why are the changes needed?

So far, the documentation for the Parquet data source options has been split across separate pages, one per language API. This makes managing the many options inconvenient, so it is more efficient to maintain all options on a single page and link to that page from each language's API documentation.

Does this PR introduce any user-facing change?

Yes, the documents will be shown as below after this change:

  • "Parquet Files" page
    [screenshot: Screen Shot 2021-05-21 at 1 35 08 PM]

  • Python
    [screenshot: Screen Shot 2021-05-21 at 1 38 27 PM]

  • Scala
    [screenshot: Screen Shot 2021-05-21 at 1 36 52 PM]

  • Java
    [screenshot: Screen Shot 2021-05-21 at 1 37 19 PM]

How was this patch tested?

Manually built the docs and confirmed the pages.

@itholic
Contributor Author

itholic commented Apr 14, 2021

After this PR is confirmed and finished, I'll move the other data source options into single pages as well.

See SPARK-34494.

@SparkQA

SparkQA commented Apr 14, 2021

Test build #137333 has finished for PR 32161 at commit 432d8dd.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41913/

@SparkQA

SparkQA commented Apr 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41913/

@github-actions

Test build #750227134 for PR 32161 at commit fc2e064.

@SparkQA

SparkQA commented Apr 15, 2021

Test build #137376 has finished for PR 32161 at commit fc2e064.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • public class VectorizedBLAS extends F2jBLAS
      • trait AnalysisOnlyCommand extends Command
      • case class RuleId(id: Int)
      • abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with TreePatternBits
      • trait TreePatternBits
      • implicit class MetadataColumnHelper(attr: Attribute)
      • case class WriteToDataSourceV2(
      • case class WriteToMicroBatchDataSource(

@SparkQA

SparkQA commented Apr 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41954/

@SparkQA

SparkQA commented Apr 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41954/

@itholic
Contributor Author

itholic commented Apr 15, 2021

cc @HyukjinKwon

Could you please review this when you find some time?

@HyukjinKwon
Member

Python linter fails. Please take a look:

pycodestyle checks failed:
./python/pyspark/sql/readwriter.py:421:101: E501 line too long (106 > 100 characters)
./python/pyspark/sql/readwriter.py:1229:101: E501 line too long (107 > 100 characters)
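
For reference, the two E501 reports above come from a line-length check. The following is only a simplified sketch of that kind of check, not pycodestyle's actual implementation: the helper name `find_long_lines` is hypothetical, and the 100-character limit is an assumption taken from the "(106 > 100 characters)" message.

```python
from typing import Iterable, List, Tuple

# Assumed limit, read off the "(106 > 100 characters)" message above.
MAX_LINE_LENGTH = 100


def find_long_lines(
    lines: Iterable[str], max_length: int = MAX_LINE_LENGTH
) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than max_length.

    Line numbers are 1-based; the trailing newline is not counted.
    """
    offenders = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))
        if length > max_length:
            offenders.append((number, length))
    return offenders


if __name__ == "__main__":
    # Hypothetical sample input: one line that is too long, one at the limit.
    sample = ["x" * 106 + "\n", "y" * 100 + "\n"]
    for number, length in find_long_lines(sample):
        print(f"line {number}: E501 line too long ({length} > {MAX_LINE_LENGTH} characters)")
```

Running a check like this over the edited docstrings before pushing would surface the same kind of overlong lines the linter reports, which are typically fixed by wrapping the docstring text.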

@github-actions

Test build #754715131 for PR 32161 at commit b7aa8c7.

@SparkQA

SparkQA commented Apr 16, 2021

Test build #137472 has finished for PR 32161 at commit b7aa8c7.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 16, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42047/

@HyukjinKwon
Member

@itholic can you take a look at the style failure and sync your branch with the latest master branch?

@HyukjinKwon
Member

Can you fix all instances like #32161 (comment)? Otherwise looks pretty good. @MaxGekk FYI

@SparkQA

SparkQA commented May 18, 2021

Test build #138679 has finished for PR 32161 at commit ad9f8a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43262/

@SparkQA

SparkQA commented May 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43262/

@SparkQA

SparkQA commented May 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43276/

@SparkQA

SparkQA commented May 20, 2021

Test build #138739 has finished for PR 32161 at commit ffc124c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43276/

@SparkQA

SparkQA commented May 20, 2021

Test build #138754 has finished for PR 32161 at commit 2272717.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Looks pretty good otherwise. @MaxGekk it would be great if you could have a chance to take a quick look.

@SparkQA

SparkQA commented May 21, 2021

Test build #138779 has finished for PR 32161 at commit ead523d.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 21, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43310/

@SparkQA

SparkQA commented May 21, 2021

Test build #138787 has finished for PR 32161 at commit d6417a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* `DataFrameReader`
* `DataFrameWriter`
* `DataStreamReader`
* `DataStreamWriter`
Member

also mention:

* `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
