@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Dec 20, 2019

What changes were proposed in this pull request?

This PR documents and exposes the options 'recursiveFileLookup' and 'pathGlobFilter' in file sources, and 'mergeSchema' in ORC.
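
As a rough illustration of what a recursive file lookup means, here is a minimal sketch in plain Python. It assumes the option does what its name suggests (descending into nested directories when listing input files) and deliberately ignores Spark's partition-discovery behavior. `list_files` is a hypothetical helper for illustration, not a Spark API.

```python
import tempfile
from pathlib import Path


def list_files(root, recursive=False):
    """Return file paths under root, relative to root, sorted.

    recursive=False looks only at root's direct children;
    recursive=True also descends into nested directories.
    """
    root = Path(root)
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(p.relative_to(root).as_posix() for p in candidates if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical layout: one top-level file and two nested ones.
    (base / "a.txt").write_text("a")
    (base / "nested" / "deeper").mkdir(parents=True)
    (base / "nested" / "b.txt").write_text("b")
    (base / "nested" / "deeper" / "c.txt").write_text("c")

    flat = list_files(base)
    deep = list_files(base, recursive=True)

print(flat)
print(deep)
```

The flat listing sees only the top-level file, while the recursive listing also picks up the files under `nested/`.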

Note that the timeZone option was not moved from DataFrameReader.options, as I assume it will likely affect other data sources as well once DSv2 is complete.

Why are the changes needed?

To document available options in sources properly.

Does this PR introduce any user-facing change?

In PySpark, pathGlobFilter can be set via DataFrameReader.(text|orc|parquet|json|csv) and DataStreamReader.(text|orc|parquet|json|csv).

How was this patch tested?

Manually built the docs and checked the output. The option setting in PySpark is more of a logic change, so I manually tested one case only:

$ ls -al tmp
...
-rw-r--r--   1 hyukjin.kwon  staff     3 Dec 20 12:19 aa
-rw-r--r--   1 hyukjin.kwon  staff     3 Dec 20 12:19 ab
-rw-r--r--   1 hyukjin.kwon  staff     3 Dec 20 12:19 ac
-rw-r--r--   1 hyukjin.kwon  staff     3 Dec 20 12:19 cc
>>> spark.read.text("tmp", pathGlobFilter="*c").show()
+-----+
|value|
+-----+
|   ac|
|   cc|
+-----+
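
The result above is consistent with the glob being applied to each file's name. A minimal sketch of that filtering in plain Python follows. It assumes the pattern is matched against the base name only, and it uses the standard library's `fnmatch` as a stand-in for Hadoop's glob matcher (the two syntaxes differ in details such as `{a,b}` alternation). `glob_filter` is a hypothetical helper, not a Spark API.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def glob_filter(paths, pattern):
    """Keep the paths whose file name (last component) matches the glob pattern."""
    return [p for p in paths if fnmatchcase(PurePosixPath(p).name, pattern)]


# Same four files as in the manual test above.
files = ["tmp/aa", "tmp/ab", "tmp/ac", "tmp/cc"]
matched = glob_filter(files, "*c")
print(matched)
```

With the pattern `*c`, only `tmp/ac` and `tmp/cc` are kept, mirroring the two rows shown in the PySpark output.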

@HyukjinKwon
Member Author

HyukjinKwon commented Dec 20, 2019

Let me just cc everybody involved in those PRs and JIRAs.

@nchammas, @gengliangwang, @WeichenXu123, @WangGuangxin, @dongjoon-hyun, @gatorsmile, @cloud-fan, @Ngone51, @mengxr

@HyukjinKwon HyukjinKwon changed the title [SPARK-30128][SPARK-27627][SPARK-27990][SPARK-11412][FOLLOW-UP] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC [SPARK-30128][SPARK-27627][SPARK-27990][SPARK-11412] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC Dec 20, 2019
@HyukjinKwon HyukjinKwon changed the title [SPARK-30128][SPARK-27627][SPARK-27990][SPARK-11412] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC [SPARK-30128] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC Dec 20, 2019
@HyukjinKwon HyukjinKwon changed the title [SPARK-30128] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC [SPARK-30128][DOCS] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC Dec 20, 2019
@HyukjinKwon HyukjinKwon changed the title [SPARK-30128][DOCS] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC [SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC Dec 20, 2019
@nchammas
Contributor

I'm confused by the title of the PR. It looks like you're only working with pathGlobFilter. And didn't we address recursiveFileLookup and mergeSchema in #26718, #26730, and #26755?

*
* You can set the following ORC-specific option(s) for reading ORC files:
* <ul>
* <li>`mergeSchema` (default is the value specified in `spark.sql.orc.mergeSchema`): sets whether
Member Author

@nchammas, I wanted the PR title to mention the Scala side too.

@SparkQA

SparkQA commented Dec 20, 2019

Test build #115602 has finished for PR 26958 at commit dde76ab.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 20, 2019

Test build #115600 has finished for PR 26958 at commit d5ff1e4.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Dec 20, 2019

Test build #115615 has finished for PR 26958 at commit dde76ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Let me merge this in a few days if you are all fine with it, since it's just a doc change.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@HyukjinKwon
Member Author

Thanks all. Merged to master.

@HyukjinKwon HyukjinKwon deleted the doc-followup branch March 3, 2020 01:17