[SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC #26958
Let me just cc everybody involved in those PRs and JIRAs. @nchammas, @gengliangwang, @WeichenXu123, @WangGuangxin, @dongjoon-hyun, @gatorsmile, @cloud-fan, @Ngone51, @mengxr
 *
 * You can set the following ORC-specific option(s) for reading ORC files:
 * <ul>
 * <li>`mergeSchema` (default is the value specified in `spark.sql.orc.mergeSchema`): sets whether
@nchammas, I wanted the PR title to mention the Scala side too.
Test build #115602 has finished for PR 26958 at commit
Test build #115600 has finished for PR 26958 at commit
retest this please
Test build #115615 has finished for PR 26958 at commit
Let me merge this in a few days if you guys are fine with it, since it's just a doc change.
dongjoon-hyun left a comment
+1, LGTM.
Thanks all. Merged to master.
What changes were proposed in this pull request?
This PR documents and exposes the options 'recursiveFileLookup' and 'pathGlobFilter' in file sources, and 'mergeSchema' in ORC.
- 'recursiveFileLookup' at file sources: [SPARK-27990][SQL][ML] Provide a way to recursively load data from datasource #24830 (SPARK-27990)
- 'pathGlobFilter' at file sources: [SPARK-27627][SQL] Make option "pathGlobFilter" as a general option for all file sources #24518 (SPARK-27627)
- 'mergeSchema' at ORC: [SPARK-11412][SQL] Support merge schema for ORC #24043 (SPARK-11412)
Note that the 'timeZone' option was not moved from DataFrameReader.options, as I assume it will likely affect other datasources as well once DSv2 is complete.
Why are the changes needed?
To document available options in sources properly.
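For illustration, the three options covered by this PR might be passed to a reader roughly as follows. This is a minimal sketch, not code from the patch: the helper names `file_source_options` and `read_orc` are hypothetical, the default values are arbitrary, and `spark` is assumed to be an existing `SparkSession` supplied by the caller.

```python
from typing import Dict


def file_source_options(glob: str = "*.orc",
                        recursive: bool = True,
                        merge_schema: bool = True) -> Dict[str, str]:
    """Build the reader options discussed in this PR as plain strings.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    return {
        # General file source option: keep only files whose name matches the glob.
        "pathGlobFilter": glob,
        # General file source option: also list files in nested directories.
        "recursiveFileLookup": str(recursive).lower(),
        # ORC-specific option: whether to merge schemas across ORC files.
        "mergeSchema": str(merge_schema).lower(),
    }


def read_orc(spark, path: str):
    """Read ORC files under `path` with the options above.

    `spark` is assumed to be a pyspark.sql.SparkSession created elsewhere,
    so this module itself does not import pyspark.
    """
    return spark.read.format("orc").options(**file_source_options()).load(path)


if __name__ == "__main__":
    # Only prints the option mapping; no Spark session is needed for this part.
    print(file_source_options())
```

Passing the options through `DataFrameReader.options` keeps the sketch independent of the exact keyword arguments each reader method accepts.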
Does this PR introduce any user-facing change?
In PySpark, 'pathGlobFilter' can be set via DataFrameReader.(text|orc|parquet|json|csv) and DataStreamReader.(text|orc|parquet|json|csv).
How was this patch tested?
Manually built the doc and checked the output. Option setting in PySpark is rather a logical change. I manually tested one only: