[SPARK-25656][SQL][DOC][EXAMPLE] Add a doc and examples about extra data source options #22801
…ource save options
OPTIONS (
  orc.bloom.filter.columns 'favorite_color',
  orc.dictionary.key.threshold '1.0',
  orc.column.encoding.direct 'name'
Could you review this, @gatorsmile ? This is the example we discussed previously.
Test build #97900 has finished for PR 22801 at commit
.option("orc.bloom.filter.columns", "favorite_color")
.option("orc.dictionary.key.threshold", "1.0")
.option("orc.column.encoding.direct", "name")
.save("users_with_options.orc")
Also, cc @dbtsai .
This doc applies only to Spark 3.0.0, since orc.column.encoding.direct has only been added to the master branch.
Test build #97926 has finished for PR 22801 at commit
# $example on:manual_save_options_orc$
df <- read.df("examples/src/main/resources/users.orc", "orc")
write.orc(df, "users_with_options.orc", orc.bloom.filter.columns="favorite_color", orc.dictionary.key.threshold=1.0, orc.column.encoding.direct="name")
# $example off:manual_save_options_orc$
Hi, @felixcheung . Could you review this PR?
we should put space after param
(gosh same for csv example above)
orc.bloom.filter.columns = "favorite_color", orc.dictionary.key.threshold = 1.0, orc.column.encoding.direct = "name")
Thank you!
(df.write.format("orc")
    .option("orc.bloom.filter.columns", "favorite_color")
    .option("orc.dictionary.key.threshold", "1.0")
    .option("orc.column.encoding.direct", 'name')
use same quote? " or ' for name?
Yep!
Test build #97932 has finished for PR 22801 at commit
Test build #97933 has finished for PR 22801 at commit
This LGTM.
Thank you for review, @felixcheung , @dbtsai , @HyukjinKwon .
…bout extra data source options

## What changes were proposed in this pull request?

Our current doc does not explain how we are passing the data source specific options to the underlying data source. According to [the review comment](#22622 (comment)), this PR aims to add more detailed information and examples. This is a backport of #22801. `orc.column.encoding.direct` is removed since it's not supported in ORC 1.5.2.

## How was this patch tested?

Manual.

Closes #22839 from dongjoon-hyun/SPARK-25656-2.4.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…ata source options

## What changes were proposed in this pull request?

Our current doc does not explain how we are passing the data source specific options to the underlying data source. According to [the review comment](apache#22622 (comment)), this PR aims to add more detailed information and examples

## How was this patch tested?

Manual.

Closes apache#22801 from dongjoon-hyun/SPARK-25656.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Our current doc does not explain how data source specific options are passed to the underlying data source. Following the review comment, this PR adds more detailed information and examples.
How was this patch tested?
Manual.
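As a rough illustration of the pass-through that this PR documents, the sketch below forwards the ORC options from the examples in this review through repeated `.option(key, value)` calls. `apply_options` and `ORC_WRITE_OPTIONS` are hypothetical names introduced here and are not part of Spark; the helper only assumes the writer exposes a chainable `option` method, as PySpark's `DataFrameWriter` does.

```python
# Hypothetical helper (not part of Spark): forwards data source specific
# options to any writer that exposes a chainable .option(key, value) method.
ORC_WRITE_OPTIONS = {
    "orc.bloom.filter.columns": "favorite_color",
    "orc.dictionary.key.threshold": "1.0",
    # Per the review discussion, this key is only in the master branch doc;
    # the 2.4 backport drops it.
    "orc.column.encoding.direct": "name",
}


def apply_options(writer, options):
    """Return the writer after calling .option(key, value) for every entry."""
    for key, value in options.items():
        writer = writer.option(key, value)
    return writer
```

With a PySpark DataFrame `df`, this would be used roughly as `apply_options(df.write.format("orc"), ORC_WRITE_OPTIONS).save("users_with_options.orc")`, which should behave like the chained `.option(...)` calls in the Python example under review.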