@dongjoon-hyun
Member

What changes were proposed in this pull request?

Our current doc does not explain how data source specific options are passed to the underlying data source. Following the review comment, this PR adds more detailed information and examples.
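
For illustration only, here is a minimal Python sketch of what passing extra options to the underlying data source looks like. The option keys and values are the ORC ones used in this PR's examples. `apply_options` and `RecordingWriter` are hypothetical names that are not part of Spark; the stand-in writer only mimics the chained `.option(key, value)` calls and `.save(path)` so the snippet runs without a Spark installation.

```python
from typing import Mapping

# ORC option keys and values taken from the examples in this PR.
ORC_OPTIONS = {
    "orc.bloom.filter.columns": "favorite_color",
    "orc.dictionary.key.threshold": "1.0",
    "orc.column.encoding.direct": "name",
}


def apply_options(writer, options: Mapping[str, str]):
    """Chain writer.option(key, value) for every entry and return the writer.

    Hypothetical helper: it only assumes that `writer` exposes an
    `option(key, value)` method returning a writer, as the examples in
    this PR do.
    """
    for key, value in options.items():
        writer = writer.option(key, value)
    return writer


class RecordingWriter:
    """Hypothetical stand-in for a DataFrame writer.

    It records the options and the target path instead of writing a file.
    """

    def __init__(self):
        self.options = {}
        self.saved_path = None

    def option(self, key, value):
        # Store the option and return self so calls can be chained.
        self.options[key] = value
        return self

    def save(self, path):
        # Remember where the data would have been written.
        self.saved_path = path


writer = apply_options(RecordingWriter(), ORC_OPTIONS)
writer.save("users_with_options.orc")
```

With PySpark available, `df.write.format("orc")` could presumably be passed in place of `RecordingWriter()`, since the helper relies only on the `.option(...)` and `.save(...)` calls shown in the examples under review.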

How was this patch tested?

Manual.

OPTIONS (
orc.bloom.filter.columns 'favorite_color',
orc.dictionary.key.threshold '1.0',
orc.column.encoding.direct 'name'
Member Author

Could you review this, @gatorsmile ? This is the example we discussed previously.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-25656][DOC][EXAMPLE] Add a doc and examples about extra data source options" to "[SPARK-25656][SQL][DOC][EXAMPLE] Add a doc and examples about extra data source options" on Oct 23, 2018
@SparkQA

SparkQA commented Oct 23, 2018

Test build #97900 has finished for PR 22801 at commit 1bd23a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.option("orc.bloom.filter.columns", "favorite_color")
.option("orc.dictionary.key.threshold", "1.0")
.option("orc.column.encoding.direct", "name")
.save("users_with_options.orc")
Member Author

Also, cc @dbtsai .
This doc is only for Spark 3.0.0 because orc.column.encoding.direct has only been added to the master branch.

@SparkQA

SparkQA commented Oct 23, 2018

Test build #97926 has finished for PR 22801 at commit bf617d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# $example on:manual_save_options_orc$
df <- read.df("examples/src/main/resources/users.orc", "orc")
write.orc(df, "users_with_options.orc", orc.bloom.filter.columns="favorite_color", orc.dictionary.key.threshold=1.0, orc.column.encoding.direct="name")
# $example off:manual_save_options_orc$
Member Author

@dongjoon-hyun dongjoon-hyun Oct 23, 2018

Hi, @felixcheung . Could you review this PR?

Member

We should put spaces around the `=` after each param name
(gosh, same for the csv example above)

orc.bloom.filter.columns = "favorite_color", orc.dictionary.key.threshold = 1.0, orc.column.encoding.direct = "name")

Member Author

Thank you!

(df.write.format("orc")
.option("orc.bloom.filter.columns", "favorite_color")
.option("orc.dictionary.key.threshold", "1.0")
.option("orc.column.encoding.direct", 'name')
Member

use same quote? " or ' for name?

Member Author

Yep!

@SparkQA

SparkQA commented Oct 23, 2018

Test build #97932 has finished for PR 22801 at commit dcdbb8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 23, 2018

Test build #97933 has finished for PR 22801 at commit 2462e59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Member

dbtsai commented Oct 23, 2018

This LGTM.

@dongjoon-hyun
Member Author

Thank you for the review, @felixcheung, @dbtsai, @HyukjinKwon.
Merged to master.

@asfgit asfgit closed this in 4506dad Oct 23, 2018
@dongjoon-hyun dongjoon-hyun deleted the SPARK-25656 branch October 23, 2018 19:46
asfgit pushed a commit that referenced this pull request Oct 25, 2018
…bout extra data source options

## What changes were proposed in this pull request?

Our current doc does not explain how we are passing the data source specific options to the underlying data source. According to [the review comment](#22622 (comment)), this PR aims to add more detailed information and examples. This is a backport of #22801. `orc.column.encoding.direct` is removed since it's not supported in ORC 1.5.2.

## How was this patch tested?

Manual.

Closes #22839 from dongjoon-hyun/SPARK-25656-2.4.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ata source options

## What changes were proposed in this pull request?

Our current doc does not explain how we are passing the data source specific options to the underlying data source. According to [the review comment](apache#22622 (comment)), this PR aims to add more detailed information and examples

## How was this patch tested?

Manual.

Closes apache#22801 from dongjoon-hyun/SPARK-25656.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>