[SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Example #22791

dongjoon-hyun · 2018-10-22T04:15:41Z

What changes were proposed in this pull request?

This PR aims to fix the following SparkR example in Spark 2.3.0 ~ 2.4.0.

> df <- read.df("examples/src/main/resources/people.csv", "csv")
> namesAndAges <- select(df, "name", "age")
...
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve '`name`' given input columns: [_c0];;
'Project ['name, 'age]
+- AnalysisBarrier
      +- Relation[_c0#97] csv

How was this patch tested?

Manual test in SparkR. (Please note that RSparkSQLExample.R fails at the last JDBC example)

> df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)
> namesAndAges <- select(df, "name", "age")
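The failure and the fix can be reproduced outside Spark with a rough base R analogy. This is only a sketch: the file contents below are a hypothetical stand-in for people.csv (assumed to be semicolon-delimited with a header row, which is what the fixed options imply), and `read.table` is used to mimic the CSV reader's options rather than SparkR's `read.df` itself.

```r
# Hypothetical stand-in for people.csv: semicolon-delimited, with a header row.
path <- tempfile(fileext = ".csv")
writeLines(c("name;age;job", "Jorge;30;Developer", "Bob;32;Developer"), path)

# Mimic the original call's effective options (comma separator, no header):
# each line has no comma, so it collapses into a single auto-named column.
broken <- read.table(path, sep = ",", header = FALSE, stringsAsFactors = FALSE)
print(names(broken))  # one auto-generated column, analogous to Spark's _c0

# Mimic the fixed options: split on ";" and take column names from the header.
fixed <- read.table(path, sep = ";", header = TRUE, stringsAsFactors = FALSE)
print(names(fixed))

# Selecting by name now works, as select(df, "name", "age") does after the fix.
namesAndAges <- fixed[, c("name", "age")]
print(namesAndAges)
```

With the wrong separator there is no `name` column to resolve, which mirrors the `cannot resolve 'name' given input columns: [_c0]` error above.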

SparkQA · 2018-10-22T14:10:40Z

Test build #97811 has finished for PR 22791 at commit f160711.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2018-10-22T14:19:49Z

Test build #97793 has finished for PR 22791 at commit f160711.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-10-22T15:01:18Z

examples/src/main/r/RSparkSQLExample.R


 # $example on:manual_load_options_csv$
-df <- read.df("examples/src/main/resources/people.csv", "csv")
+df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)


Hi, @felixcheung .
Could you review this?

Also, ping @jomach and @gatorsmile because this example was added by the following PR in Spark 2.3.

https://github.com/apache/spark/pull/19429/files#diff-eeffb959b904ebb5c864bc3dafe6437dR117

BTW, SPARK-20055 is still open.

In R style we typically put a space after the parameter name, i.e. https://github.com/apache/spark/pull/22791/files#diff-eeffb959b904ebb5c864bc3dafe6437dR168
, sep = ";", inferSchema = TRUE, header = TRUE

Also, for readability, please don't use T.

Thank you, @felixcheung .

If you don't mind, I included that here

dongjoon-hyun · 2018-10-22T23:33:56Z

Thank you for review and merging, @srowen .
Merged to master/branch-2.4/branch-2.3.

## What changes were proposed in this pull request?

This PR aims to fix the following SparkR example in Spark 2.3.0 ~ 2.4.0.

```r
> df <- read.df("examples/src/main/resources/people.csv", "csv")
> namesAndAges <- select(df, "name", "age")
...
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve '`name`' given input columns: [_c0];;
'Project ['name, 'age]
+- AnalysisBarrier
      +- Relation[_c0#97] csv
```

- https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-docs/_site/sql-programming-guide.html#manually-specifying-options
- http://spark.apache.org/docs/2.3.2/sql-programming-guide.html#manually-specifying-options
- http://spark.apache.org/docs/2.3.1/sql-programming-guide.html#manually-specifying-options
- http://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options

## How was this patch tested?

Manual test in SparkR. (Please note that `RSparkSQLExample.R` fails at the last JDBC example)

```r
> df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)
> namesAndAges <- select(df, "name", "age")
```

Closes #22791 from dongjoon-hyun/SPARK-25795.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 3b45567)
Signed-off-by: Dongjoon Hyun <[email protected]>

felixcheung

thanks for fixing this!
minor comment

felixcheung · 2018-10-23T16:15:23Z

examples/src/main/r/RSparkSQLExample.R


 # $example on:manual_load_options_csv$
-df <- read.df("examples/src/main/resources/people.csv", "csv")
+df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)


In R style we typically put a space after the parameter name, i.e. https://github.com/apache/spark/pull/22791/files#diff-eeffb959b904ebb5c864bc3dafe6437dR168
, sep = ";", inferSchema = TRUE, header = TRUE

Also, for readability, please don't use T.
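Besides readability, there is a further reason often given for spelling out TRUE, which the review comment itself does not mention: in R, `T` is an ordinary variable that happens to hold `TRUE`, while `TRUE` is a reserved word. A minimal base R sketch (the variable names are illustrative only):

```r
# T is a regular binding in the base package, so it can be shadowed.
T <- 0
shadowed <- isTRUE(T)   # T now refers to the number 0
rm(T)                   # drop the shadowing binding again
restored <- isTRUE(T)   # base::T (which is TRUE) is visible once more

# TRUE is a reserved word; attempting to assign to it raises an error.
reassign_error <- tryCatch({
  eval(parse(text = "TRUE <- 0"))
  FALSE
}, error = function(e) TRUE)

print(c(shadowed = shadowed, restored = restored, reassign_error = reassign_error))
```

Writing `inferSchema = TRUE, header = TRUE` therefore cannot be affected by a stray `T` variable in the calling environment.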

srowen approved these changes Oct 22, 2018

asfgit closed this in 3b45567 Oct 22, 2018

dongjoon-hyun deleted the SPARK-25795 branch October 22, 2018 23:45

felixcheung reviewed Oct 23, 2018
