@dongjoon-hyun dongjoon-hyun commented Oct 22, 2018

What changes were proposed in this pull request?

This PR aims to fix the following SparkR example in Spark 2.3.0 ~ 2.4.0.

> df <- read.df("examples/src/main/resources/people.csv", "csv")
> namesAndAges <- select(df, "name", "age")
...
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve '`name`' given input columns: [_c0];;
'Project ['name, 'age]
+- AnalysisBarrier
      +- Relation[_c0#97] csv

How was this patch tested?

Manual test in SparkR. (Please note that RSparkSQLExample.R fails at the last JDBC example)

> df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)
> namesAndAges <- select(df, "name", "age")
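
To illustrate why the option-less read ends up with a single `_c0` column, here is a minimal sketch in base R rather than SparkR, so it runs without a Spark session. It rests on assumptions: the sample rows are invented (only the semicolon separator and the `name`/`age` header are implied by the fix above), `read.csv` stands in for `read.df`, and its fallback column name is `V1` where Spark reports `_c0`.

```r
# Hypothetical sample file: semicolon-separated with a header row.
# The rows are invented for illustration; they are not the contents of people.csv.
path <- tempfile(fileext = ".csv")
writeLines(c("name;age", "Alice;30", "Bob;25"), path)

# Comma separator and no header, analogous to calling the reader without options:
# no line contains a comma, so each line stays one field in one auto-named column.
no_opts <- read.csv(path, header = FALSE, sep = ",", stringsAsFactors = FALSE)
stopifnot(ncol(no_opts) == 1, identical(names(no_opts), "V1"))

# Semicolon separator plus header: the first line supplies the column names,
# and the age column is parsed as a number (roughly what inferSchema does in Spark).
with_opts <- read.csv(path, header = TRUE, sep = ";", stringsAsFactors = FALSE)
stopifnot(identical(names(with_opts), c("name", "age")), is.numeric(with_opts$age))

# Selecting by name works now because the columns exist.
names_and_ages <- with_opts[, c("name", "age")]
print(names_and_ages)

unlink(path)
```

The same reasoning applies to the SparkR call: without `sep` and `header`, there is no `name` column for `select` to resolve.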

SparkQA commented Oct 22, 2018

Test build #97811 has finished for PR 22791 at commit f160711.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 22, 2018

Test build #97793 has finished for PR 22791 at commit f160711.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


  # $example on:manual_load_options_csv$
- df <- read.df("examples/src/main/resources/people.csv", "csv")
+ df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)
Member Author

Hi, @felixcheung .
Could you review this?

Member Author

Also, pinging @jomach and @gatorsmile because this example was added by the following PR in Spark 2.3.

BTW, SPARK-20055 is still open.

Member

In R style we typically put a space after the param name, i.e. https://github.com/apache/spark/pull/22791/files#diff-eeffb959b904ebb5c864bc3dafe6437dR168
, sep = ";", inferSchema = TRUE, header = TRUE

And please don't use T, for readability.
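
Beyond readability, a further reason often given for spelling out `TRUE`: in R, `T` and `F` are ordinary variables that can be reassigned, while `TRUE` and `FALSE` are reserved words. A small base R sketch of that difference (the reassignment is contrived, purely for illustration):

```r
# T is just a variable that starts out bound to TRUE, so it can be rebound.
T <- FALSE
stopifnot(identical(T, FALSE))

# TRUE is a reserved word; trying to assign to it raises an error.
res <- tryCatch(eval(parse(text = "TRUE <- FALSE")), error = function(e) "error")
stopifnot(identical(res, "error"))

# Remove the local binding so T resolves to the base definition again.
rm(T)
stopifnot(isTRUE(T))
```

So `header = T` silently changes meaning if some earlier code has assigned to `T`, whereas `header = TRUE` cannot.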

Member Author

Thank you, @felixcheung .

Member Author

If you don't mind, I included that here

@dongjoon-hyun
Member Author

Thank you for reviewing and merging, @srowen .
Merged to master/branch-2.4/branch-2.3.

asfgit pushed a commit that referenced this pull request Oct 22, 2018
## What changes were proposed in this pull request?

This PR aims to fix the following SparkR example in Spark 2.3.0 ~ 2.4.0.

```r
> df <- read.df("examples/src/main/resources/people.csv", "csv")
> namesAndAges <- select(df, "name", "age")
...
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve '`name`' given input columns: [_c0];;
'Project ['name, 'age]
+- AnalysisBarrier
      +- Relation[_c0#97] csv
```

- https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-docs/_site/sql-programming-guide.html#manually-specifying-options
- http://spark.apache.org/docs/2.3.2/sql-programming-guide.html#manually-specifying-options
- http://spark.apache.org/docs/2.3.1/sql-programming-guide.html#manually-specifying-options
- http://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options

## How was this patch tested?

Manual test in SparkR. (Please note that `RSparkSQLExample.R` fails at the last JDBC example)

```r
> df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSchema=T, header=T)
> namesAndAges <- select(df, "name", "age")
```

Closes #22791 from dongjoon-hyun/SPARK-25795.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 3b45567)
Signed-off-by: Dongjoon Hyun
asfgit pushed a commit that referenced this pull request Oct 22, 2018
@asfgit asfgit closed this in 3b45567 Oct 22, 2018
@dongjoon-hyun dongjoon-hyun deleted the SPARK-25795 branch October 22, 2018 23:45
@felixcheung felixcheung left a comment

thanks for fixing this!
minor comment


jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019