[SPARK-17967][SQL] Support for array as an option in SQL parser #20125

HyukjinKwon · 2017-12-31T10:58:12Z

What changes were proposed in this pull request?

This PR proposes adding support for an array (JSON array) value in the tablePropertyValue rule.

SQL

CREATE TEMPORARY TABLE tableA USING csv
OPTIONS (nullValue [2012, 1.1, 'null'], ...)
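As an illustration only, here is a minimal Java sketch of how an array literal like the one above could be turned into a list of string values. It is not the PR's actual grammar change. The class name `ArrayOptionParser` is hypothetical, and the sketch assumes each element is a single-quoted string, an integer, a decimal, or a boolean.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Spark's SqlBase.g4 rule or AstBuilder code.
final class ArrayOptionParser {
    private ArrayOptionParser() {}

    /** Parses "[v1, v2, ...]" into the string form of each element. */
    static List<String> parse(String text) {
        String s = text.trim();
        if (s.length() < 2 || s.charAt(0) != '[' || s.charAt(s.length() - 1) != ']') {
            throw new IllegalArgumentException("Not an array literal: " + text);
        }
        String body = s.substring(1, s.length() - 1).trim();
        List<String> values = new ArrayList<>();
        if (body.isEmpty()) {
            return values; // "[]" yields an empty list
        }
        int i = 0;
        while (true) {
            i = skipSpaces(body, i);
            if (i >= body.length()) {
                throw new IllegalArgumentException("Missing element in: " + text);
            }
            if (body.charAt(i) == '\'') {
                // Single-quoted string: keep the contents without the quotes.
                int close = body.indexOf('\'', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated string in: " + text);
                }
                values.add(body.substring(i + 1, close));
                i = close + 1;
            } else {
                // Unquoted literal: must look like an integer, a decimal, or a boolean.
                int j = i;
                while (j < body.length() && body.charAt(j) != ',' && !Character.isWhitespace(body.charAt(j))) {
                    j++;
                }
                String literal = body.substring(i, j);
                if (!literal.matches("-?\\d+(\\.\\d+)?") && !literal.equalsIgnoreCase("true")
                        && !literal.equalsIgnoreCase("false")) {
                    throw new IllegalArgumentException("Unsupported element '" + literal + "' in: " + text);
                }
                values.add(literal);
                i = j;
            }
            i = skipSpaces(body, i);
            if (i == body.length()) {
                return values;
            }
            if (body.charAt(i) != ',') {
                throw new IllegalArgumentException("Expected ',' in: " + text);
            }
            i++; // consume the separator
        }
    }

    private static int skipSpaces(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }
}
```

Each element is kept in its string form, so `1.1` stays `"1.1"`. That fits data source options, which end up as strings.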

How was this patch tested?

Tested manually, and test cases were added in DDLParserSuite.scala.

HyukjinKwon · 2017-12-31T10:58:45Z

cc @gatorsmile could you take a look please?

viirya · 2017-12-31T13:48:21Z

Is this a Spark SQL-only feature? It seems Hive doesn't support this.

Btw, is this any different from using a string? Like:

CREATE TEMPORARY TABLE tableA USING csv
OPTIONS (nullValue "[2012, 1.1, 'null']", ...)

HyukjinKwon · 2017-12-31T13:55:08Z

Yup, I was thinking of it as a Spark SQL-only feature.

For more details, the original intention was to support multiple values for nullValue, but I realised that this kind of option support can be generalised. There have been several issues about this since the days when CSV was a third-party library (I will find some links if requested). There is also a precedent in R:

> d <- "col1,col2
+ 1,3
+ 2,4"
> df <- read.csv(text=d, na.strings=c("3", "2"))
> df

  col1 col2
1    1   NA
2   NA    4
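Here is a small Java sketch that mirrors the R example above. It shows the multi-value null semantics: any cell that equals one of the configured null tokens becomes null. `MultiNullCsv` is a hypothetical name. The splitting is deliberately naive, with no quoting or escaping, and this is not how Spark's CSV data source works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of multi-value nullValue semantics; not Spark's CSV reader.
final class MultiNullCsv {
    private MultiNullCsv() {}

    /** Skips the header line, splits each row on ',', and maps any cell found in nullValues to null. */
    static List<List<String>> read(String text, Set<String> nullValues) {
        String[] lines = text.split("\n");
        List<List<String>> rows = new ArrayList<>();
        for (int r = 1; r < lines.length; r++) { // line 0 is the header
            List<String> row = new ArrayList<>();
            for (String cell : lines[r].split(",", -1)) {
                row.add(nullValues.contains(cell) ? null : cell);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

With the same input as the R session and null tokens `"3"` and `"2"`, this yields rows `[1, null]` and `[null, 4]`, which matches R's `na.strings` output.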

For more context, the original proposal (Scala/SQL/Python/Java), #16611, touched many files, and I was advised to make it smaller, which I agreed with.

HyukjinKwon · 2017-12-31T13:57:23Z

Btw, is this any different from using a string? Like:

Nope, they would behave the same, but I was thinking this is the simplest fix.

HyukjinKwon · 2017-12-31T14:03:34Z

I actually think the points in #20125 (comment) are good ones, and I was hesitant about this myself. IMHO it might be fine, but let me cc @hvanhovell and @rxin too, who reviewed my related PRs before.

SparkQA · 2017-12-31T14:09:05Z

Test build #85555 has finished for PR 20125 at commit 5cae64b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-10-12T02:27:49Z

Sorry this has been inactive. Let me update it within a week.

gatorsmile · 2018-12-20T20:33:47Z

Two general questions:

1. Any behavior change in the parser?
2. What is the corresponding interface for DataFrameReader and DataFrameWriter APIs?

HyukjinKwon · 2018-12-21T00:24:21Z

1. Any behavior change in the parser?

I believe there are no behaviour changes, since the OPTIONS clause itself does not currently accept the [ and ] tokens:

CREATE TEMPORARY TABLE tableA USING csv
OPTIONS (nullValue [2012, 1.1, 'null'], ...)

Currently, an option value can be a string, an integer, a decimal, or a boolean. I believe the new syntax is not ambiguous and does not introduce a behaviour change in our parser.
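The ambiguity argument can be shown with a simplified Java sketch. Each option value is classified by its leading characters, and `[` cannot begin a string, an integer, a decimal, or a boolean, so an array value never overlaps with the existing kinds. `OptionValueClassifier` is hypothetical and only approximates the real lexer rules. For example, it ignores signs and exponents.

```java
// Hypothetical, simplified classifier; Spark's real rules live in its ANTLR grammar.
final class OptionValueClassifier {
    enum Kind { STRING, INTEGER, DECIMAL, BOOLEAN, ARRAY }

    private OptionValueClassifier() {}

    /** Decides the kind of an option value token from its leading characters. */
    static Kind classify(String token) {
        String t = token.trim();
        if (t.isEmpty()) {
            throw new IllegalArgumentException("Empty option value");
        }
        char first = t.charAt(0);
        if (first == '[') {
            return Kind.ARRAY; // '[' cannot start any of the existing value kinds
        }
        if (first == '\'' || first == '"') {
            return Kind.STRING; // a quoted "[...]" is still just a string
        }
        if (t.equalsIgnoreCase("true") || t.equalsIgnoreCase("false")) {
            return Kind.BOOLEAN;
        }
        if (t.matches("\\d+")) {
            return Kind.INTEGER;
        }
        if (t.matches("\\d+\\.\\d*|\\.\\d+")) {
            return Kind.DECIMAL;
        }
        throw new IllegalArgumentException("Unsupported option value: " + token);
    }
}
```

The string-wrapped form suggested earlier in the thread, `"[2012, 1.1, 'null']"`, still classifies as a plain string. That is why both forms can coexist without changing how existing statements parse.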

2. What is the corresponding interface for DataFrameReader and DataFrameWriter APIs?

I was thinking of the following interfaces:

Scala - Seq[String]

spark.read.format("csv")
  .option("nullValue", Seq("2012", "Tesla", "null"))
  ...

Java - String[]

spark.read().format("csv")
  .option("nullValue", new String[]{"", "null", "NA"})
  ...

The previous PR includes those APIs: https://github.com/apache/spark/pull/16611/files

One concern is that:

OPTIONS (nullValue "[2012, 1.1, 'null']", ...)

option("nullValue", "[2012, 1.1, 'null']")

could both end up working the same way, which is a bit ugly.
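Here is a minimal Java sketch of the proposed Java-side interface. It assumes the array is stored as a JSON array string so that the options can remain a `Map<String, String>`. `OptionsBuilder` is hypothetical: it is not `DataFrameReader`, and its escaping only covers quotes and backslashes. The sketch also makes the concern above concrete, because passing the already-encoded string produces exactly the same map entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for illustration; it is not DataFrameReader and does not call Spark.
final class OptionsBuilder {
    private final Map<String, String> options = new LinkedHashMap<>();

    /** Existing-style overload: stores the value unchanged. */
    OptionsBuilder option(String key, String value) {
        options.put(key, value);
        return this;
    }

    /** Proposed-style overload: encodes the array as a JSON array string so options stay Map<String, String>. */
    OptionsBuilder option(String key, String[] values) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append('"');
            for (char c : values[i].toCharArray()) {
                if (c == '"' || c == '\\') {
                    json.append('\\'); // escape quotes and backslashes only
                }
                json.append(c);
            }
            json.append('"');
        }
        return option(key, json.append(']').toString());
    }

    Map<String, String> options() {
        return options;
    }
}
```

Both `option("nullValue", new String[]{"", "null", "NA"})` and `option("nullValue", "[\"\",\"null\",\"NA\"]")` store `["","null","NA"]`. A reader of the options cannot tell the two calls apart.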

HyukjinKwon · 2018-12-21T01:28:48Z

I was thinking about @cloud-fan's suggestion in #21192 (comment). It looks feasible to additionally add a util that parses a JSON array string and returns an array, but I wonder if that's possible in Data Source V1.

For Data Source V2, we could do that with something like DataSourceOptions.getArrays; DataSourceOptions.paths already uses a similar approach.

However, I don't think it's really possible for Data Source V1, since we can't change the options signature from Map[String, String] to anything else.

Either way, I think we would still need this change to follow @cloud-fan's suggestion.
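A util that parses a JSON array string could look roughly like the Java sketch below. `ArrayOptions.getArray` is a hypothetical helper, not an existing Spark method. It assumes the values were stored in a plain `Map<String, String>`, as Data Source V1 requires. It accepts only a flat JSON array of strings, and a backslash simply keeps the next character, so escapes such as `\n` or `\u0041` are not decoded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; not an existing Spark API.
final class ArrayOptions {
    private ArrayOptions() {}

    /** Reads options.get(key) as a flat JSON array of strings, e.g. ["", "null", "NA"]. */
    static Optional<List<String>> getArray(Map<String, String> options, String key) {
        String raw = options.get(key);
        if (raw == null) {
            return Optional.empty();
        }
        String s = raw.trim();
        if (s.length() < 2 || s.charAt(0) != '[' || s.charAt(s.length() - 1) != ']') {
            throw new IllegalArgumentException("Option '" + key + "' is not a JSON array: " + raw);
        }
        List<String> values = new ArrayList<>();
        int end = s.length() - 1; // position of the closing ']'
        int i = skipSpaces(s, 1);
        if (i == end) {
            return Optional.of(values); // "[]"
        }
        while (true) {
            if (i >= end || s.charAt(i) != '"') {
                throw new IllegalArgumentException("Expected a string element in: " + raw);
            }
            StringBuilder value = new StringBuilder();
            i++; // skip the opening quote
            while (i < end && s.charAt(i) != '"') {
                char c = s.charAt(i);
                if (c == '\\' && i + 1 < end) {
                    c = s.charAt(++i); // keep the escaped character literally
                }
                value.append(c);
                i++;
            }
            if (i >= end) {
                throw new IllegalArgumentException("Unterminated string in: " + raw);
            }
            values.add(value.toString());
            i = skipSpaces(s, i + 1);
            if (i == end) {
                return Optional.of(values);
            }
            if (s.charAt(i) != ',') {
                throw new IllegalArgumentException("Expected ',' in: " + raw);
            }
            i = skipSpaces(s, i + 1);
        }
    }

    private static int skipSpaces(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }
}
```

Because the Data Source V1 signature cannot change, each source would have to call a helper like this on its own `Map[String, String]` options.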

Support for array as an option in SQL

5cae64b

MaxGekk mentioned this pull request May 1, 2018

[SPARK-24118][SQL] Flexible format for the lineSep option of Text and JSON datasources #21192

Closed

dongjoon-hyun added the SQL label Jun 14, 2019

HyukjinKwon closed this Jun 24, 2019

HyukjinKwon deleted the SPARK-17967-sql branch March 3, 2020 01:20

5 participants

[SPARK-17967][SQL] Support for array as an option in SQL parser #20125

[SPARK-17967][SQL] Support for array as an option in SQL parser #20125

Uh oh!

Conversation

HyukjinKwon commented Dec 31, 2017

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

HyukjinKwon commented Dec 31, 2017

Uh oh!

viirya commented Dec 31, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HyukjinKwon commented Dec 31, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HyukjinKwon commented Dec 31, 2017

Uh oh!

HyukjinKwon commented Dec 31, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SparkQA commented Dec 31, 2017

Uh oh!

HyukjinKwon commented Oct 12, 2018

Uh oh!

gatorsmile commented Dec 20, 2018

Uh oh!

HyukjinKwon commented Dec 21, 2018

Uh oh!

HyukjinKwon commented Dec 21, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

viirya commented Dec 31, 2017 •

edited

Loading

HyukjinKwon commented Dec 31, 2017 •

edited

Loading

HyukjinKwon commented Dec 31, 2017 •

edited

Loading

HyukjinKwon commented Dec 21, 2018 •

edited

Loading