[SPARK-17967][SQL] Support for array as an option in SQL parser #20125
Conversation
cc @gatorsmile could you take a look please?
Is this a special feature for SparkSQL only? It seems Hive doesn't have such support. Btw, is there any difference from using a string? Like:

```sql
CREATE TEMPORARY TABLE tableA USING csv
OPTIONS (nullValue "[2012, 1.1, 'null']", ...)
```
Yup, I was thinking of a SparkSQL-only feature. For more details, the original intention was to support multiple values for an option, like R's `na.strings` in `read.csv`:

```r
> d <- "col1,col2
+ 1,3
+ 2,4"
> df <- read.csv(text=d, na.strings=c("3", "2"))
> df
```

For more context, the original proposal (Scala/SQL/Python/Java) is here - #16611. It touched many files and I received advice to make this smaller, which I liked.
Nope, they will be the same, but I was thinking this is the simplest fix.
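For illustration, a minimal sketch of the string-encoded workaround being discussed, assuming a local SparkSession and a placeholder CSV path; note that the built-in csv source today would treat the whole value as one literal null marker unless it were taught to decode it:

```scala
import org.apache.spark.sql.SparkSession

// Placeholder session; in spark-shell a `spark` session already exists.
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// The string-encoded form works with the current parser: the whole "array" travels as a
// single string value, and the data source itself would have to decode it.
val df = spark.read
  .format("csv")
  .option("nullValue", "[2012, 1.1, 'null']")
  .load("/tmp/example.csv") // placeholder path
```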
I actually think the points in #20125 (comment) are good ones, and I was hesitant about this. IMHO it might be fine, but let me cc @hvanhovell and @rxin too, who reviewed my related PRs before.
Test build #85555 has finished for PR 20125 at commit
I am sorry it's been inactive. Let me update this one within a week. |
Two general questions:
1. Any behavior change in the parser?

   I believe there are no behaviour changes, since the option clause itself does not support arrays today. Currently, an option value takes a string, integer, decimal, or boolean, so I believe this is not ambiguous and does not introduce a behaviour change in our parser.

2. What is the corresponding interface for the DataFrameReader and DataFrameWriter APIs?

   I was thinking about the interfaces as below (see the sketch after this list):

   Scala -

   Java -

   The previous PR includes those APIs: https://github.com/apache/spark/pull/16611/files

   One concern is that … could work in the same way, which is a bit ugly.
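As a rough sketch of what such a Scala-side interface could look like (the object and method names below are hypothetical and not the API proposed in this PR), one option is to serialize the values into a JSON-array-style string and reuse the existing string-based `option()`; a matching helper would be needed for DataFrameWriter:

```scala
import org.apache.spark.sql.DataFrameReader

object ArrayOptionImplicits {
  // Hypothetical helper, not part of this PR: adds a Seq[String]-valued option by
  // encoding it as a JSON-array-style string under a single key.
  implicit class ReaderArrayOption(val reader: DataFrameReader) {
    def arrayOption(key: String, values: Seq[String]): DataFrameReader = {
      // Naive quoting for illustration only; a real implementation would use a JSON writer.
      val encoded = values
        .map(v => "\"" + v.replace("\"", "\\\"") + "\"")
        .mkString("[", ", ", "]")
      reader.option(key, encoded)
    }
  }
}

// Usage (hypothetical), after `import ArrayOptionImplicits._`:
//   spark.read.format("csv").arrayOption("nullValue", Seq("2012", "1.1", "null")).load(path)
```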
I was thinking about @cloud-fan's suggestion at #21192 (comment), and it looks feasible that we can additionally add a util that parses a JSON array string and returns an array, but I wonder if that's possible in Data Source V1. We can do that … However, I don't think it's quite possible in Data Source V1, since we can't change the signatures from … Anyway, I think we still need this change to follow @cloud-fan's suggestion.
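A hedged sketch of the kind of utility mentioned above, decoding a JSON-array-encoded option value back into a `Seq[String]` on the data source side; the object and method names are made up, and json4s is assumed only because Spark already bundles it:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Hypothetical helper: a data source that still receives options as Map[String, String]
// (as in Data Source V1) could call this to turn a JSON-array-encoded value into a Seq.
object OptionArrayUtil {
  private implicit val formats: Formats = DefaultFormats

  def toSeq(jsonArray: String): Seq[String] =
    parse(jsonArray).extract[Seq[String]]
}

// e.g. OptionArrayUtil.toSeq("""["3", "2"]""") returns Seq("3", "2")
```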
What changes were proposed in this pull request?
This PR targets to add the ability to deal with an array (a JSON array) as an option value in the `tablePropertyValue` rule in SQL.
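For illustration, a hedged sketch of what the resulting DDL could look like, assuming the new rule accepts a JSON-array literal as the option value; the exact surface syntax, option name, and path below are placeholders inferred from the PR description, not copied from it:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Assumed syntax: the OPTIONS value is written as a JSON array rather than a quoted string.
spark.sql(
  """CREATE TEMPORARY TABLE tableA
    |USING csv
    |OPTIONS (path "/tmp/example.csv", nullValue ["2012", "1.1", "null"])
  """.stripMargin)
```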
How was this patch tested?
Manually tested, and test cases were added in `DDLParserSuite.scala`.