[SPARK-38335][SQL] Implement parser support for DEFAULT column values #35690
Conversation
gengliangwang left a comment:
LGTM except for minor comments.
Can one of the admins verify this patch?
jenkins merge
Supporting default column values is very common among DBMSs. However, this will be a breaking change for Spark SQL after supporting default column values. I am +1 with the change.
dongjoon-hyun left a comment:
Thank you for pinging me, @gengliangwang.
cc @aokolnychyi, @RussellSpitzer, @rdblue, too.
viirya left a comment:
Thanks for the work.
For the parser change itself, it looks okay. As this is a breaking change, I'd like to see some clarification on why it is necessary to have. What issue do we have without it (given that we have not had default values for a long time), and do we have any workaround now?
This PR doesn't seem to have the full body yet. What is your release target for this, @dtenedor and @gengliangwang? I'm curious about the general error handling:
- Creating a NULL default value for a NOT NULL column
- Type mismatch between the default value literal and the column type
- Whether to upcast or not in case of type mismatch
Good questions; I replied to these above earlier. We can perform a type coercion from the provided type to the required type, or return an error if the types are not coercible in this way. We can use the existing type coercion rules in the analyzer for this part, for consistency with the rest of Spark. For example, coercing an integer to floating-point should work, but coercing a floating-point to boolean should return an error to the user.
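The coercion policy described in the comment above can be sketched as follows. This is a hypothetical Python illustration, not Spark's analyzer code; the names `can_coerce` and `check_default` and the type lattice are invented for this sketch:

```python
# Hypothetical sketch of the DEFAULT-value checks discussed in this thread;
# not Spark's implementation. Widening numeric coercions are allowed,
# unrelated conversions (e.g. double -> boolean) are rejected.
ALLOWED_COERCIONS = {
    ("int", "bigint"),
    ("int", "float"),
    ("int", "double"),
    ("float", "double"),
}

def can_coerce(from_type: str, to_type: str) -> bool:
    """True if a DEFAULT value of from_type may be stored in a column of to_type."""
    return from_type == to_type or (from_type, to_type) in ALLOWED_COERCIONS

def check_default(column_type: str, default_type: str, not_null: bool) -> None:
    """Raise ValueError for the error cases enumerated by the reviewers."""
    if default_type == "null" and not_null:
        raise ValueError("NULL default value for NOT NULL column")
    if default_type != "null" and not can_coerce(default_type, column_type):
        raise ValueError(f"cannot coerce {default_type} to {column_type}")

check_default("double", "int", not_null=False)   # int -> double: allowed
try:
    check_default("boolean", "double", not_null=False)
except ValueError as e:
    print(e)  # prints "cannot coerce double to boolean"
```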
dongjoon-hyun left a comment:
cc @MaxGekk too, since he is the release manager for Apache Spark 3.3 who needs to cut branch-3.3.
This is ready for another review round, @gengliangwang @viirya @wangyum @HyukjinKwon @dongjoon-hyun :)
(Note: the prior merge conflict in the lexer has now been resolved.)
HyukjinKwon left a comment:
I'm good with this change.
This is a parser-only change and the feature is not implemented yet, so it is definitely not a breaking change. But I'd like to confirm: is every new SQL feature a breaking change? For example, adding a new SQL function means that a query which previously failed with "function not found" now succeeds; that doesn't seem like a breaking change to me. The same applies to accepting more parameters in a SQL function, accepting more parameter types, etc. The code change itself LGTM.
@dtenedor Thanks for the first contribution!
@gengliangwang and @cloud-fan. Why do we have the Spark 3.4 patch in
Are we going to revert this from
As I mentioned in #35690 (review), I assumed we were going to wait until @MaxGekk cut branch-3.3.
@dongjoon-hyun I am merging this one to unblock @dtenedor's work on the actual changes in catalogs.
If there is a risk that it can hurt stability, let's revert it. I will open a blocker for 3.3 so as not to forget this.
Thank you for your confirmations, @gengliangwang and @MaxGekk!
Hey, @MaxGekk. Did you file a blocker issue? Since today is the feature-freeze day, I am recovering it to 3.4.
👍
Here is the blocker, SPARK-38566, and the PR that reverts the commit: #35875.
…support

What changes were proposed in this pull request?
Revert the commit e21cb62 from `branch-3.3`.

Why are the changes needed?
See discussion in the PR #35690.

Does this PR introduce _any_ user-facing change?
No.

How was this patch tested?
By existing test suites.

Closes #35885 from MaxGekk/revert-default-column-support-3.3.
Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>

What changes were proposed in this pull request?
Implement the parser changes needed to support DEFAULT column values, as tracked in https://issues.apache.org/jira/browse/SPARK-38334.
Note that these are the parser changes only; analysis support will follow in a subsequent PR.
Background: in the future, CREATE TABLE and ALTER TABLE invocations will support setting column default values for later operations. Subsequent INSERT, UPDATE, and MERGE statements may then reference the value using the DEFAULT keyword as needed.
Examples:
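The original examples from the PR description are not preserved in this capture. As an illustration of the intended semantics only, here is a small Python sketch using SQLite, which already supports DEFAULT column values; the Spark SQL syntax targeted by this PR is analogous but not identical:

```python
import sqlite3

# Illustration of DEFAULT column-value semantics using SQLite (not Spark):
# a column declares a default, and INSERT statements that omit the column
# have the engine fill in the declared value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER DEFAULT 42)")

# Omitting column b makes the engine use its declared default.
conn.execute("INSERT INTO t (a) VALUES (1)")

# INSERT ... DEFAULT VALUES uses the default (or NULL) for every column.
conn.execute("INSERT INTO t DEFAULT VALUES")

rows = conn.execute("SELECT a, b FROM t ORDER BY rowid").fetchall()
print(rows)  # [(1, 42), (None, 42)]
```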
Why are the changes needed?
This new syntax helps users run DDL and DML statements that add new values to tables and scan existing values from tables more easily.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added unit test coverage in DDLParserSuite.scala.