Conversation

@dtenedor
Contributor

@dtenedor dtenedor commented Mar 1, 2022

What changes were proposed in this pull request?

Implement the parser changes needed to support DEFAULT column values, as tracked in https://issues.apache.org/jira/browse/SPARK-38334.

Note that these are the parser changes only. Analysis support will arrive in a follow-up PR.

Background: in the future, CREATE TABLE and ALTER TABLE invocations will support setting column default values for later operations. Subsequent INSERT, UPDATE, and MERGE statements may then reference the value with the DEFAULT keyword as needed.

Examples:

CREATE TABLE T(a INT, b INT NOT NULL);

-- The implicit default value is NULL
INSERT INTO T VALUES (DEFAULT, 0);
INSERT INTO T(b)  VALUES (1);
SELECT * FROM T;
(NULL, 0)
(NULL, 1)

-- Adding a default to a table with existing rows sets the value for the
-- existing rows (the "exist default") and for new rows (the "current default").
ALTER TABLE T ADD COLUMN c INT DEFAULT 5;
INSERT INTO T VALUES (1, 2, DEFAULT);
SELECT * FROM T;
(NULL, 0, 5)
(NULL, 1, 5)
(1, 2, 5) 
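The examples above follow standard SQL default-value semantics. A minimal sketch of the same behavior can be reproduced with SQLite's `sqlite3` module, used here purely as a stand-in since Spark's support is what this PR begins to add (SQLite has no DEFAULT keyword in VALUES lists, so columns are omitted by name instead):

```python
import sqlite3

# Illustrative stand-in: SQLite already implements the default-value
# semantics described above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE T(a INT, b INT NOT NULL)")

# Omitting column `a` fills it with its implicit default, NULL.
cur.execute("INSERT INTO T(b) VALUES (0)")
cur.execute("INSERT INTO T(b) VALUES (1)")

# Adding a column with a default back-fills existing rows with that default.
cur.execute("ALTER TABLE T ADD COLUMN c INT DEFAULT 5")
cur.execute("INSERT INTO T(a, b) VALUES (1, 2)")

cur.execute("SELECT * FROM T")
print(cur.fetchall())  # [(None, 0, 5), (None, 1, 5), (1, 2, 5)]
```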

Why are the changes needed?

This new feature helps users write DDL and DML statements more easily: columns omitted from an INSERT, or referenced via the DEFAULT keyword, are filled with their declared default values.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit test coverage in DDLParserSuite.scala

@github-actions github-actions bot added the SQL label Mar 1, 2022
@dtenedor dtenedor marked this pull request as ready for review March 1, 2022 20:09
@github-actions github-actions bot added the DOCS label Mar 1, 2022
@dtenedor dtenedor changed the title [SPARK-38334][SQL] Implement parser support for DEFAULT column values [SPARK-38335][SQL] Implement parser support for DEFAULT column values Mar 2, 2022
Member

@gengliangwang gengliangwang left a comment


LGTM except minor comments

@AmplabJenkins

Can one of the admins verify this patch?

@dtenedor
Contributor Author

dtenedor commented Mar 2, 2022

jenkins merge

@gengliangwang
Member

gengliangwang commented Mar 3, 2022

Supporting default column values is very common among DBMSs. However, this will be a breaking change for Spark SQL.
Currently, Spark SQL behaves as follows:

> create table t(i int, j int);
> insert into t values(1);
Error in query: `default`.`t` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s).

After supporting default column value:

> create table t(i int, j int);
> insert into t values(1);
> select * from t;
1	NULL

> create table t2(i int, j int default 0);
> insert into t2 values(1);
> select * from t2;
1	0
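The before/after contrast above can be sketched with `sqlite3` in Python. SQLite, like current Spark SQL, rejects a VALUES list that is shorter than the table, but fills declared defaults when columns are named explicitly; it is used here only as an illustration, not as Spark behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t2(i INT, j INT DEFAULT 0)")

# Like current Spark SQL, SQLite rejects a VALUES list shorter than the table.
try:
    cur.execute("INSERT INTO t2 VALUES (1)")
except sqlite3.OperationalError as e:
    print("rejected:", e)

# Naming the columns lets the declared DEFAULT fill the gap.
cur.execute("INSERT INTO t2(i) VALUES (1)")
cur.execute("SELECT * FROM t2")
print(cur.fetchall())  # [(1, 0)]
```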

I am +1 with the change.
Before merging this PR, I would like to collect the opinions of more committers. We can send SPIP for voting if necessary.
cc @cloud-fan @dongjoon-hyun @viirya @dbtsai @huaxingao @maropu @zsxwing @wangyum @yaooqinn WDYT?

Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you for pinging me, @gengliangwang .

@dongjoon-hyun
Member

cc @aokolnychyi , @RussellSpitzer , @rdblue , too.

Member

@viirya viirya left a comment


Thanks for the work.

The parser change itself looks okay. As this is a breaking change, I'd like to see some clarification on why it is necessary. What issue do we have without it (given that we have gone without default values for a long time), and is there any workaround today?

Member

@dongjoon-hyun dongjoon-hyun left a comment


This PR doesn't seem to have the full body yet. What is your release target for this, @dtenedor and @gengliangwang? I'm curious about the general error handling:

  • Creating NULL default value for NOT NULL column
  • Type mismatch between default value literal and column type.
  • Upcasting or not in case of type mismatch
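One data point on the first question: SQLite accepts the declaration `NOT NULL DEFAULT NULL` and only errors when the default is actually applied at insert time; Spark could instead choose to reject the declaration at CREATE/ALTER time. A hedged sketch of the SQLite behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# SQLite accepts declaring a NULL default on a NOT NULL column...
cur.execute("CREATE TABLE t(a INT NOT NULL DEFAULT NULL, b INT)")

# ...and only errors when the default is actually applied at insert time.
# (Spark could instead reject the declaration itself during analysis.)
try:
    cur.execute("INSERT INTO t(b) VALUES (1)")
    failed = False
except sqlite3.IntegrityError as e:
    failed = True
    print("insert rejected:", e)
```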

@dtenedor
Contributor Author

dtenedor commented Mar 3, 2022

> Creating NULL default value for NOT NULL column
> Type mismatch between default value literal and column type
> Upcasting or not in case of type mismatch

IMO:

  • A NOT NULL column can't have a NULL default.
  • Type mismatch between the default value literal and the column type: we can simply forbid this. Note that we have many numeric types (Byte/Short/Int/Long/Decimal/Float/Double). If both the default value literal type and the column type are numeric, it is not considered a mismatch.
  • Upcasting in case of type mismatch: casting can happen if both the literal type and the column type are numeric.

@dtenedor WDYT?

Good questions; see my reply above as well. We can perform type coercion from the provided type to the required type, or return an error if the types are not coercible. We can reuse the analyzer's existing type coercion rules for this, for consistency with the rest of Spark. For example, coercing an integer to floating-point should work, but coercing a floating-point to boolean should return an error to the user.
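The coercion rule discussed here (numeric-to-numeric widening allowed, incompatible pairs rejected) could be sketched as a small predicate. The function name and type tags below are hypothetical illustrations, not Spark's actual analyzer API:

```python
# Hypothetical sketch of the proposed rule; not Spark analyzer code.
NUMERIC = {"byte", "short", "int", "long", "decimal", "float", "double"}

def can_coerce_default(literal_type: str, column_type: str) -> bool:
    """Return True if a default literal of literal_type may be stored
    in a column of column_type under the rule discussed above."""
    if literal_type == column_type:
        return True
    # Any numeric literal may be widened or cast to any numeric column type.
    return literal_type in NUMERIC and column_type in NUMERIC

print(can_coerce_default("int", "double"))      # True: integer -> floating point
print(can_coerce_default("double", "boolean"))  # False: reject with an error
```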

Member

@dongjoon-hyun dongjoon-hyun left a comment


cc @MaxGekk too, since he is the release manager for Apache Spark 3.3 and needs to cut branch-3.3.

@dtenedor
Contributor Author

dtenedor commented Mar 4, 2022

This is ready for another review round @gengliangwang @viirya @wangyum @HyukjinKwon @dongjoon-hyun :)

@dtenedor
Contributor Author

dtenedor commented Mar 6, 2022

(Note the prior merge conflict in the lexer has now been resolved.)

Member

@HyukjinKwon HyukjinKwon left a comment


I'm good w/ this change.

@cloud-fan
Contributor

This is a parser-only change and the feature is not implemented yet, so it is definitely not a breaking change. But I'd like to confirm: is every new SQL feature a breaking change? E.g., adding a new SQL function means a query that previously failed with "function not found" now succeeds. That doesn't seem like a breaking change to me. The same applies to accepting more parameters in a SQL function, accepting more parameter types, etc.

The code change itself LGTM.

@gengliangwang
Member

@dtenedor Thanks for the first contribution!
@HyukjinKwon @dongjoon-hyun @viirya @wangyum @cloud-fan thanks for the input! I am merging this parser-only PR to unblock @dtenedor's work on this feature.

@dongjoon-hyun
Member

@gengliangwang and @cloud-fan, why do we have a Spark 3.4 patch in the master branch for Apache Spark 3.3?

> @dongjoon-hyun This is for Spark 3.4

Are we going to revert this from branch-3.3? cc @MaxGekk

@dongjoon-hyun
Member

dongjoon-hyun commented Mar 8, 2022

As I mentioned in #35690 (review), I assumed we would wait until @MaxGekk cut branch-3.3.

@gengliangwang
Member

@dongjoon-hyun I am merging this one to unblock @dtenedor's work on the actual changes in catalogs.
I will:

  • revert this one on branch-3.3
  • make sure no new related PRs are merged to master until branch-3.3 is cut

@MaxGekk
Member

MaxGekk commented Mar 8, 2022

Are we going to revert this from branch-3.3? cc @MaxGekk

If there is a risk that it can hurt stability, let's revert it. I will open a blocker for 3.3 so that we don't forget this.

@dongjoon-hyun
Member

Thank you for your confirmations, @gengliangwang and @MaxGekk !

@dongjoon-hyun
Member

Hey, @MaxGekk. Did you create the blocker issue?
To @gengliangwang and @HyukjinKwon: I saw the JIRA was resolved as 3.3 because we wanted to avoid our merge script showing 3.4 as a new version.

[Screenshot, 2022-03-15: the JIRA issue shown as resolved with fix version 3.3]

Since today is the feature freeze day, I have reset it to 3.4.

@HyukjinKwon
Member

👍

@MaxGekk
Member

MaxGekk commented Mar 16, 2022

Hey, @MaxGekk . Did you make a block issue?

Here is the blocker, SPARK-38566, and the PR that reverts the commit: #35875

MaxGekk added a commit that referenced this pull request Mar 17, 2022
…support

### What changes were proposed in this pull request?
Revert the commit e21cb62 from `branch-3.3`.

### Why are the changes needed?
See discussion in the PR #35690.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By existing test suites.

Closes #35885 from MaxGekk/revert-default-column-support-3.3.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
