[SPARK-30613][SQL] Support Hive style REPLACE COLUMNS syntax #27482
imback82 wants to merge 3 commits into apache:master
@cloud-fan This is WIP, but I have a couple of questions.
Test build #118009 has finished for PR 27482 at commit
This is a good point. I think we should. Can you open a PR to improve the doc?
I think it's OK. We can clean it up later, when we think about how to resolve commands in general.
@cloud-fan this is now ready for review. Thanks!
We currently have the following for … But it seems that only the following is the SQL standard: … and the following is Hive style: … Should we fix this as well? (If so, we can easily combine the Hive-style ADD and REPLACE grammar as well.)
Test build #118275 has finished for PR 27482 at commit
The problem is that we can't remove a SQL syntax that works in prior releases. Maybe we have to bear with it here.
    withTable(t) {
      sql(s"CREATE TABLE $t (col1 int, col2 int) USING $v2Format")
      sql(s"ALTER TABLE $t REPLACE COLUMNS " +
        "(col2 string COMMENT 'comment2', col3 int COMMENT 'comment3')")
One question: if the col2 already has comment but we don't specify new comment in REPLACE COLUMNS, shall we retain the comment? What's the behavior of Hive?
The behavior of REPLACE COLUMNS is to drop all the existing columns first then add new columns. Thus, the comment will not be retained. I will update the test to reflect this.
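To make the drop-then-add semantics concrete, here is a minimal sketch in plain Scala. It is not the code in this PR: `Column`, `ReplaceColumnsSketch`, and `replaceColumns` are hypothetical names, and the schema is modeled as a simple sequence rather than Spark's own types. It only illustrates why an existing comment on `col2` is not carried over.

```scala
// Hypothetical, simplified column model (assumption; not Spark's StructField).
final case class Column(name: String, dataType: String, comment: Option[String] = None)

object ReplaceColumnsSketch {
  // Model REPLACE COLUMNS as: mark every existing column for deletion,
  // drop them, then add the new columns exactly as specified.
  def replaceColumns(existing: Seq[Column], replacement: Seq[Column]): Seq[Column] = {
    val colsToDelete = existing.map(_.name).toSet
    val afterDrop = existing.filterNot(c => colsToDelete.contains(c.name)) // always empty
    afterDrop ++ replacement // nothing from the old schema survives, comments included
  }

  def main(args: Array[String]): Unit = {
    val old = Seq(Column("col1", "int"), Column("col2", "int", Some("old comment")))
    val updated = replaceColumns(
      old,
      Seq(Column("col2", "string"), Column("col3", "int", Some("comment3"))))
    updated.foreach(println)
  }
}
```

In this model the new `col2` ends up with no comment, because it is treated as a brand-new column rather than an update of the old one.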
retest this please
Test build #118294 has finished for PR 27482 at commit
Test build #118313 has finished for PR 27482 at commit
LGTM, merging to master!
      }
    }

    val colsToDelete = mutable.Set.empty[Seq[String]]
Yes, working on it now!
… able to reference columns being added (Backport of #27584 + partial #27482)

### What changes were proposed in this pull request?
In ALTER TABLE, a column in ADD COLUMNS can depend on the position of a column that is just being added. For example, for a table with the following schema:
```
root:
  - a: string
  - b: long
```
, the following should work:
```
ALTER TABLE t ADD COLUMNS (x int AFTER a, y int AFTER x)
```
Currently, the above statement will throw an exception saying that AFTER x cannot be resolved, because x doesn't exist yet. This PR proposes to fix this issue.

### Why are the changes needed?
To fix a bug described above.

### Does this PR introduce any user-facing change?
Yes, now
```
ALTER TABLE t ADD COLUMNS (x int AFTER a, y int AFTER x)
```
works as expected.

### How was this patch tested?
Added new tests

Closes #27624 from imback82/backport_27584.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
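
The ordering issue in the backported fix can be illustrated with a small sketch in plain Scala. This is an assumption-laden model, not the resolution logic in Spark: `AddColumnsSketch`, `NewColumn`, and `addColumns` are hypothetical names, column names are compared case-sensitively, and the schema is just a vector of names. The idea shown is that each `AFTER` is resolved against the schema as it stands after the previous additions, so `y int AFTER x` can see `x`.

```scala
object AddColumnsSketch {
  // Hypothetical: a column to add and, optionally, the column it should follow.
  final case class NewColumn(name: String, after: Option[String])

  // Apply the additions one by one, so each AFTER anchor is looked up in the
  // schema that already contains the previously added columns.
  def addColumns(
      schema: Vector[String],
      additions: Seq[NewColumn]): Either[String, Vector[String]] =
    additions.foldLeft[Either[String, Vector[String]]](Right(schema)) {
      case (Right(current), NewColumn(name, Some(anchor))) =>
        val idx = current.indexOf(anchor)
        if (idx < 0) Left(s"Cannot resolve position: AFTER $anchor")
        else {
          val (before, rest) = current.splitAt(idx + 1)
          Right((before :+ name) ++ rest) // insert right after the anchor
        }
      case (Right(current), NewColumn(name, None)) =>
        Right(current :+ name) // no position given: append at the end
      case (failure, _) =>
        failure // keep the first error
    }

  def main(args: Array[String]): Unit = {
    val result = addColumns(
      Vector("a", "b"),
      Seq(NewColumn("x", Some("a")), NewColumn("y", Some("x"))))
    println(result)
  }
}
```

Under this model, resolving every `AFTER` against only the original schema would fail on `y int AFTER x`, which is the failure the commit message describes.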
### What changes were proposed in this pull request?
This PR proposes to support Hive-style `ALTER TABLE ... REPLACE COLUMNS ...` as described in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Add/ReplaceColumns

The user can now do the following:
```SQL
CREATE TABLE t (col1 int, col2 int) USING Foo;
ALTER TABLE t REPLACE COLUMNS (col2 string COMMENT 'comment2', col3 int COMMENT 'comment3');
```
This drops the existing columns `col1` and `col2`, and adds new columns `col2` and `col3`.

### Why are the changes needed?
This is a new DDL statement. Spark currently supports the Hive-style `ALTER TABLE ... CHANGE COLUMN ...`, so this new addition can be useful.

### Does this PR introduce any user-facing change?
Yes, adding a new DDL statement.

### How was this patch tested?
More tests to be added.

Closes apache#27482 from imback82/replace_cols.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>