Skip to content

Comments

[SPARK-30814][SQL] ALTER TABLE ... ADD COLUMN position should be able to reference columns being added #27584

Closed
imback82 wants to merge 5 commits intoapache:masterfrom
imback82:alter_table_pos_fix
Closed

[SPARK-30814][SQL] ALTER TABLE ... ADD COLUMN position should be able to reference columns being added #27584
imback82 wants to merge 5 commits intoapache:masterfrom
imback82:alter_table_pos_fix

Conversation

@imback82
Copy link
Contributor

What changes were proposed in this pull request?

In ALTER TABLE, a column in ADD COLUMNS can depend on the position of a column that is just being added. For example, for a table with the following schema:

root:
  - a: string
  - b: long

, the following should work:

ALTER TABLE t ADD COLUMNS (x int AFTER a, y int AFTER x) 

Currently, the above statement will throw an exception saying that AFTER x cannot be resolved, because x doesn't exist yet. This PR proposes to fix this issue.

Why are the changes needed?

To fix a bug described above.

Does this PR introduce any user-facing change?

Yes, now

ALTER TABLE t ADD COLUMNS (x int AFTER a, y int AFTER x) 

works as expected.

How was this patch tested?

Added new tests

@imback82
Copy link
Contributor Author

cc @brkyvz @cloud-fan

@SparkQA
Copy link

SparkQA commented Feb 14, 2020

Test build #118443 has finished for PR 27584 at commit fd383b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 15, 2020

Test build #118451 has finished for PR 27584 at commit cadda78.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
case a @ AlterTable(_, _, t: NamedRelation, changes) if t.resolved =>
// Keeps track of new columns being added. Keys are normalized parent names and
// values are field names that belong to their parent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more clear here? For example, if we add a column a.b.c.d, which part should be the key? a.b.c or a.b?

case after: After =>
// Handle the case where column position is referencing new columns being added.
if (!colsToAdd.contains(add.fieldNames().init :+ after.column)) {
positionArgumentExists(add.position(), parent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we just add one more parameter to positionArgumentExists for the newly added columns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just want to make sure all the caller side of positionArgumentExists take care of newly added columns.

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one comment

@SparkQA
Copy link

SparkQA commented Feb 17, 2020

Test build #118544 has finished for PR 27584 at commit 34b1363.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Feb 17, 2020

Test build #118556 has finished for PR 27584 at commit 34b1363.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz
Copy link
Contributor

brkyvz commented Feb 17, 2020

LGTM as well! Thanks @imback82 for the quick fix :)

@SparkQA
Copy link

SparkQA commented Feb 17, 2020

Test build #118599 has finished for PR 27584 at commit 42d1c48.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 5866bc7 Feb 18, 2020
cloud-fan pushed a commit that referenced this pull request Feb 19, 2020
… able to reference columns being added

(Backport of #27584 + partial #27482)

### What changes were proposed in this pull request?

In ALTER TABLE, a column in ADD COLUMNS can depend on the position of a column that is just being added. For example, for a table with the following schema:
```
root:
  - a: string
  - b: long
```
, the following should work:
```
ALTER TABLE t ADD COLUMNS (x int AFTER a, y int AFTER x)
```
Currently, the above statement will throw an exception saying that AFTER x cannot be resolved, because x doesn't exist yet. This PR proposes to fix this issue.

### Why are the changes needed?

To fix a bug described above.

### Does this PR introduce any user-facing change?

Yes, now
```
ALTER TABLE t ADD COLUMNS (x int AFTER a, y int AFTER x)
```
works as expected.

### How was this patch tested?

Added new tests

Closes #27624 from imback82/backport_27584.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
… to reference columns being added

### What changes were proposed in this pull request?

In ALTER TABLE, a column in ADD COLUMNS can depend on the position of a column that is just being added. For example, for a table with the following schema:
```
root:
  - a: string
  - b: long
```
, the following should work:
```
ALTER TABLE t ADD COLUMNS (x int AFTER a, y int AFTER x)
```
Currently, the above statement will throw an exception saying that AFTER x cannot be resolved, because x doesn't exist yet. This PR proposes to fix this issue.

### Why are the changes needed?

To fix a bug described above.

### Does this PR introduce any user-facing change?

Yes, now
```
ALTER TABLE t ADD COLUMNS (x int AFTER a, y int AFTER x)
```
works as expected.

### How was this patch tested?

Added new tests

Closes apache#27584 from imback82/alter_table_pos_fix.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants