[SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITION should support comparators #19691

DazhuangSu · 2017-11-08T07:36:55Z

What changes were proposed in this pull request?

This pr is inspired by @dongjoon-hyun.

quote from #15704 :

What changes were proposed in this pull request?
This PR aims to support comparators, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.
Spark 1.6
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = [result: string]
scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
res1: org.apache.spark.sql.DataFrame = [result: string]
Spark 2.0
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 42)
After this PR, it's supported.
How was this patch tested?
Pass the Jenkins test with a newly added testcase.

#16036 points out that using an int literal in DROP PARTITION fails after applying #15704.
The reason for this failure in #15704 is that AlterTableDropPartitionCommand distinguishes BinaryComparison from EqualTo with the following code:

private def isRangeComparison(expr: Expression): Boolean = {
  expr.find(e => e.isInstanceOf[BinaryComparison] && !e.isInstanceOf[EqualTo]).isDefined
}

This PR resolves the problem by classifying each drop condition while the SQL is being parsed.
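To make the check quoted above concrete, here is a minimal sketch of the same predicate over a toy expression tree. The types `Expr`, `Attr`, `Lit`, `LessThan` and `And` are hypothetical stand-ins written for this illustration; they only mirror the names used in the thread and are not the real Catalyst classes.

```scala
// Toy stand-ins for Catalyst expressions (assumption: NOT the real
// org.apache.spark.sql.catalyst types, just enough structure to run).
object RangeCheckSketch {
  sealed trait Expr {
    def children: Seq[Expr]
    // Depth-first search that returns the first node satisfying p.
    def find(p: Expr => Boolean): Option[Expr] =
      if (p(this)) Some(this)
      else children.foldLeft(Option.empty[Expr])((acc, c) => acc.orElse(c.find(p)))
  }
  final case class Attr(name: String) extends Expr { val children: Seq[Expr] = Nil }
  final case class Lit(value: Any) extends Expr { val children: Seq[Expr] = Nil }

  sealed trait BinaryComparison extends Expr {
    def left: Expr
    def right: Expr
    def children: Seq[Expr] = Seq(left, right)
  }
  final case class EqualTo(left: Expr, right: Expr) extends BinaryComparison
  final case class LessThan(left: Expr, right: Expr) extends BinaryComparison
  final case class And(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }

  // Same shape as the quoted helper: true if any comparison other than '=' occurs.
  def isRangeComparison(expr: Expr): Boolean =
    expr.find(e => e.isInstanceOf[BinaryComparison] && !e.isInstanceOf[EqualTo]).isDefined
}
```

In this model, `country = 'KR'` is not a range comparison, while `country < 'KR'` (alone or nested under `And`) is.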

How was this patch tested?

New testcase introduced from #15704

gatorsmile · 2017-11-13T22:40:45Z

cc @dongjoon-hyun

dongjoon-hyun · 2017-11-13T22:54:23Z

Thank you for pinging me, @gatorsmile .

gatorsmile · 2017-11-14T06:52:51Z

ok to test

SparkQA · 2017-11-14T06:59:04Z

Test build #83828 has finished for PR 19691 at commit 85fdb46.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-11-14T07:24:33Z

Test build #83831 has finished for PR 19691 at commit f18caeb.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-11-14T08:05:01Z

Test build #83832 has finished for PR 19691 at commit f79c6f4.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

DazhuangSu · 2017-11-14T08:27:40Z

Jenkins, retest this please

SparkQA · 2017-11-14T08:54:07Z

Test build #83838 has finished for PR 19691 at commit 8728d3b.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-11-14T12:04:30Z

Test build #83839 has finished for PR 19691 at commit 9832ec5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

DazhuangSu · 2017-11-17T03:12:55Z

@gatorsmile @dongjoon-hyun
Could you give me some advice please?

gatorsmile · 2018-04-08T03:57:11Z

ok to test

gatorsmile · 2018-04-08T03:58:12Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

+      expression(pVal) match {
+        case EqualNullSafe(_, _) =>
+          throw new ParseException("'<=>' operator is not allowed in partition specification.", ctx)
+        case cmp @ BinaryComparison(UnresolvedAttribute(name :: Nil), constant: Literal) =>


Still the same question here. Does the constant have to be on the right side?

Hive supports them only on the right side. So it makes sense to have the same here I think.

If we support the right side only, it seems useful to print an explicit error message like left-side literal not supported ....?

We can also enforce this in the syntax, like here: https://github.com/apache/spark/pull/20999/files#diff-8c1cb2af4aa1109e08481dae79052cc3R269
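The suggestion of an explicit error for a left-side literal could look roughly like the sketch below. It uses a toy model (`Attr`, `Lit`, `Cmp` are hypothetical types, and `Either` replaces `ParseException` so the snippet runs without Spark). The first and last messages are taken from the diff quoted in this thread; the wording of the left-side message is an assumption.

```scala
// Toy model of the partition-filter check; not the AstBuilder code itself.
object PartitionFilterSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Any) extends Expr
  final case class Cmp(op: String, left: Expr, right: Expr) extends Expr

  def validate(e: Expr): Either[String, Cmp] = e match {
    // Null-safe equality is rejected first, as in the quoted diff.
    case Cmp("<=>", _, _) =>
      Left("'<=>' operator is not allowed in partition specification.")
    // Accepted shape: attribute on the left, constant on the right.
    case c @ Cmp(_, Attr(_), Lit(_)) =>
      Right(c)
    // Hypothetical dedicated message for the mirrored shape.
    case Cmp(_, Lit(_), Attr(_)) =>
      Left("Left-side literal is not supported in partition specification.")
    case _ =>
      Left("Invalid partition filter specification")
  }
}
```

Enforcing the same rule in the grammar instead would reject the mirrored form at parse time, at the cost of a less specific error message.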

gatorsmile · 2018-04-08T04:02:29Z

@dongjoon-hyun @maropu @mgaido91 Could you review this PR? I think this command is pretty useful to end users.

SparkQA · 2018-04-08T07:05:02Z

Test build #89023 has finished for PR 19691 at commit 9832ec5.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2018-04-08T08:36:41Z

retest this please

maropu · 2018-04-08T08:36:45Z

ok

mgaido91 · 2018-04-08T08:54:46Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

+          throw new ParseException("Invalid partition filter specification", ctx)
+      }
+    }
+    if(parts.isEmpty) {


Wouldn't it be better to return the Seq[Expression] as it is? Later we need it like that (in listPartitionsByFilter), and in this way we can avoid using null, which is a good thing too...

why aren't we returning parts? this if seems pretty useless

you're right. I will change this.

mgaido91 · 2018-04-08T08:59:17Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

+        }
+      }.distinct
+
+      if (normalizedSpecs.isEmpty && partitionSet.isEmpty) {


Can't we just return partitionSet ++ normalizedSpecs? I think it is wrong to use intersect, we should drop all of them, shouldn't we?

@mgaido91 I tried this command in Hive, and Hive only dropped the intersection of the two partition filters.
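The difference between the two readings (drop the union versus drop the intersection) can be shown on a small in-memory example. This is only a sketch under assumed data: the partition values and the two predicates are made up for illustration, and nothing here calls Hive or Spark.

```scala
// Partitions modelled as column -> value maps (hypothetical sample data).
object DropSemanticsSketch {
  type Spec = Map[String, String]

  val partitions: Seq[Spec] = Seq(
    Map("country" -> "CN", "quarter" -> "1"),
    Map("country" -> "CN", "quarter" -> "3"),
    Map("country" -> "US", "quarter" -> "1"))

  // One equality condition and one comparator condition from the same PARTITION clause.
  val byEquality: Spec => Boolean = (p: Spec) => p("country") == "CN"
  val byRange: Spec => Boolean = (p: Spec) => p("quarter").toInt < 2

  // Intersection: a partition is dropped only if it satisfies every condition.
  val intersection: Seq[Spec] = partitions.filter(p => byEquality(p) && byRange(p))
  // Union: a partition is dropped if it satisfies any condition.
  val union: Seq[Spec] = partitions.filter(p => byEquality(p) || byRange(p))
}
```

With this data the intersection contains a single partition, while the union would drop all three, which is why the choice matters for a destructive command.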

maropu · 2018-04-08T09:22:04Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

+        case EqualNullSafe(_, _) =>
+          throw new ParseException("'<=>' operator is not allowed in partition specification.", ctx)
+        case cmp @ BinaryComparison(UnresolvedAttribute(name :: Nil), constant: Literal) =>
+          cmp.withNewChildren(Seq(AttributeReference(name, StringType)(), constant))


Is it ok to pass all types of literals here?

Either way, we might need tests for non int-literal cases.

mgaido91 · 2018-04-08T09:24:21Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

+        case EqualNullSafe(_, _) =>
+          throw new ParseException("'<=>' operator is not allowed in partition specification.", ctx)
+        case cmp @ BinaryComparison(UnresolvedAttribute(name :: Nil), constant: Literal) =>
+          cmp.withNewChildren(Seq(AttributeReference(name, StringType)(), constant))


What if the partition column is not of String type?

OK. I'll work on this these days.

SparkQA · 2018-04-08T11:30:16Z

Test build #89029 has finished for PR 19691 at commit 9832ec5.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

mgaido91 · 2018-05-15T12:48:31Z

@DazhuangSu are you still working on this?

DazhuangSu · 2018-05-20T14:04:18Z

@mgaido91 Sorry, a little busy recently.
pr is almost ready. Will update soon.

mgaido91 · 2018-05-20T15:50:09Z

thanks @DazhuangSu

SparkQA · 2018-05-30T20:51:27Z

Test build #91308 has finished for PR 19691 at commit 182449b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-05-31T20:35:34Z

Test build #91352 has finished for PR 19691 at commit d725fc9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-06-01T18:37:14Z

Test build #91393 has finished for PR 19691 at commit defc9f1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-06-05T07:05:02Z

Test build #91473 has finished for PR 19691 at commit 6b18939.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

mgaido91 · 2018-06-05T09:10:59Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

+      }
+      val dataType = table.partitionSchema.apply(attrName).dataType
+      expr.withNewChildren(Seq(AttributeReference(attrName, dataType)(),
+        Cast(constant, dataType)))


nit: can we add the cast only when needed, ie. dataType != constant.dataType?
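A minimal sketch of that nit, using toy `DataType`, `Literal` and `Cast` definitions (hypothetical stand-ins, not the Catalyst classes) and a helper name `castIfNeeded` invented for this example:

```scala
// Toy types that only carry a data type, enough to show the conditional cast.
object CastWhenNeededSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType

  sealed trait Expr { def dataType: DataType }
  final case class Literal(value: Any, dataType: DataType) extends Expr
  final case class Cast(child: Expr, dataType: DataType) extends Expr

  // Wrap the constant in a Cast only when its type differs from the partition column's type.
  def castIfNeeded(constant: Expr, target: DataType): Expr =
    if (constant.dataType == target) constant else Cast(constant, target)
}
```

Skipping the redundant cast keeps the resulting filter expression simpler when the literal already has the partition column's type.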

mgaido91 · 2018-06-05T09:17:44Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

+        extractFromPartitionSpec(partition._1, table, resolver)
+      } else if (!partition._1.isEmpty && !partition._2.isEmpty) {
+        // This drop condition has both partitionSpecs and expressions.
+        extractFromPartitionFilter(partition._2, catalog, table, resolver).intersect(


I think this may be quite inefficient if we have a lot of partitions. What about converting the partitionSpec into EqualTo expressions and adding them as conditions? It would be great IMO if we could achieve this by enforcing in the syntax that we have either all partitionSpecs or all expressions. So if we have all partition = value, we have a partitionSpec, while if at least one is a comparison different from =, we have all expressions (including the =s). What do you think?

Yeah, I agree. And the hard part may be how to convert a partitionSpec to an EqualTo.
I think it's better to let the AstBuilder handle this. If so, we may have to have two AlterTableDropPartitionCommand instances in ddl.scala, one for all partitionSpec and one for all expression.
But it may be a bit weird.

Why? Isn't something like this enough:

((partitionVal (',' partitionVal)*) | (expression (',' expression)*))

?

I mean how best to define AlterTableDropPartitionCommand in ddl.scala. It needs to handle both
AlterTableDropPartitionCommand(tableName: TableIdentifier, partitions: Seq[Seq[Expression]], ifExists: Boolean, purge: Boolean, retainData: Boolean)
and
AlterTableDropPartitionCommand(tableName: TableIdentifier, partitions: Seq[TablePartitionSpec], ifExists: Boolean, purge: Boolean, retainData: Boolean)
Maybe distinguish the two cases inside the method?

I think we can (must) just have a single: AlterTableDropPartitionCommand( tableName: TableIdentifier, partitionSpecs: Seq[TablePartitionSpec], partitionExprs: Seq[Seq[Expression]], ifExists: Boolean, purge: Boolean, retainData: Boolean). Indeed, we might have something like:

alter table foo drop partition (year=2017, month=12), partition(year=2018, month < 3);

where we have both a partition spec and an expression specification.
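The rule discussed above (a clause with only `=` becomes a partition spec, a clause with at least one other comparison stays as expressions) can be sketched as follows. `Cond` and `split` are hypothetical names for this illustration; `TablePartitionSpec` is modelled as `Map[String, String]`, and the real command would carry Catalyst `Expression`s rather than `Cond` values.

```scala
// Sketch: split parsed PARTITION clauses into specs and expression lists.
object SplitClausesSketch {
  final case class Cond(column: String, op: String, value: String)
  type TablePartitionSpec = Map[String, String]

  def split(clauses: Seq[Seq[Cond]]): (Seq[TablePartitionSpec], Seq[Seq[Cond]]) = {
    // A clause is a plain spec only if every condition in it uses '='.
    val (specs, exprs) = clauses.partition(conds => conds.forall(_.op == "="))
    (specs.map(conds => conds.map(c => c.column -> c.value).toMap), exprs)
  }

  // The two clauses of the ALTER TABLE statement quoted above.
  val example: Seq[Seq[Cond]] = Seq(
    Seq(Cond("year", "=", "2017"), Cond("month", "=", "12")),
    Seq(Cond("year", "=", "2018"), Cond("month", "<", "3")))
}
```

For the example statement, the first clause ends up as a spec and the second stays an expression list that still includes its `year = 2018` condition.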

Hi @mgaido91, there is one problem after I changed the syntax:
when I run the SQL DROP PARTITION (p >=2) it throws
org.apache.spark.sql.AnalysisException: cannot resolve 'p' given input columns: []
I'm trying to find a way to figure it out.

By the way, is a syntax like ((partitionVal (',' partitionVal)*) | (expression (',' expression)*)) legal? I wrote an ANTLR4 syntax test, but it didn't work as I expected.

Besides, I was wrong that day. I think the if conditions won't be inefficient if there are a lot of partitions. It may be inefficient if there are a lot of dropPartitionSpecs, which I don't think can happen easily.

@DazhuangSu sorry I missed your last comment somehow.

Why do you say it would not be inefficient if you have a lot of partitions? I think it would be! Imagine that you partition per year and day, and you want to get the first 6 months of this year. The spec would be something like (year = 2018, day < 2018-07-01). Imagine we have a 10-year history. With the current implementation, we would get back basically all the partitions from the filter, i.e. roughly 3,650, and then it would intersect those. Anyway, my understanding is that such a case would not even work properly, as it would try to drop the intersection of:

Seq(Seq("year"-> "2018", "day" -> "2018-01-01", ...)).intersect(Seq(Map("year"->"2018")))

which would result in an empty Seq, so we would drop nothing. Moreover, I saw no test for this case in the tests. Can we add tests for this use case and can we add support for it if my understanding that it is not working is right? Thanks
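The empty result described above follows from how `Seq.intersect` compares `Map` values: two maps are equal only if all their entries match, so a full partition spec never equals a partial one. The sketch below reproduces this with plain Scala collections; the `matches` helper is a hypothetical alternative written for this illustration, not code from the PR.

```scala
// Plain-collections reproduction of the intersect pitfall.
object IntersectPitfallSketch {
  type Spec = Map[String, String]

  // Full specs, as a filter on `day` would return them.
  val fromFilter: Seq[Spec] = Seq(
    Map("year" -> "2018", "day" -> "2018-01-01"),
    Map("year" -> "2018", "day" -> "2018-01-02"))
  // Partial spec coming from the equality part of the clause.
  val fromSpec: Seq[Spec] = Seq(Map("year" -> "2018"))

  // Whole-map equality: no full spec equals the partial one, so this is empty.
  val naive: Seq[Spec] = fromFilter.intersect(fromSpec)

  // Treat the partial spec as a constraint that the full spec must contain.
  def matches(full: Spec, partial: Spec): Boolean =
    partial.forall { case (k, v) => full.get(k).contains(v) }

  val subsetBased: Seq[Spec] =
    fromFilter.filter(full => fromSpec.exists(partial => matches(full, partial)))
}
```

Here `naive` is empty while `subsetBased` keeps both 2018 partitions, which is the behaviour a test for this mixed case would need to pin down.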

@mgaido91 I understand your point, yes it would be inefficient. I will work on this soon

thank you @DazhuangSu

HyukjinKwon · 2018-07-16T02:20:05Z

ok to test

SparkQA · 2018-07-16T07:03:26Z

Test build #93052 has finished for PR 19691 at commit 6b18939.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MKervo · 2018-08-20T14:46:31Z

Could someone merge this please ? :)

maropu · 2018-08-21T11:39:57Z

@DazhuangSu Can you resolve the conflict?

DazhuangSu · 2018-08-23T11:52:07Z

@maropu ok

maropu · 2018-08-29T04:39:36Z

@HyukjinKwon can you trigger again?

mgaido91 · 2018-08-29T07:28:44Z

@DazhuangSu are you still working on this? There is this comment and also another nit which need to be addressed from the last review... Meanwhile I am not sure if someone else has other comments on this.

HyukjinKwon · 2018-08-30T01:39:54Z

ok to test

HyukjinKwon · 2018-08-30T01:40:15Z

Could anyone take over this then?

maropu · 2018-08-30T01:57:44Z

@DazhuangSu Are u there?

SparkQA · 2018-08-30T03:35:59Z

Test build #95451 has finished for PR 19691 at commit 6b18939.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

mgaido91 · 2018-08-30T07:15:53Z

if @DazhuangSu is not active anymore on this I can take it over, but let's wait for his answer.

DazhuangSu · 2018-08-30T13:12:12Z

@mgaido91
Sorry guys. little busy recently.
I will resolve the failed tests this weekend first.

maropu · 2018-09-04T12:55:18Z

@DazhuangSu still busy?

DazhuangSu · 2018-09-05T07:03:30Z

@maropu
Sorry. I don't really have much time this month.
I can close this pr and somebody can continue on this problem.

maropu · 2018-09-05T08:31:02Z

ok @mgaido91 can u take this over?

mgaido91 · 2018-09-05T09:00:08Z

@DazhuangSu @maropu sure, thanks, I'll submit a PR for this soon. Thanks.

Closes apache#21766 Closes apache#21679 Closes apache#21161 Closes apache#20846 Closes apache#19434 Closes apache#18080 Closes apache#17648 Closes apache#17169 Add: Closes apache#22813 Closes apache#21994 Closes apache#22005 Closes apache#22463 Add: Closes apache#15899 Add: Closes apache#22539 Closes apache#21868 Closes apache#21514 Closes apache#21402 Closes apache#21322 Closes apache#21257 Closes apache#20163 Closes apache#19691 Closes apache#18697 Closes apache#18636 Closes apache#17176 Closes apache#23001 from wangyum/CloseStalePRs. Authored-by: Yuming Wang <[email protected]> Signed-off-by: hyukjinkwon <[email protected]>

[SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITION should supp…

20f658a

…ort comparators

bug fix

85fdb46

Scala Style fix

f18caeb

Scala Style fix

f79c6f4

some changes

8728d3b

Scala Style fix

9832ec5

gatorsmile mentioned this pull request Apr 6, 2018

[SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support partition filters in ALTER TABLE DROP PARTITION #20999

Closed

address comments

182449b

update

d725fc9

update

defc9f1

address comments

6b18939

HyukjinKwon mentioned this pull request Nov 11, 2018

[INFRA] Close stale PRs #23001

Closed

asfgit closed this in a3ba3a8 Nov 11, 2018

weixiuli mentioned this pull request Oct 28, 2019

[SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support partition filter in ALTER TABLE DROP PARTITION and batch dropping PARTITIONS #26280

Closed
