Skip to content

Conversation

@DazhuangSu
Copy link

What changes were proposed in this pull request?

This pr is inspired by @dongjoon-hyun.

quote from #15704 :

What changes were proposed in this pull request?
This PR aims to support comparators, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.
Spark 1.6
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)") res0: org.apache.spark.sql.DataFrame = [result: string]
scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')") res1: org.apache.spark.sql.DataFrame = [result: string]
Spark 2.0
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)") res0: org.apache.spark.sql.DataFrame = []
scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 42)
After this PR, it's supported.
How was this patch tested?
Pass the Jenkins test with a newly added testcase.

#16036 points out that if we use int literal in DROP PARTITION will fail after patching #15704.
The reason of this failing in #15704 is that AlterTableDropPartitionCommand tells BinayComparison and EqualTo with following code:

private def isRangeComparison(expr: Expression): Boolean = {

expr.find(e => e.isInstanceOf[BinaryComparison] && !e.isInstanceOf[EqualTo]).isDefined
}

This PR resolve this problem by telling a drop condition when parsing sqls.

How was this patch tested?

New testcase introduced from #15704

@gatorsmile
Copy link
Member

cc @dongjoon-hyun

@dongjoon-hyun
Copy link
Member

Thank you for pinging me, @gatorsmile .

@gatorsmile
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Nov 14, 2017

Test build #83828 has finished for PR 19691 at commit 85fdb46.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 14, 2017

Test build #83831 has finished for PR 19691 at commit f18caeb.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 14, 2017

Test build #83832 has finished for PR 19691 at commit f79c6f4.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@DazhuangSu
Copy link
Author

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Nov 14, 2017

Test build #83838 has finished for PR 19691 at commit 8728d3b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 14, 2017

Test build #83839 has finished for PR 19691 at commit 9832ec5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@DazhuangSu
Copy link
Author

@gatorsmile @dongjoon-hyun
Could you give me some advice please?

@gatorsmile
Copy link
Member

ok to test

expression(pVal) match {
case EqualNullSafe(_, _) =>
throw new ParseException("'<=>' operator is not allowed in partition specification.", ctx)
case cmp @ BinaryComparison(UnresolvedAttribute(name :: Nil), constant: Literal) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still the same question here. Constant has to be in the right side?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hive supports them only on the right side. So it makes sense to have the same here I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we support the right-side only, it seems be useful to print explicit error messages like left-side literal not supported ....?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile
Copy link
Member

@dongjoon-hyun @maropu @mgaido91 Could you review this PR? I think this command is a pretty useful to end users.

@SparkQA
Copy link

SparkQA commented Apr 8, 2018

Test build #89023 has finished for PR 19691 at commit 9832ec5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Apr 8, 2018

retest this please

@maropu
Copy link
Member

maropu commented Apr 8, 2018

ok

throw new ParseException("Invalid partition filter specification", ctx)
}
}
if(parts.isEmpty) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't be better to return the Seq[Expression] as it is? Later we need it like that (in listPartitionsByFilter ) and in this way we can avoid using null which is a good thing too...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why aren't we returning parts? this if seems pretty useless

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right. I will change this.

}
}.distinct

if (normalizedSpecs.isEmpty && partitionSet.isEmpty) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can,t we just return partitionSet ++ normalizedSpecs ? I think it is wrong to use intersect, we should drop all of them, shouldn't we?

Copy link
Author

@DazhuangSu DazhuangSu Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgaido91 I tried this command in hive. And hive only dropped the intersection of two partition filter.

case EqualNullSafe(_, _) =>
throw new ParseException("'<=>' operator is not allowed in partition specification.", ctx)
case cmp @ BinaryComparison(UnresolvedAttribute(name :: Nil), constant: Literal) =>
cmp.withNewChildren(Seq(AttributeReference(name, StringType)(), constant))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to pass all the type of literals here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, we might need tests for non int-literal cases.

case EqualNullSafe(_, _) =>
throw new ParseException("'<=>' operator is not allowed in partition specification.", ctx)
case cmp @ BinaryComparison(UnresolvedAttribute(name :: Nil), constant: Literal) =>
cmp.withNewChildren(Seq(AttributeReference(name, StringType)(), constant))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the partition column is not of String type?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'll work on this these days.

@SparkQA
Copy link

SparkQA commented Apr 8, 2018

Test build #89029 has finished for PR 19691 at commit 9832ec5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor

@DazhuangSu are you still working on this?

@DazhuangSu
Copy link
Author

@mgaido91 Sorry, a little busy recently.
pr is almost ready. Will update soon.

@mgaido91
Copy link
Contributor

thanks @DazhuangSu

@SparkQA
Copy link

SparkQA commented May 30, 2018

Test build #91308 has finished for PR 19691 at commit 182449b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 31, 2018

Test build #91352 has finished for PR 19691 at commit d725fc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 1, 2018

Test build #91393 has finished for PR 19691 at commit defc9f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 5, 2018

Test build #91473 has finished for PR 19691 at commit 6b18939.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
val dataType = table.partitionSchema.apply(attrName).dataType
expr.withNewChildren(Seq(AttributeReference(attrName, dataType)(),
Cast(constant, dataType)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we add the cast only when needed, ie. dataType != constant.dataType?

extractFromPartitionSpec(partition._1, table, resolver)
} else if (!partition._1.isEmpty && !partition._2.isEmpty) {
// This drop condition has both partitionSpecs and expressions.
extractFromPartitionFilter(partition._2, catalog, table, resolver).intersect(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be quite inefficient if we have a lot if partitions. What about converting the partitionSpec is EqualsTo expressions and add them as conditions? It would be great IMO if we can achieve this by enforcing in the syntax that we have either all partitionSpecs or all expressions. So if we have all partition = value, we have a partitionSpec, while if at least one is a comparison different from =, we have all expressions (including the =s). What do you think?

Copy link
Author

@DazhuangSu DazhuangSu Jun 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. And the hard part may be how to convert a partitionSpec to an EqualsTo.
I think it's better to let the AstBuilder to handle this. If so, we may have to have two AlterTableDropPartitionCommand instances in ddl.scala, one for all partitionSpec and one for all expression.
But it maybe a bit weird.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? Isn't it enough something like:

((partitionVal (',' partitionVal)*) | (expression (',' expression)*))

?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean how to define AlterTableDropPartitionCommand better in ddl.scala. need to handle both
AlterTableDropPartitionCommand( tableName: TableIdentifier, partitions: Seq[Seq[Expression]], ifExists: Boolean, purge: Boolean, retainData: Boolean)
and
AlterTableDropPartitionCommand( tableName: TableIdentifier, partitions: Seq[TablePartitionSpec], ifExists: Boolean, purge: Boolean, retainData: Boolean)
Maybe telling the different cases inside the method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can (must) just have a single: AlterTableDropPartitionCommand( tableName: TableIdentifier, partitionSpecs: Seq[TablePartitionSpec], partitionExprs: Seq[Seq[Expression]], ifExists: Boolean, purge: Boolean, retainData: Boolean). Indeed, we might have something like:

alter table foo drop partition (year=2017, month=12), partition(year=2018, month < 3);

where we have both a partition spec and an expression specification.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, @mgaido91 there is one problem after I changed the syntax,
when i run sql DROP PARTITION (p >=2) it throws
org.apache.spark.sql.AnalysisException: cannot resolve 'p' given input columns: []
I'm trying to find a way to figure it out.

By the way, is a syntax like ((partitionVal (',' partitionVal)*) | (expression (',' expression)*)) legal? Because I wrote a antlr4 syntax test, but it didn't work as I supposed.

Besides, I was wrong that day. I think the if conditions won't be inefficient if there is a lot of partitions. it maybe inefficient if there are a lot of dropPartitionSpec which I don't think can happen easily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DazhuangSu sorry I missed your last comment somehow.

Why do you say it would not be inefficient if you have a lot of partitions?I think it would be! Imagine that you partition per year and day. And you want to get the first 6 months of this year. The spec would be something like (year = 2018, day < 2018-07-01). Imagine we have a 10 years history. With the current implementation, we would get back basically all the the partitions from the filter, ie. roughly 3.650 and then it will intersect those. Anyway, my understanding is that such a case would not even work properly, as it would try drop the intersect of:

Seq(Seq("year"-> "2018", "day" -> "2018-01-01", ...)).intersect(Seq(Map("year"->"2018")))

which would result in an empty Seq, so we would drop nothing. Moreover, I saw no test for this case in the tests. Can we add tests for this use case and can we add support for it if my understanding that it is not working is right? Thanks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgaido91 I understand your point, yes it would be inefficient. I will work on this soon

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @DazhuangSu

@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Jul 16, 2018

Test build #93052 has finished for PR 19691 at commit 6b18939.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MKervo
Copy link

MKervo commented Aug 20, 2018

Could someone merge this please ? :)

@maropu
Copy link
Member

maropu commented Aug 21, 2018

@DazhuangSu Can you resolve the conflict?

@DazhuangSu
Copy link
Author

@maropu ok

@maropu
Copy link
Member

maropu commented Aug 29, 2018

@HyukjinKwon can you trigger again?

@mgaido91
Copy link
Contributor

@DazhuangSu are you still working on this? There is this comment and also another nit which need to be addressed from the last review... Meanwhile I am not sure if someone else has other comments on this.

@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

Could anyone take over this then?

@maropu
Copy link
Member

maropu commented Aug 30, 2018

@DazhuangSu Are u there?

@SparkQA
Copy link

SparkQA commented Aug 30, 2018

Test build #95451 has finished for PR 19691 at commit 6b18939.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor

if @DazhuangSu is not active anymore on this I can take it over, but let's wait for his answer.

@DazhuangSu
Copy link
Author

DazhuangSu commented Aug 30, 2018

@mgaido91
Sorry guys. little busy recently.
I will resolve the failed tests this weekend first.

@maropu
Copy link
Member

maropu commented Sep 4, 2018

@DazhuangSu still busy?

@DazhuangSu
Copy link
Author

@maropu
Sorry. I don't really have much time this month.
I can close this pr and somebody can continue on this problem.

@maropu
Copy link
Member

maropu commented Sep 5, 2018

ok @mgaido91 can u take this over?

@mgaido91
Copy link
Contributor

mgaido91 commented Sep 5, 2018

@DazhuangSu @maropu sure, thanks, I'll submit a PR for this soon. Thanks.

@asfgit asfgit closed this in a3ba3a8 Nov 11, 2018
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#21766
Closes apache#21679
Closes apache#21161
Closes apache#20846
Closes apache#19434
Closes apache#18080
Closes apache#17648
Closes apache#17169

Add:
Closes apache#22813
Closes apache#21994
Closes apache#22005
Closes apache#22463

Add:
Closes apache#15899

Add:
Closes apache#22539
Closes apache#21868
Closes apache#21514
Closes apache#21402
Closes apache#21322
Closes apache#21257
Closes apache#20163
Closes apache#19691
Closes apache#18697
Closes apache#18636
Closes apache#17176

Closes apache#23001 from wangyum/CloseStalePRs.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants