@dongjoon-hyun
Member

What changes were proposed in this pull request?

Currently, LikeSimplification handles the following four rules.

  • 'a%' => expr.StartsWith("a")
  • '%b' => expr.EndsWith("b")
  • '%a%' => expr.Contains("a")
  • 'a' => EqualTo("a")

This PR adds the following rule.

  • 'a%b' => expr.Length() >= 2 && expr.StartsWith("a") && expr.EndsWith("b")

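Here, 2 is computed statically as "a".size + "b".size. The length check is needed because a string shorter than the prefix and suffix combined could otherwise satisfy both StartsWith and EndsWith (for example, 'a' would wrongly match the pattern 'a%a').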
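To make the five rewrites above concrete, here is a minimal Spark-free sketch in Scala. It assumes the pattern is classified with fully anchored regexes shaped like the ones the rule matches on (the exact regexes here are an assumption, not a copy of the Spark source), and it uses a hypothetical Rewrite ADT in place of Catalyst expressions such as StartsWith, EndsWith, Contains and EqualTo.

```scala
// Hypothetical stand-ins for the Catalyst expressions the rule produces.
sealed trait Rewrite
final case class StartsWithR(prefix: String) extends Rewrite
final case class EndsWithR(suffix: String) extends Rewrite
// minLength is the statically computed prefix.size + suffix.size.
final case class StartsAndEndsWithR(minLength: Int, prefix: String, suffix: String) extends Rewrite
final case class ContainsR(infix: String) extends Rewrite
final case class EqualToR(value: String) extends Rewrite
// Any pattern the sketch cannot simplify stays a LIKE.
final case class KeepLike(pattern: String) extends Rewrite

object LikeSimplificationSketch {
  // Scala regex extractors match the whole string, so each regex is anchored.
  // [^_%] excludes both LIKE wildcards from the literal parts.
  private val startsWith = "([^_%]+)%".r
  private val endsWith = "%([^_%]+)".r
  private val startsAndEndsWith = "([^_%]+)%([^_%]+)".r
  private val contains = "%([^_%]+)%".r
  private val equalTo = "([^_%]*)".r

  def simplify(pattern: String): Rewrite = pattern match {
    // A literal part ending in '\' means the following % is escaped, so skip.
    case startsWith(prefix) if !prefix.endsWith("\\") => StartsWithR(prefix)
    case endsWith(suffix) => EndsWithR(suffix)
    case startsAndEndsWith(prefix, suffix) if !prefix.endsWith("\\") =>
      StartsAndEndsWithR(prefix.length + suffix.length, prefix, suffix)
    case contains(infix) if !infix.endsWith("\\") => ContainsR(infix)
    case equalTo(value) => EqualToR(value)
    case _ => KeepLike(pattern)
  }
}
```

For example, simplify("a%c") yields StartsAndEndsWithR(2, "a", "c"), which corresponds to the Filter in the After plan below, while a pattern containing _ or an escaped % falls back to KeepLike.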

Before

scala> sql("select a from (select explode(array('abc','adc')) a) T where a like 'a%c'").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Filter a#5 LIKE a%c
:     +- INPUT
+- Generate explode([abc,adc]), false, false, [a#5]
   +- Scan OneRowRelation[]

After

scala> sql("select a from (select explode(array('abc','adc')) a) T where a like 'a%c'").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Filter ((length(a#5) >= 2) && (StartsWith(a#5, a) && EndsWith(a#5, c)))
:     +- INPUT
+- Generate explode([abc,adc]), false, false, [a#5]
   +- Scan OneRowRelation[]

How was this patch tested?

Pass the Jenkins tests (including a new test case).

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55555 has finished for PR 12312 at commit 8d496f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

StartsWith(l, Literal(pattern))
case endsWith(pattern) =>
EndsWith(l, Literal(pattern))
case startsAndEndsWith(prefix, postfix) if !prefix.endsWith("\\") =>
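The guard `if !prefix.endsWith("\\")` in the hunk above exists because, in a LIKE pattern, "\%" is a literal percent sign rather than a wildcard. The sketch below isolates that guard. The object and method names are hypothetical, and the regex is an assumption modeled on the shape of the startsAndEndsWith case.

```scala
object EscapeGuardSketch {
  // Anchored match: literal prefix, one %, literal suffix; no _ or % inside either part.
  private val startsAndEndsWith = "([^_%]+)%([^_%]+)".r

  /** Returns Some((prefix, suffix)) only when the % between them is an unescaped wildcard. */
  def split(pattern: String): Option[(String, String)] = pattern match {
    // If the prefix ends with '\', the % we split on was escaped, so treating it
    // as a wildcard would change the result; leave such patterns as LIKE.
    case startsAndEndsWith(prefix, suffix) if !prefix.endsWith("\\") => Some((prefix, suffix))
    case _ => None
  }
}
```

The guard is conservative: a prefix ending in an escaped backslash is also skipped, which only forgoes the optimization and does not change query results.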
Contributor

I think you should add some comments explaining this rewrite; in particular, the GreaterThanOrEqual part is not as straightforward as the other ones.

Member Author

Thank you for the review. I'll add comments here.

@dongjoon-hyun
Member Author

Hi, @rxin. Sorry for the late response.
I added comments and renamed the test case as you suggested.
Thank you for the review!

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55618 has finished for PR 12312 at commit 16ab6c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Hi, @rxin.
Is there anything else to do for LikeSimplification?

EndsWith(l, Literal(pattern))
// 'a%a' pattern is basically the same as 'a%' && '%a'.
// However, the additional `Length` condition is required to prevent 'a' from matching 'a%a'.
case startsAndEndsWith(prefix, postfix) if !prefix.endsWith("\\") =>
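The comment above says the Length condition keeps 'a' from matching 'a%a'. The sketch below checks that claim on plain strings. likeMatches is a hypothetical reference matcher for escape-free patterns only (% becomes .*, _ becomes .), and naive and guarded are illustrative names for the rewrite without and with the length check.

```scala
import java.util.regex.Pattern

object LengthGuardSketch {
  // Reference LIKE semantics for patterns without escapes:
  // % matches any run of characters, _ matches exactly one, everything else is literal.
  def likeMatches(input: String, pattern: String): Boolean = {
    val regex = pattern.map {
      case '%' => ".*"
      case '_' => "."
      case c   => Pattern.quote(c.toString)
    }.mkString("(?s)", "", "")
    input.matches(regex) // String.matches requires a full match
  }

  // Rewrite of 'prefix%suffix' without the length check: the prefix and the
  // suffix may overlap in the input, which LIKE does not allow.
  def naive(input: String, prefix: String, suffix: String): Boolean =
    input.startsWith(prefix) && input.endsWith(suffix)

  // Rewrite with the length check. minLength depends only on the pattern
  // (so it can be a literal), while input.length depends on each input value.
  def guarded(input: String, prefix: String, suffix: String): Boolean = {
    val minLength = prefix.length + suffix.length
    input.length >= minLength && input.startsWith(prefix) && input.endsWith(suffix)
  }
}
```

With input "a" and pattern 'a%a', naive returns true but likeMatches returns false. guarded agrees with likeMatches because the length check rules out any overlap between the prefix and the suffix.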
Contributor

While you are at it, can you rename "pattern" in the startsWith case to prefix, and in the endsWith case to suffix?

Contributor

Also, rename utf to "pattern".

And just compute the value of GreaterThanOrEqual(Length(l), Literal(prefix.size + postfix.size)) directly, since it is a literal.

Member Author

Sure. I will rename those parameters.
By the way, the l in Length(l) is not a literal here; it could be a column.

Member Author

Thank you for spending time on this. I know you are very busy.

Contributor

Ah, OK. Let's leave the l one there.

@dongjoon-hyun
Member Author

I updated the names to pattern, prefix, postfix, and infix (in the same manner).

Contains(l, Literal(pattern))
case equalTo(pattern) =>
EqualTo(l, Literal(pattern))
case Like(l, Literal(pattern, StringType)) =>
Contributor

Sorry to be pedantic. Do you mind changing l to something like input? "l" is too short, it is unclear what it means, and it also looks just like the digit 1.

Member Author

You're welcome. :) I'll fix it right now!

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55826 has finished for PR 12312 at commit 5753437.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55830 has finished for PR 12312 at commit 111a78c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Apr 14, 2016

Thanks, merging into master.

@asfgit asfgit closed this in d7e124e Apr 14, 2016
@dongjoon-hyun
Member Author

Thank you for merging, @rxin !

@dongjoon-hyun dongjoon-hyun deleted the SPARK-14545 branch May 12, 2016 00:59