Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Feb 8, 2020

What changes were proposed in this pull request?

In the PR, I propose to fix cache initialization in StringRegexExpression by changing case Literal(value: String, StringType) to case p: Expression if p.foldable

Why are the changes needed?

Actually, the case doesn't work at all because of:

  1. Literals value has type UTF8String
  2. It doesn't work for foldable expressions like in the example:
SELECT '%SystemDrive%\Users\John' _FUNC_ '%SystemDrive%\\Users.*';

Screen Shot 2020-02-08 at 22 45 50

Does this PR introduce any user-facing change?

No

How was this patch tested?

By the check outputs of expression examples test from SQLQuerySuite.

// try cache the pattern for Literal
// try cache foldable pattern
private lazy val cache: Pattern = pattern match {
case Literal(value: String, StringType) => compile(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reynold just moved the code. Actually it exists since 1.0.0: af3746c#diff-d788f93e29b4d25cdd7d60328587678bR42

private lazy val cache: Pattern = pattern match {
case Literal(value: String, StringType) => compile(value)
case _ => null
case p: Expression if p.foldable =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will consider 'a' + 'b' from now. I prefer to consider SPARK-30759 as a performance improvement and merge to master only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes don't impact on behavior in any case, so, they can be considered only as an optimization.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Feb 8, 2020

cc @rxin, @gatorsmile , @cloud-fan .

@MaxGekk . Although the above is my first impression, I'll not be against backporting this.

@SparkQA
Copy link

SparkQA commented Feb 9, 2020

Test build #118075 has finished for PR 27502 at commit 0986e5c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Feb 10, 2020

@HyukjinKwon FYI, since you are in #26875 & #27514

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Yes, my impression is that we don't need to back port.

@SparkQA
Copy link

SparkQA commented Feb 10, 2020

Test build #118126 has finished for PR 27502 at commit 7bc5d9e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Feb 10, 2020

Test build #118139 has finished for PR 27502 at commit 7bc5d9e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 10, 2020

Test build #118135 has finished for PR 27502 at commit 7bc5d9e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @MaxGekk and @HyukjinKwon .
+1, LGTM. Merged to master for 3.1.0.

@cloud-fan
Copy link
Contributor

Good catch! I'm surprised that this bug is exposed after so many years...

Shall we add a test for this bug?

@dongjoon-hyun
Copy link
Member

+1 for @cloud-fan 's suggestion.

@MaxGekk
Copy link
Member Author

MaxGekk commented Feb 12, 2020

@dongjoon-hyun @cloud-fan Here is the test #27547

dongjoon-hyun pushed a commit that referenced this pull request Feb 26, 2020
…sion

### What changes were proposed in this pull request?
In the PR, I propose to fix `cache` initialization in `StringRegexExpression` by changing of expected value type in `case Literal(value: String, StringType)` from `String` to `UTF8String`.

This is a backport of #27502 and #27547

### Why are the changes needed?
Actually, the case doesn't work at all because `Literal`'s value has type `UTF8String`, see
<img width="649" alt="Screen Shot 2020-02-08 at 22 45 50" src="https://user-images.githubusercontent.com/1580697/74091681-0d4a2180-4acb-11ea-8a0d-7e8c65f4214e.png">

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added new test by `RegexpExpressionsSuite`.

Closes #27713 from MaxGekk/str-regexp-foldable-pattern-backport.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Feb 26, 2020
…sion

In the PR, I propose to fix `cache` initialization in `StringRegexExpression` by changing of expected value type in `case Literal(value: String, StringType)` from `String` to `UTF8String`.

This is a backport of #27502 and #27547

Actually, the case doesn't work at all because `Literal`'s value has type `UTF8String`, see
<img width="649" alt="Screen Shot 2020-02-08 at 22 45 50" src="https://user-images.githubusercontent.com/1580697/74091681-0d4a2180-4acb-11ea-8a0d-7e8c65f4214e.png">

No

Added new test by `RegexpExpressionsSuite`.

Closes #27713 from MaxGekk/str-regexp-foldable-pattern-backport.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit cfc48a8)
Signed-off-by: Dongjoon Hyun <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…gexExpression

### What changes were proposed in this pull request?
In the PR, I propose to fix `cache` initialization in `StringRegexExpression` by changing `case Literal(value: String, StringType)` to `case p: Expression if p.foldable`

### Why are the changes needed?
Actually, the case doesn't work at all because of:
1. Literals value has type `UTF8String`
2. It doesn't work for foldable expressions like in the example:
```sql
SELECT '%SystemDrive%\Users\John' _FUNC_ '%SystemDrive%\\Users.*';
```
<img width="649" alt="Screen Shot 2020-02-08 at 22 45 50" src="https://user-images.githubusercontent.com/1580697/74091681-0d4a2180-4acb-11ea-8a0d-7e8c65f4214e.png">

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By the `check outputs of expression examples` test from `SQLQuerySuite`.

Closes apache#27502 from MaxGekk/str-regexp-foldable-pattern.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@MaxGekk MaxGekk deleted the str-regexp-foldable-pattern branch June 5, 2020 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants