[SPARK-30759][SQL] Initialize cache for foldable patterns in StringRegexExpression #27502
// try cache the pattern for Literal
// try cache foldable pattern
private lazy val cache: Pattern = pattern match {
  case Literal(value: String, StringType) => compile(value)
Reynold just moved the code. It has actually existed since 1.0.0: af3746c#diff-d788f93e29b4d25cdd7d60328587678bR42
private lazy val cache: Pattern = pattern match {
  case Literal(value: String, StringType) => compile(value)
  case _ => null
  case p: Expression if p.foldable =>
From now on, this will also consider foldable expressions such as 'a' + 'b'. I prefer to treat SPARK-30759 as a performance improvement and merge it to master only.
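To make the point about foldable expressions concrete, here is a minimal sketch of the idea. The types `Expr`, `Lit`, `Concat`, and `Column` and the helper `cacheFor` are hypothetical stand-ins invented for illustration; they are not Spark's Catalyst classes, and the sketch returns an `Option` where the code under review uses `null`.

```scala
import java.util.regex.Pattern

// Hypothetical, simplified expression tree; Spark's real Expression API is much richer.
sealed trait Expr {
  def foldable: Boolean // true when the value can be computed without an input row
  def eval(): String
}
final case class Lit(value: String) extends Expr {
  val foldable: Boolean = true
  def eval(): String = value
}
final case class Concat(left: Expr, right: Expr) extends Expr {
  // A combination of foldable children is itself foldable.
  def foldable: Boolean = left.foldable && right.foldable
  def eval(): String = left.eval() + right.eval()
}
final case class Column(name: String) extends Expr {
  val foldable: Boolean = false
  def eval(): String =
    throw new UnsupportedOperationException(s"$name needs an input row")
}

object FoldableCacheSketch {
  // Compile once when the pattern is foldable; otherwise report "no cache" with None.
  def cacheFor(pattern: Expr): Option[Pattern] = pattern match {
    case p if p.foldable => Some(Pattern.compile(p.eval()))
    case _               => None
  }

  def main(args: Array[String]): Unit = {
    val folded = Concat(Lit("a"), Lit("b"))
    println(cacheFor(folded).map(_.pattern)) // Some(ab)
    println(cacheFor(Column("regex_col")))   // None
  }
}
```

Under these assumptions, a pattern built from two constants is cached just like a plain literal, while a column-backed pattern still has to be compiled per row.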
The changes don't affect behavior in any case, so they can be considered purely an optimization.
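The claim that caching changes only the cost and not the result can be sketched as follows. The object `CacheEquivalenceSketch`, its counter, and both helper methods are hypothetical names made up for this illustration; the sketch uses only `java.util.regex.Pattern` and does not reproduce Spark's code.

```scala
import java.util.regex.Pattern

object CacheEquivalenceSketch {
  // Hypothetical counter so the number of compilations can be observed.
  var compilations: Int = 0

  def compile(regex: String): Pattern = {
    compilations += 1
    Pattern.compile(regex)
  }

  // No cache: the regex is compiled again for every input value.
  def matchesUncached(regex: String, inputs: Seq[String]): Seq[Boolean] =
    inputs.map(in => compile(regex).matcher(in).matches())

  // With cache: the regex is compiled once and reused for every input value.
  def matchesCached(regex: String, inputs: Seq[String]): Seq[Boolean] = {
    val cached = compile(regex)
    inputs.map(in => cached.matcher(in).matches())
  }

  def main(args: Array[String]): Unit = {
    val inputs = Seq("abc", "xyz", "abd")
    val uncached = matchesUncached("ab.", inputs)
    val afterUncached = compilations
    val cached = matchesCached("ab.", inputs)
    println(uncached == cached)           // true
    println(afterUncached)                // 3
    println(compilations - afterUncached) // 1
  }
}
```

Both paths return the same matches; the cached path simply compiles the pattern once instead of once per input.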
cc @rxin, @gatorsmile, @cloud-fan, @MaxGekk. Although the above is my first impression, I won't be against backporting this.
Test build #118075 has finished for PR 27502 at commit
@HyukjinKwon FYI, since you are in #26875 & #27514
HyukjinKwon
left a comment
Looks good. Yes, my impression is that we don't need to backport.
Test build #118126 has finished for PR 27502 at commit
retest this please
Test build #118139 has finished for PR 27502 at commit
Test build #118135 has finished for PR 27502 at commit
dongjoon-hyun
left a comment
Thank you, @MaxGekk and @HyukjinKwon .
+1, LGTM. Merged to master for 3.1.0.
Good catch! I'm surprised that this bug was only exposed after so many years... Shall we add a test for this bug?
+1 for @cloud-fan's suggestion.
@dongjoon-hyun @cloud-fan Here is the test: #27547
…sion

### What changes were proposed in this pull request?
In the PR, I propose to fix `cache` initialization in `StringRegexExpression` by changing the expected value type in `case Literal(value: String, StringType)` from `String` to `UTF8String`. This is a backport of #27502 and #27547.

### Why are the changes needed?
Actually, the case doesn't work at all because `Literal`'s value has type `UTF8String`, see
<img width="649" alt="Screen Shot 2020-02-08 at 22 45 50" src="https://user-images.githubusercontent.com/1580697/74091681-0d4a2180-4acb-11ea-8a0d-7e8c65f4214e.png">

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added a new test to `RegexpExpressionsSuite`.

Closes #27713 from MaxGekk/str-regexp-foldable-pattern-backport.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…sion

In the PR, I propose to fix `cache` initialization in `StringRegexExpression` by changing of expected value type in `case Literal(value: String, StringType)` from `String` to `UTF8String`. This is a backport of #27502 and #27547

Actually, the case doesn't work at all because `Literal`'s value has type `UTF8String`, see
<img width="649" alt="Screen Shot 2020-02-08 at 22 45 50" src="https://user-images.githubusercontent.com/1580697/74091681-0d4a2180-4acb-11ea-8a0d-7e8c65f4214e.png">

No

Added new test by `RegexpExpressionsSuite`.

Closes #27713 from MaxGekk/str-regexp-foldable-pattern-backport.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit cfc48a8)
Signed-off-by: Dongjoon Hyun <[email protected]>
…gexExpression

### What changes were proposed in this pull request?
In the PR, I propose to fix `cache` initialization in `StringRegexExpression` by changing `case Literal(value: String, StringType)` to `case p: Expression if p.foldable`.

### Why are the changes needed?
Actually, the case doesn't work at all because of:
1. `Literal`'s value has type `UTF8String`.
2. It doesn't work for foldable expressions like in the example:
```sql
SELECT '%SystemDrive%\Users\John' _FUNC_ '%SystemDrive%\\Users.*';
```
<img width="649" alt="Screen Shot 2020-02-08 at 22 45 50" src="https://user-images.githubusercontent.com/1580697/74091681-0d4a2180-4acb-11ea-8a0d-7e8c65f4214e.png">

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By the `check outputs of expression examples` test from `SQLQuerySuite`.

Closes apache#27502 from MaxGekk/str-regexp-foldable-pattern.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
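The first reason in the commit message above (a type pattern on `String` never matching a value that is really a `UTF8String`) can be illustrated with a minimal sketch. `Utf8Text`, `DataType`, `StringType`, and `Literal` below are hypothetical stand-ins defined only for this example, not the real Spark classes, and `fixedCache` mirrors the shape described for the backport rather than the actual patch.

```scala
import java.util.regex.Pattern

// Hypothetical stand-ins for Spark's types; not the real Catalyst classes.
final case class Utf8Text(chars: String) {
  override def toString: String = chars
}
sealed trait DataType
case object StringType extends DataType
final case class Literal(value: Any, dataType: DataType)

object LiteralMatchSketch {
  // Old shape: the type pattern expects java.lang.String, so a Utf8Text
  // value falls through to the default case and nothing is cached.
  def oldCache(pattern: Literal): Pattern = pattern match {
    case Literal(value: String, StringType) => Pattern.compile(value)
    case _                                  => null
  }

  // Shape described for the backport: match the type the literal actually carries.
  def fixedCache(pattern: Literal): Pattern = pattern match {
    case Literal(value: Utf8Text, StringType) => Pattern.compile(value.toString)
    case _                                    => null
  }

  def main(args: Array[String]): Unit = {
    val lit = Literal(Utf8Text("a.*"), StringType)
    println(oldCache(lit) == null)   // true: the cache was never initialized
    println(fixedCache(lit).pattern) // a.*
  }
}
```

Because the value field is typed loosely (here `Any`), the compiler accepts the `String` pattern even though it can never succeed at runtime, which is consistent with the problem going unnoticed for so long.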
### What changes were proposed in this pull request?
In the PR, I propose to fix `cache` initialization in `StringRegexExpression` by changing `case Literal(value: String, StringType)` to `case p: Expression if p.foldable`.

### Why are the changes needed?
Actually, the case doesn't work at all because of:
1. `Literal`'s value has type `UTF8String`.
2. It doesn't work for foldable expressions.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By the `check outputs of expression examples` test from `SQLQuerySuite`.