[SPARK-30245][SQL][FOLLOWUP] Improve regex expression when pattern not changed #27497
Test build #118058 has finished for PR 27497 at commit
Test build #118065 has finished for PR 27497 at commit
If you're interested in this, could you answer @HyukjinKwon's comment? #26875 (comment)
Oh, yeah. I see that. We need more investigation.
```scala
  Pattern.compile(escape(str))
}

var lastPatternStr: String = null
```
You can keep `lastPattern` as a `Pattern` and compare it to the current pattern via `lastPattern.pattern() == patternStr`. There is no need to keep the last pattern as a string.
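A minimal sketch of the reviewer's suggestion, assuming the pattern string is passed to `Pattern.compile` unchanged. The class `LastPatternCache` and its members are hypothetical and are not Spark's API. If the string is transformed before compiling, as with `Pattern.compile(escape(str))`, then `pattern()` returns the escaped string, so the comparison would need to use that form instead.

```scala
import java.util.regex.Pattern

// Hypothetical helper illustrating the review suggestion: cache only the
// compiled Pattern and recover its source string via Pattern.pattern().
final class LastPatternCache {
  private var lastPattern: Pattern = null
  // Number of times Pattern.compile was actually called (for illustration).
  var compileCount: Int = 0

  def get(patternStr: String): Pattern = {
    // Recompile only when nothing is cached yet or the cached source differs.
    if (lastPattern == null || lastPattern.pattern() != patternStr) {
      lastPattern = Pattern.compile(patternStr)
      compileCount += 1
    }
    lastPattern
  }
}

val cache = new LastPatternCache
val first = cache.get("a+b")
val second = cache.get("a+b") // same string: the cached Pattern is reused
val third = cache.get("c.*")  // different string: compiled again
```

Keeping only the `Pattern` avoids a second field that must be kept in sync with the compiled regex.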
Thanks for your review.
Of course, I can do it that way. I just followed the approach used in RegExpExtract and RegExpReplace here.
```diff
-val regex = if (cache == null) compile(s) else cache
+val patternStr = input2.asInstanceOf[UTF8String].toString
+val regex = if (cache == null) {
+  if (!(patternStr).equals(lastPatternStr)) {
```
Is there any reason not to use `patternStr != lastPatternStr`?
Oh, I see this is copy-pasted from the Java codegen.
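A small sketch of why the two forms differ. In Java, which the codegen path emits, `!=` on two `String`s compares references, so generated code has to call `.equals`. In Scala, `==` and `!=` first check for `null` and then delegate to `equals`, so `patternStr != lastPatternStr` gives the same result here and is also safe while `lastPatternStr` is still `null`. The variable values below are illustrative.

```scala
import scala.util.Try

// Illustrative values: on the first row no pattern has been cached yet.
val lastPatternStr: String = null
val patternStr: String = "a.*b"

// Scala's != is null-safe and delegates to equals for a non-null receiver.
val viaNotEquals: Boolean = patternStr != lastPatternStr
// Same result here because the receiver (patternStr) is non-null.
val viaEquals: Boolean = !patternStr.equals(lastPatternStr)

// Calling equals on a null receiver throws a NullPointerException,
// while == with a null left operand does not.
val nullReceiverFails: Boolean = Try(lastPatternStr.equals(patternStr)).isFailure
val nullSafeCompare: Boolean = lastPatternStr == null

// Scala == compares contents; eq is the reference check that Java's == performs.
val freshCopy = new String("a.*b")
val contentEqual: Boolean = freshCopy == patternStr   // true
val sameReference: Boolean = freshCopy eq patternStr  // false: distinct objects
```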
I agree with @maropu's and @HyukjinKwon's opinions. Until their concerns are resolved, we should not merge this, because it might not be the final follow-up.
@dongjoon-hyun @HyukjinKwon Yes, maybe we need more performance investigation, but see the existing code at:
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala Line 382 in e1cd4d9
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala Line 477 in e1cd4d9
I think the pattern or regex strings are usually not very long, so comparing them per row should be cheap.
I will close this PR and the JIRA because of #26875 (comment).
@maropu Thanks!
Thank you so much for the investigation, @beliefer.
@HyukjinKwon My pleasure.
What changes were proposed in this pull request?
This PR follows up #26875.
Why are the changes needed?
When the pattern is not static (that is, not a literal), we should avoid compiling it for every row when the pattern string has not changed.
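A minimal sketch of the idea, assuming a per-expression cache that remembers only the most recent pattern string and its compiled `Pattern`, in the style of the `lastPatternStr` field discussed in the review. The class `RowRegexMatcher`, its `matches` method, and the sample rows are hypothetical. Because only the last pattern is remembered, the saving applies when consecutive rows share a pattern; alternating patterns still trigger a compile each time.

```scala
import java.util.regex.Pattern

// Hypothetical evaluator for a regex match whose pattern comes from a column
// (not a literal), so it cannot be compiled once up front.
final class RowRegexMatcher {
  private var lastPatternStr: String = null
  private var lastRegex: Pattern = null
  var compileCount: Int = 0

  // Returns true if the pattern occurs anywhere in the input (Matcher.find).
  def matches(input: String, patternStr: String): Boolean = {
    // Recompile only when this row's pattern differs from the previous row's.
    if (patternStr != lastPatternStr) {
      lastRegex = Pattern.compile(patternStr)
      lastPatternStr = patternStr
      compileCount += 1
    }
    lastRegex.matcher(input).find()
  }
}

// Illustrative (input, pattern) rows; the pattern changes only occasionally.
val rows = Seq(
  ("aaa", "a+"), ("bab", "a+"), ("ccc", "a+"),
  ("bbb", "b+"), ("xyz", "b+"),
  ("aab", "a+")
)

val matcher = new RowRegexMatcher
val results = rows.map { case (input, p) => matcher.matches(input, p) }
```

With six rows and three pattern changes, `compileCount` ends at 3 rather than 6.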
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing unit tests.