Skip to content

Conversation

@beliefer
Copy link
Contributor

@beliefer beliefer commented Feb 8, 2020

What changes were proposed in this pull request?

This PR follows up #26875.

Why are the changes needed?

When pattern is not static, we should avoid compile pattern every time if some pattern is same.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Exists UT.

@SparkQA
Copy link

SparkQA commented Feb 8, 2020

Test build #118058 has finished for PR 27497 at commit b43a871.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 8, 2020

Test build #118065 has finished for PR 27497 at commit 6a8fda8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

beliefer commented Feb 8, 2020

cc @maropu @dongjoon-hyun @cloud-fan

@maropu
Copy link
Member

maropu commented Feb 8, 2020

If you're interested in this, could you answer the @HyukjinKwon comment? #26875 (comment)
I think we should check that performance first.

@beliefer
Copy link
Contributor Author

beliefer commented Feb 8, 2020

If you're interested in this, could you answer the @HyukjinKwon comment? #26875 (comment)
I think we should check that performance first.

Oh, yeah. I see that. We need more investigation.

Pattern.compile(escape(str))
}

var lastPatternStr: String = null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can keep lastPattern as Pattern, and compare it to the current pattern via lastPattern.pattern() == patternStr. No need to keep lastPattern as a string.

Copy link
Contributor Author

@beliefer beliefer Feb 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review.
Of course,I can do like this. I just followed the way of RegExpExtract and RegExpReplace here.

val regex = if (cache == null) compile(s) else cache
val patternStr = input2.asInstanceOf[UTF8String].toString
val regex = if (cache == null) {
if (!(patternStr).equals(lastPatternStr)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any reasons to not use patternStr != lastPatternStr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see this is a copy-paste from Java codegen

@dongjoon-hyun
Copy link
Member

I agree with @maropu and @HyukjinKwon 's opinion. Without resolving their concerns, we should not merge this because this might not be the final follow-up.

@beliefer
Copy link
Contributor Author

beliefer commented Feb 8, 2020

@dongjoon-hyun @HyukjinKwon Yes. maybe we need more performance investigation, but I find RegExpExtract and RegExpReplace also use this method.

override def nullSafeEval(s: Any, p: Any, r: Any): Any = {

override def nullSafeEval(s: Any, p: Any, r: Any): Any = {

I think the pattern string or regex string not too long.

@maropu
Copy link
Member

maropu commented Mar 22, 2020

I will close this and the jira because of #26875 (comment)

@maropu maropu closed this Mar 22, 2020
@beliefer
Copy link
Contributor Author

@maropu Thanks!

@HyukjinKwon
Copy link
Member

Thank you so much for investigations @beliefer.

@beliefer
Copy link
Contributor Author

@HyukjinKwon With pleasure do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants