Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Sep 10, 2020

What changes were proposed in this pull request?

In this PR, we add a checker for STRING form interval value ahead for parsing multiple units intervals and fail directly if the interval value contains alphabets to prevent correctness issues like interval '1 day 2' day=3 days.

Why are the changes needed?

fix correctness issue

Does this PR introduce any user-facing change?

yes, in spark 3.0.0 interval '1 day 2' day=3 days but now we fail with ParseException

How was this patch tested?

add a test.

@yaooqinn
Copy link
Member Author

cc @cloud-fan @maropu @HyukjinKwon and thanks very much for the review.

}
}

private final val alphabet = "[a-zA-Z]".r.pattern
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of checking the invalid ones, how about we make sure the value only has digits and dots?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK updated.

val v = if (values(i).STRING() != null) {
string(values(i).STRING())
val value = string(values(i).STRING()).trim
if (!intervalValStrPattern.matcher(value).matches()) {
Copy link
Contributor

@cloud-fan cloud-fan Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about !value.forall { c => c == '.' || c == "+" || c == "-" || Character.isDigit(c)}? We don't have to use regex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok maybe the first version is better, as we basically just want to avoid the value string to contain unit. value.exists(Character.isLetter) should be good enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should add comments to explain why we need the check.

@cloud-fan
Copy link
Contributor

github action passed, I'm merging it to master, thanks!

@cloud-fan cloud-fan closed this in 5669b21 Sep 10, 2020
@cloud-fan
Copy link
Contributor

@yaooqinn can you open a new PR for 3.0?

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128501 has finished for PR 29708 at commit 7e98555.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128497 has finished for PR 29708 at commit 480fe6d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128499 has finished for PR 29708 at commit aa9e9bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 10, 2020

Test build #128500 has finished for PR 29708 at commit a9294a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu pushed a commit that referenced this pull request Sep 11, 2020
…adhesive with the unit

THIS PR backports #29708 to 3.0

### What changes were proposed in this pull request?
In this PR, we add a checker for STRING form interval value ahead for parsing multiple units intervals and fail directly if the interval value contains alphabets to prevent correctness issues like `interval '1 day 2' day`=`3 days`.

### Why are the changes needed?

fix correctness issue

### Does this PR introduce _any_ user-facing change?

yes, in spark 3.0.0 `interval '1 day 2' day`=`3 days` but now we fail with ParseException
### How was this patch tested?

add a test.

Closes #29716 from yaooqinn/SPARK-32840-30.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…adhesive with the unit

THIS PR backports apache#29708 to 3.0

### What changes were proposed in this pull request?
In this PR, we add a checker for STRING form interval value ahead for parsing multiple units intervals and fail directly if the interval value contains alphabets to prevent correctness issues like `interval '1 day 2' day`=`3 days`.

### Why are the changes needed?

fix correctness issue

### Does this PR introduce _any_ user-facing change?

yes, in spark 3.0.0 `interval '1 day 2' day`=`3 days` but now we fail with ParseException
### How was this patch tested?

add a test.

Closes apache#29716 from yaooqinn/SPARK-32840-30.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants