[SPARK-29926][SQL] Fix weird interval string whose value is only a dangling decimal point #26573
Conversation
cc @cloud-fan @MaxGekk @maropu @HyukjinKwon thanks for reviewing in advance. The pre-discussion can be found here: #26491 (comment)
```scala
var days: Int = 0
var microseconds: Long = 0
var fractionScale: Int = 0
val validOriginFractionScale = (NANOS_PER_SECOND / 10).toInt
```
how about just `initialFractionScale`?
Test build #114008 has finished for PR 26573 at commit
```diff
  fraction += (b - '0') * fractionScale
  fractionScale /= 10
- case ' ' =>
+ case ' ' if fractionScale != initialFractionScale =>
```
super nit: seems `fractionScale < initialFractionScale` is safer?
Test build #114020 has finished for PR 26573 at commit

Test build #114024 has finished for PR 26573 at commit
```scala
test("string to interval: seconds with fractional part") {
  checkFromString("0.1 seconds", new CalendarInterval(0, 0, 100000))
  checkFromString("1. seconds", new CalendarInterval(0, 0, 1000000))
```
Will we explicitly disable this case?

```scala
scala> sql("select 0.")
res0: org.apache.spark.sql.DataFrame = [0: decimal(1,0)]
```

At least Python, Java, Scala and R support this form as well.
```
presto> select interval '1.' second;
     _col0
----------------
 0 00:00:01.000
(1 row)

Query 20191119_051438_00001_f5kcs, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:03 [0 rows, 0B] [0 rows/s, 0B/s]

presto> select interval '.' second;
Query 20191119_051452_00002_f5kcs failed: Invalid INTERVAL SECOND value: .
```

Also checked with Presto: `'1.' second` is valid.
@HyukjinKwon good point. To be consistent with Spark's own parser, maybe we should allow `interval '1.' second`. But `interval '.' second` should be invalid.
^ +1!
updated, please recheck, thanks.
```scala
var days: Int = 0
var microseconds: Long = 0
var fractionScale: Int = 0
val initialFractionScale = (NANOS_PER_SECOND / 10).toInt
```
do we still need it now?
```diff
  fraction += (b - '0') * fractionScale
  fractionScale /= 10
- case ' ' =>
+ case ' ' if !pointPrefixed || fractionScale < initialFractionScale =>
```
since `1.` is allowed, I don't think `fractionScale < initialFractionScale` is a valid check any more.
We need this. The cases are (`1.0`, `0.1`, `.1`, `1.`) plus (`.`); this condition is equivalent to `!(pointPrefixed && fractionScale == initialFractionScale)`.
Removing `fractionScale < initialFractionScale` would forbid the `.1` case.
ah i see!
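To make the guard discussed above concrete, here is a minimal standalone sketch. The object and method names are hypothetical; only `pointPrefixed`, `fractionScale`, and `initialFractionScale` mirror names from the diff. Under the parser's invariant that `fractionScale` starts at `initialFractionScale` and only shrinks as fraction digits are consumed, the guard and its negated-conjunction form agree:

```scala
// Hypothetical model of the guard on the ' ' case; not the actual Spark code.
object FractionGuard {
  val NANOS_PER_SECOND: Long = 1000000000L
  val initialFractionScale: Int = (NANOS_PER_SECOND / 10).toInt // 100000000

  // Guard as written in the diff: accept unless the token is a bare '.',
  // i.e. it started with a point and no fraction digit was consumed.
  def accepts(pointPrefixed: Boolean, fractionScale: Int): Boolean =
    !pointPrefixed || fractionScale < initialFractionScale

  // Equivalent form from the discussion (given fractionScale <= initial):
  // reject exactly the lone "." case.
  def acceptsAlt(pointPrefixed: Boolean, fractionScale: Int): Boolean =
    !(pointPrefixed && fractionScale == initialFractionScale)
}
```

`1.` corresponds to `accepts(false, initialFractionScale)`, `.1` to `accepts(true, initialFractionScale / 10)` (both true), and a bare `.` to `accepts(true, initialFractionScale)` (false).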
Test build #114076 has finished for PR 26573 at commit

thanks, merging to master!
What changes were proposed in this pull request?
Currently, we parse `'1. second'` to 1s and even `'. second'` to 0s.
We fix this in the new interval parser's `VALUE_FRACTIONAL_PART` state.
With further digging, we found that `1.` is a valid literal in Python, R, Scala, Presto and so on, so this PR
ONLY forbids invalid interval values of the form `'. seconds'`.
Why are the changes needed?
Bug fix.
Does this PR introduce any user-facing change?
Yes, we now treat `'. second'` and similar forms as invalid intervals.
How was this patch tested?
Added a unit test.
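As a summary of the behavior this PR settles on, here is a minimal stand-in predicate for the seconds token (a hypothetical helper for illustration, not Spark's actual parser):

```scala
// Sketch of the post-fix rule for the seconds value: digits with at most one
// point are fine, including a dangling point after digits ("1.") or a leading
// point (".1"), but a bare point (".") is invalid. Hypothetical helper only.
def secondsValueIsValid(s: String): Boolean =
  s.nonEmpty &&
    s.count(_ == '.') <= 1 &&
    s.forall(c => c.isDigit || c == '.') &&
    s.exists(_.isDigit)
```

With this predicate, `"1."` and `".1"` are accepted while `"."` is rejected, matching the cases exercised by the new unit test.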