[SPARK-30189][SQL] Interval from year-month/date-time string should handle whitespaces #26815
cc @cloud-fan @maropu, thanks in advance.

Also cc: @MaxGekk

Just in case, have you checked the changes in the performance numbers in

@yaooqinn Maybe I have missed something, but where does the requirement of supporting
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
```diff
 }

-private val yearMonthPattern = "^([+|-])?(\\d+)-(\\d+)$".r
+private val yearMonthPattern = "^([+|-])?(\\s+)?(\\d+)-(\\d+)$".r
```
Why not just `\\s*`?
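A minimal sketch of the reviewer's suggestion, assuming the year-month string is first trimmed at both ends. `YearMonthSketch` and `parseMonths` are hypothetical names, not Spark's `IntervalUtils` API. The sketch also writes the sign class as `[+-]`: inside a character class `|` is a literal, so `[+|-]` would additionally accept `|` as a sign. Month-range validation is omitted.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not Spark's IntervalUtils: `\s*` in place of `(\s+)?`.
object YearMonthSketch {
  // `\s*` allows optional whitespace between sign and year without an extra capture group.
  // `[+-]` is used because `[+|-]` would also match a literal '|'.
  val yearMonthPattern: Regex = "^([+-])?\\s*(\\d+)-(\\d+)$".r

  /** Total months for "[sign][ws]years-months", or None if the trimmed input does not match.
    * Month range checks (0 to 11) are intentionally left out of this sketch. */
  def parseMonths(input: String): Option[Int] =
    input.trim match {
      case yearMonthPattern(sign, years, months) =>
        val total = years.toInt * 12 + months.toInt
        // An unmatched optional group is null; `==` against null is safe in Scala.
        Some(if (sign == "-") -total else total)
      case _ => None
    }
}
```

With this pattern, `"\t -1-2 \n"` yields `Some(-14)`, while `"1 - 2"` is still rejected, because whitespace is tolerated only at the ends and right after the sign.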
Before patching the buggy method, can we wait until #26473 is merged?

AFAIK, we support handling ISO whitespace for almost all internal types.
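To make "ISO whitespace" concrete, here is a hypothetical helper that trims characters which are either Java whitespace (`Character.isWhitespace`) or ISO control characters (`Character.isISOControl`) from both ends of a string. Whether Spark's internal types use exactly this character set is an assumption of the sketch, not something stated in this thread.

```scala
// Hypothetical helper: one possible reading of "ISO whitespace", not Spark's implementation.
object IsoTrimSketch {
  /** True for Java whitespace or ISO control characters (U+0000..U+001F, U+007F..U+009F). */
  private def isTrimmable(c: Char): Boolean =
    Character.isWhitespace(c) || Character.isISOControl(c)

  /** Removes trimmable characters from both ends, leaving interior characters untouched. */
  def trimAll(s: String): String = {
    var start = 0
    var end = s.length
    while (start < end && isTrimmable(s.charAt(start))) start += 1
    while (end > start && isTrimmable(s.charAt(end - 1))) end -= 1
    s.substring(start, end)
  }
}
```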
Test build #115035 has finished for PR 26815 at commit
Is this still valid?

I will check this fix ASAP. Thanks for reminding me.
```scala
// whitespaces
check("\t +5 12:40\t ", DAY, MINUTE, "5 days 12 hours 40 minutes")
checkFail("+5\t 12:40", DAY, MINUTE, "must match day-time format")
```
For the multi-unit syntax, do we support `1 day \t 2 hours`?
Yes. In my opinion, for an interval string in the multi-unit syntax we parse each value and each unit separately, and they may also be produced separately by the data providers, so whitespace can appear between the parts. In the year-month and day-time syntax, the string should be treated as a single whole, so whitespace may appear only at either end, not in the middle.
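The policy described above can be sketched as follows, assuming a simplified `DAY TO MINUTE` pattern that is hypothetical and not Spark's actual regex. Whole-string formats are trimmed only at the ends and must then match the exact layout, while the multi-unit syntax treats any run of whitespace as a token separator. The first two assertions mirror the `check`/`checkFail` test cases above.

```scala
// Hypothetical sketch of the two whitespace policies; not Spark's parser.
object WhitespacePolicySketch {
  // Simplified DAY TO MINUTE layout: [sign]day hour:minute, with exactly one space inside.
  private val dayToMinute = "^([+-])?(\\d+) (\\d{1,2}):(\\d{1,2})$".r

  /** Whole-string formats: strip whitespace at both ends only, then require the exact layout. */
  def wholeStringMatches(s: String): Boolean =
    dayToMinute.pattern.matcher(s.trim).matches()

  /** Multi-unit syntax: each value and unit is its own token, so any whitespace run separates them. */
  def multiUnitTokens(s: String): Seq[String] =
    s.trim.split("\\s+").toSeq
}
```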
Take `java.sql.Timestamp` as an example: the timestamp string is treated as a whole.
```scala
scala> java.sql.Timestamp.valueOf("2011-11-11 11:11:11.11\t")
res8: java.sql.Timestamp = 2011-11-11 11:11:11.11

scala> java.sql.Timestamp.valueOf("2011-11-11 \t 11:11:11.11\t")
java.lang.NumberFormatException: For input string: " 11"
  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  at java.lang.Integer.parseInt(Integer.java:569)
  at java.lang.Integer.parseInt(Integer.java:615)
  at java.sql.Timestamp.valueOf(Timestamp.java:243)
  ... 28 elided
```
We do the same in Spark:
```
spark-sql> select timestamp '2013-01-02 \t 00:00:00';
Error in query:
Cannot parse the TIMESTAMP value: 2013-01-02 00:00:00(line 1, pos 7)

== SQL ==
select timestamp '2013-01-02 \t 00:00:00'
-------^^^

spark-sql> select timestamp '2013-01-02 00:00:00 \t';
2013-01-02 00:00:00
```
```
org.apache.spark.sql.catalyst.parser.ParseException

requirement failed: Interval string must match day-time format of '^(?<sign>[+|-])?(?<day>\d+) (?<hour>\d{1,2}):(?<minute>\d{1,2}):(?<second>(\d{1,2})(\.(\d{1,9}))?)$': 
- 10 12:34:46.789 , set spark.sql.legacy.fromDayTimeString.enabled to true to restore the behavior before Spark 3.0.(line 1, pos 16)
```
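The failure above can be reproduced with a short sketch that compiles the named-group pattern quoted in the error message with `java.util.regex.Pattern` and trims only the ends of the input. `DayTimeSketch.parse` is a hypothetical helper, not Spark code. Because the `\t` after the sign and the `\t ` between day and hour survive trimming, the quoted input still fails to match, whereas the same value with whitespace only at the ends parses.

```scala
import java.util.regex.Pattern

// Hypothetical helper; the regex is copied from the error message above.
object DayTimeSketch {
  // Note: `[+|-]` is kept verbatim, so a literal '|' would also be accepted as a sign.
  private val dayTime: Pattern = Pattern.compile(
    "^(?<sign>[+|-])?(?<day>\\d+) (?<hour>\\d{1,2}):(?<minute>\\d{1,2}):" +
      "(?<second>(\\d{1,2})(\\.(\\d{1,9}))?)$")

  /** Trims both ends, then returns (sign, day, hour, minute, second) if the rest matches. */
  def parse(input: String): Option[(String, String, String, String, String)] = {
    val m = dayTime.matcher(input.trim)
    if (m.matches()) {
      // An absent optional group yields null; default the sign to "+".
      val sign = Option(m.group("sign")).getOrElse("+")
      Some((sign, m.group("day"), m.group("hour"), m.group("minute"), m.group("second")))
    } else None
  }
}
```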
Not related to this PR, but we should only ask users to set the legacy config when doing so would actually make the input parse.
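One way to follow this suggestion, sketched with hypothetical names: append the hint about `spark.sql.legacy.fromDayTimeString.enabled` only when a caller-supplied check reports that the legacy parser would accept the input. The `legacyAccepts` predicate and the shortened message text are assumptions standing in for whatever check and wording Spark would actually use.

```scala
// Hypothetical sketch; the message text and `legacyAccepts` are illustrative, not Spark's code.
object LegacyHintSketch {
  val LegacyConf = "spark.sql.legacy.fromDayTimeString.enabled"

  /** Builds the error message, mentioning the legacy config only if enabling it would help. */
  def errorMessage(input: String, legacyAccepts: String => Boolean): String = {
    val base = s"Interval string must match day-time format: $input"
    if (legacyAccepts(input))
      s"$base, set $LegacyConf to true to restore the behavior before Spark 3.0."
    else base
  }
}
```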
```
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
```
Not related to this PR: we should move the test queries whose behavior differs when ANSI mode is on into a separate file, and run only that file with ANSI mode both on and off.
Test build #119612 has finished for PR 26815 at commit
Thanks, merging to master/3.0!
…andle whitespaces

### What changes were proposed in this pull request?
Currently, we parse interval from multi units strings or from date-time/year-month pattern strings, the former handles all whitespace, the latter not or even spaces.

### Why are the changes needed?
behavior consistency

### Does this PR introduce any user-facing change?
yes, interval in date-time/year-month like
```
select interval '\n-\t10\t 12:34:46.789\t' day to second
-- !query 126 schema
struct<INTERVAL '-10 days -12 hours -34 minutes -46.789 seconds':interval>
-- !query 126 output
-10 days -12 hours -34 minutes -46.789 seconds
```
is valid now.

### How was this patch tested?
add ut.

Closes #26815 from yaooqinn/SPARK-30189.

Authored-by: Kent Yao
Signed-off-by: Wenchen Fan
(cherry picked from commit 3bd6ebf)
Signed-off-by: Wenchen Fan
What changes were proposed in this pull request?
Currently, we parse intervals either from multi-unit strings or from date-time/year-month pattern strings. The former handles all whitespace, while the latter does not handle surrounding whitespace at all, not even plain spaces.
Why are the changes needed?
Behavior consistency between multi-unit and date-time/year-month interval strings.
Does this PR introduce any user-facing change?
Yes. Interval strings in the date-time/year-month formats such as `select interval '\n-\t10\t 12:34:46.789\t' day to second` are valid now.
How was this patch tested?
Added unit tests.